
Home / Forums Index / Code, Content, and Presentation / Apache Web Server

Apache Web Server Forum

    
Redirecting url with special characters
phpmaven




msg:4108382
 4:15 pm on Apr 1, 2010 (gmt 0)

I realize that this has been dealt with in quite a few threads, but I've tried all of the examples that I've seen, and I just can't get it to work.

I'm trying to redirect the following URL:
http://www.example.com/> Blue Widgets</a>

Which is being seen by Apache as:
http://www.example.com/%3E%20Blue%20Widgets%3C/a%3E
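For anyone wanting to reproduce the encoded form above, here is a minimal sketch in Python (used only for illustration, it is not part of the server setup). It percent-encodes the path of the bad link with the standard library's `urllib.parse.quote`; the variable names are made up for this example.

```python
from urllib.parse import quote, unquote

# Path portion of the malformed link, exactly as it appears in the bad href.
raw_path = "/> Blue Widgets</a>"

# Percent-encode everything except "/" (quote's default safe character).
encoded_path = quote(raw_path)
print(encoded_path)  # prints /%3E%20Blue%20Widgets%3C/a%3E

# Decoding once gives back the original path.
assert unquote(encoded_path) == raw_path
```

This only shows how the literal characters map to the %-escapes; what a given client actually sends can differ, so the raw access log is still the thing to check.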

I've tried all of the examples I've seen of escaping the URL, but it still just 404s on me.

Any guidance would be greatly appreciated.

Mark

 

g1smd




msg:4108393
 4:46 pm on Apr 1, 2010 (gmt 0)

Have you got just one of these or a whole bunch?

I worked on a site last year that had lots of duff incoming links with extraneous spaces and punctuation in the URL.

For that, we created a landing page and then simply redirected any URL request with a space, bracket, comma, or % sign in it to that page. The worry was malicious incoming 'bad' links with 'bad' words in them: had we 'corrected' those to point at a real content page, that page could have become associated with the unwanted 'bad' words.

We sacrificed the unindexed landing page for that. That page had links to major site sections, a few featured products, a button to report linking problems, and so on.
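The catch-all idea described above can be sketched roughly as follows. This is Python purely to illustrate the decision logic, not the rules that site actually used; the character set, the `route` function, and the `/link-problem` landing-page path are all assumptions made up for this example.

```python
import re

# Characters treated as signs of a malformed link: space, brackets, comma,
# percent sign. The exact set is an assumption for this sketch.
SUSPECT = re.compile(r"[ ()\[\]<>{},%]")

LANDING_PAGE = "/link-problem"  # hypothetical landing-page path

def route(raw_path: str) -> str:
    """Return the landing page for suspicious paths, else the path unchanged.

    raw_path is the path as requested (still percent-encoded), so an
    encoded space shows up as a "%" character.
    """
    if raw_path != LANDING_PAGE and SUSPECT.search(raw_path):
        return LANDING_PAGE
    return raw_path

print(route("/%3E%20Blue%20Widgets%3C/a%3E"))  # prints /link-problem
print(route("/blue-widgets"))                  # prints /blue-widgets
```

Note that nothing in the bad URL is used to pick the destination, which is the whole point: unwanted words in the link never get tied to a real content page.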

WebmasterTools was also great for finding duff incoming links from other sites. Do make sure you check both the www and non-www WMT reports for your site.

phpmaven




msg:4108398
 4:55 pm on Apr 1, 2010 (gmt 0)

Actually, WebmasterTools is where I discovered it. It's just one incoming link, but I would still like to benefit from the "link juice" and not just 404 it. I certainly don't want to set up any rules that would just 301 any wacky URL to my home page.

phpmaven




msg:4109000
 5:48 pm on Apr 2, 2010 (gmt 0)

I tried the following and it just 404s:

RewriteRule ^>\ Blue\ Widgets</a>
RewriteRule ^\%3E\%20Blue\%20Widgets\%3C/a\%3E

And various other combinations and I can't get it to work.

I would appreciate a bit of guidance.

Thanks,

Mark

jdMorgan




msg:4109034
 6:47 pm on Apr 2, 2010 (gmt 0)

You need to look at your raw server access log and see exactly what URL-path is being requested. It's evident from the rules that you posted that you aren't quite sure, and for mod_rewrite code, you need to be very sure...

If Apache isn't decoding this URL-path as expected, then the more-complex

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /\%(25)*3[Ee]\%(25)*20Blue\%(25)*20Widgets[^\ ]*\ HTTP/
RewriteRule Blue http://www.example.com/path-to-actual-blue-widget-page [R=301,L]

should work.

Note that the un-anchored RewriteRule pattern "Blue" is used to reduce the number of requests for which the RewriteCond must be processed. mod_rewrite tests the RewriteRule pattern first, so if it doesn't match, the RewriteCond won't even be evaluated. I used only "Blue" so that we can be sure that it will match, without concern for whether the surrounding characters are decoded or remain as URL-encoded entities.

The "(25)*" subpatterns appearing in the RewriteCond pattern allow for multiply-encoded characters. For example, %20, %2520, %25252520, and %25252525252520 all reduce to a single space once every layer of encoding has been decoded.
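To see why the "(25)*" allowance is needed, the following sketch peels the encoding layers off one at a time. It is written in Python for illustration only and says nothing about how many decoding passes Apache itself performs; `fully_decode` is a hypothetical helper name.

```python
from urllib.parse import unquote

def fully_decode(text: str, max_rounds: int = 10) -> tuple:
    """Percent-decode repeatedly until the text stops changing.

    Returns the final text and the number of rounds that changed it.
    """
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(text)
        if decoded == text:
            break
        text = decoded
        rounds += 1
    return text, rounds

for sample in ["%20", "%2520", "%25252520", "%25252525252520"]:
    print(sample, fully_decode(sample))
```

Each "%25" is itself an encoded "%", so every extra "25" in the URL is one more layer that has to be stripped before the "%20" (and then the space) appears.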

I also didn't exactly-match the 'tail' of the malformed URL-path in the RewriteCond pattern, matching it with the generic "zero or more characters, anything but a space" subpattern following "Widgets" and preceding the " HTTP/" at the end.
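Before putting the pattern on a live server, it can be sanity-checked offline. The sketch below runs the RewriteCond pattern from above through Python's `re` module against a few made-up request lines. Python's regex engine is not the one Apache uses, so treat this as a rough check of the pattern logic only; the sample request lines are assumptions, not taken from any real log.

```python
import re

# The RewriteCond pattern, copied as-is; it is tested against the full
# request line, which is what %{THE_REQUEST} holds.
COND = re.compile(
    r"^[A-Z]+\ /\%(25)*3[Ee]\%(25)*20Blue\%(25)*20Widgets[^\ ]*\ HTTP/"
)

samples = [
    "GET /%3E%20Blue%20Widgets%3C/a%3E HTTP/1.1",            # singly encoded
    "GET /%253e%2520Blue%2520Widgets%253C/a%253E HTTP/1.1",  # doubly encoded
    "GET /blue-widgets HTTP/1.1",                            # normal page
]

for line in samples:
    print(bool(COND.search(line)), line)
```

The first two sample lines match and the third does not. The `[^\ ]*` part is what lets the pattern skip over the tail after "Widgets" without spelling it out.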

Once you get something working, you can go ahead and make the patterns more specific if you like, to improve performance slightly.

Jim

phpmaven




msg:4109123
 9:13 pm on Apr 2, 2010 (gmt 0)

Thank you Jim,
As usual, you are "da' man" when it comes to anything Apache.

Mark

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved