mod_rewrite GET /%20foobar.html

wrong GET from Yahoo! Slurp

         

xcomm

4:29 am on Jul 3, 2004 (gmt 0)

10+ Year Member



Hi All,

I get a lot of 404s in my logs from 'Yahoo! Slurp' because this bot likes to prefix my URLs, such as '/foobar.html', with a space, giving '/%20foobar.html', for some unknown reason (I have never used spaces in URLs in my life ;-)).

66.196.90.94 - - [02/Jul/2004:14:15:50 +0200] "GET /%20foobar.html HTTP/1.0" 404 286 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]

So before trying to reach a human at Yahoo, I want to use mod_rewrite to point GET /%20foobar.html to GET /foobar.html, but I seem to be a little too dumb with it:

RewriteEngine on
RewriteRule ^/%20([a-z]*)$ /$1 [R,L]

Thank you in advance!
Regards, Jan

jdMorgan

3:44 pm on Jul 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



xcomm,

Welcome to WebmasterWorld [webmasterworld.com]!

You could try escaping the percent sign, and using a more generic pattern to allow for slashes and periods in the local URL-path:


RewriteEngine on
RewriteRule ^/\%20(.+)$ /$1 [R=301,L]

Also, this uses a permanent (301) redirect to tell the search engines to correct the URL.
This code is intended for use in httpd.conf as written. For use in .htaccess, omit the leading slash on the pattern:

RewriteRule ^\%20(.+)$ /$1 [R=301,L]
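
To see why the more generic pattern matters, here is a rough sketch using Python's re module as a stand-in for the regex matching only. The rewrite() helper is hypothetical and made up for illustration; it does not model how Apache decodes or processes the request, only whether each pattern matches the literal string from the log line.

```python
import re

# Hypothetical helper, for illustration only: if the pattern matches,
# return "/" plus the captured group (like the "/$1" substitution),
# otherwise return None (the rule would not fire).
def rewrite(pattern, path):
    m = re.match(pattern, path)
    return "/" + m.group(1) if m else None

requested = "/%20foobar.html"

# Pattern from the first post: [a-z]* cannot match the "." in ".html",
# so the anchored pattern fails on the whole path.
original = r"^/%20([a-z]*)$"

# More generic pattern: (.+) also accepts periods and slashes.
generic = r"^/%20(.+)$"

print(rewrite(original, requested))                 # None
print(rewrite(generic, requested))                  # /foobar.html
print(rewrite(generic, "/%20deeper/foobar.html"))   # /deeper/foobar.html
```

The original character class stops at the period in "foobar.html", which is why that rule never matched.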

Jim

xcomm

2:21 pm on Jul 4, 2004 (gmt 0)

10+ Year Member



Hi jdMorgan,

Thank you very much for your help! This works great!

One additional issue:

/%20foobar.html goes fine now to /foobar.html

but I moved some of the pages like /foobar.html deeper into the directory tree, so I had already set up some redirects for them beforehand:

Redirect /foobar.html /deeper/foobar.html

Now, if the requested URL matches the mod_rewrite rule above, the Redirect directives no longer take effect...

Thank you in advance!

Regards, Jan