Forum Moderators: Ocean10000 & phranque

Redirect spoof query string requests for HTML pages

How to stop junk requests for /foo.html?23Abz

     
4:46 pm on Dec 1, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Oct 26, 2004
posts:319
votes: 0


This isn't working:

RewriteCond %{REQUEST_URI} ^/(.*)\.([htm|html|shtml])$ [NC]
RewriteCond %{QUERY_STRING} ^([a-zA-Z0-9]+)$
RewriteRule ^(.*)$ [L,G]

I want to send a 'gone' or 404 response to requests for www.mysite.com/widget.html?Xzy03 and its ilk.

Thank you.

6:35 pm on Dec 1, 2006 (gmt 0)

jdmorgan

Senior Member, Top Contributor of All Time

10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


You've got some spurious regex tokens in there, and an unneeded RewriteCond:

# If non-blank query string
RewriteCond %{QUERY_STRING} . [NC]
# on static filetype, then rewrite to nonexistent path to force a 404
RewriteRule \.s?html?$ /path_to_file_that_does_not_exist [NC,L]

Alternately, force a 410-Gone response:

# If non-blank query string
RewriteCond %{QUERY_STRING} . [NC]
# on static filetype, then force a 410 response
RewriteRule \.s?html?$ - [NC,G]

or "correct" the URL with a 301 redirect:

# If non-blank query string
RewriteCond %{QUERY_STRING} . [NC]
# on static filetype, then redirect to the same URL after stripping off the query string
RewriteRule ^([^.]+\.s?html?)$ http://www.example.com/$1? [NC,R=301,L]
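
To see why the bracketed pattern in the question counts as "spurious regex tokens", here is a minimal sketch that checks both patterns outside Apache. It assumes Python's re module behaves like Apache's regex engine for these simple patterns, and the names ORIGINAL, SUGGESTED, original_matches and suggested_matches are made up for illustration:

```python
import re

# Pattern from the original question. Square brackets form a character class,
# so [htm|html|shtml] matches exactly ONE character out of h, t, m, l, s or a
# literal "|", not one of three file extensions.
ORIGINAL = re.compile(r"^/(.*)\.([htm|html|shtml])$", re.IGNORECASE)

# Pattern used in the rules above: optional "s", then "htm", then optional "l".
# re.IGNORECASE stands in for the [NC] flag.
SUGGESTED = re.compile(r"\.s?html?$", re.IGNORECASE)


def original_matches(request_uri: str) -> bool:
    """True if the question's REQUEST_URI pattern matches."""
    return ORIGINAL.search(request_uri) is not None


def suggested_matches(url_path: str) -> bool:
    """True if the RewriteRule pattern from the reply matches."""
    return SUGGESTED.search(url_path) is not None


if __name__ == "__main__":
    for uri in ("/widget.html", "/widget.htm", "/widget.h"):
        print(uri, original_matches(uri))
    for path in ("widget.html", "widget.HTM", "page.shtml", "script.php"):
        print(path, suggested_matches(path))
```

The original condition only matches a one-character extension such as /widget.h, so it never fires for .html requests, while the suggested pattern covers .htm, .html, .shtm and .shtml.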

I assume that you want to handle *any* query string attached to an .htm, .shtm, .html, or .shtml filetype. If that's not the case, then you can restore your original pattern of "^[a-z0-9]+$" with a NoCase [NC] flag.
However, that will reject a query string containing an "=" or a hyphen, or any other character that is not a-z, A-Z or 0-9, which is probably not what you want.
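
A rough way to compare the two query-string conditions is to run them against a few sample strings. This is only a sketch, again assuming Python's re module is close enough to Apache's regex engine for these patterns; the names ANY_QUERY, ALNUM_ONLY and conditions_met are hypothetical:

```python
import re

# "." matches any non-blank query string (at least one character).
ANY_QUERY = re.compile(r".")

# The narrower pattern: letters and digits only; re.IGNORECASE plays the
# role of the [NC] flag.
ALNUM_ONLY = re.compile(r"^[a-z0-9]+$", re.IGNORECASE)


def conditions_met(query_string: str) -> tuple:
    """Return (any_query_matches, alnum_only_matches) for a query string."""
    return (
        ANY_QUERY.search(query_string) is not None,
        ALNUM_ONLY.search(query_string) is not None,
    )


if __name__ == "__main__":
    for qs in ("Xzy03", "id=5", "a-b", ""):
        print(repr(qs), conditions_met(qs))
```

With the narrower pattern, a junk request such as ?id=5 or ?a-b would slip past the rule and be served normally.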

Also, if you use filepaths with periods in them other than the final one before the filetype, then change the pattern in the last rule to the much-less-efficient but less-selective "^(.+\.s?html?)$"
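
The difference between the two redirect patterns can be sketched as follows. This only mimics the pattern match and the substitution; the hostname is the one from the rule above, the names STRICT, LOOSE and redirect_target are invented for the example, and the query string is dropped simply by not appending it (the job done by the trailing "?" in the rule):

```python
import re
from typing import Optional

# Pattern from the 301 rule: no periods allowed before the final extension.
STRICT = re.compile(r"^([^.]+\.s?html?)$", re.IGNORECASE)

# Looser alternative: any characters, including periods, before the extension.
LOOSE = re.compile(r"^(.+\.s?html?)$", re.IGNORECASE)


def redirect_target(url_path: str, pattern: "re.Pattern") -> Optional[str]:
    """Return the redirect URL without a query string, or None if no match."""
    match = pattern.search(url_path)
    if match is None:
        return None
    return "http://www.example.com/" + match.group(1)


if __name__ == "__main__":
    for path in ("widget.html", "v1.2/widget.html"):
        print(path, redirect_target(path, STRICT), redirect_target(path, LOOSE))
```

A path such as v1.2/widget.html is skipped by the strict pattern and only redirected by the looser one.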

See also this recent thread: [webmasterworld.com...]

Jim

7:29 pm on Dec 1, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Oct 26, 2004
posts:319
votes: 0


Many thanks for such a complete, succinct and considered reply. I'll try it immediately.

----

Later: I tried the 410 Gone code. Worked a treat!
However, it returned a message saying the 'true' file (/foo.htm) had gone, so I've plumped for the 404 response instead.
Just to be safe.

Thank you, JD.