Serve 404 whenever a particular string is detected

         

asher02

12:49 pm on Feb 21, 2010 (gmt 0)

10+ Year Member



I just found out that someone is posting links to my website in forums with adult keywords in the QUERY_STRING.
The URLs look like http://www.example.com/page.htm?ref=adult keyword


So what I'm trying to do is catch a particular QUERY_STRING and serve a 404 site-wide.

A 301 is not what I'd prefer for this issue, but since I don't know how to serve a 404 in this situation, I used a 301 like this as a temporary remedy.

RewriteCond %{QUERY_STRING} S...ex [NC]
RewriteRule ^page\.htm /page.htm? [R=301,L]

This code redirects to the same page with the query string (and so the offending keyword) removed. What I want is to be able to serve a 404 whenever the offending word is present in any URL on my website.

Any help will be appreciated.

[edited by: tedster at 4:56 pm (utc) on Feb 21, 2010]
[edit reason] switch to example.com - it cannot be owned [/edit]

g1smd

5:13 pm on Feb 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If your site doesn't actually use query strings you can do this very simply with two lines of code in the .htaccess file:

RewriteCond %{QUERY_STRING} .+
RewriteRule (.*) http://www.example.com/$1? [R=301,L]



If you do use query strings, then it will be hard to maintain a list of all the possible word permutations. Instead, your site script should check that the requested URL is a valid one and return an error for any invalid request.

g1smd

6:06 pm on Feb 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, you wanted a 404:

RewriteCond %{QUERY_STRING} .+
RewriteRule .* /non-existent-internal-path [L]


Rewriting to a non-existent internal location where file retrieval fails will force the 404 error.

For efficiency, the .* pattern might be better changed to \.html$ or whatever your page extensions are.
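
As a rough sketch of that efficiency suggestion, the rule could be limited to page requests as shown below. The second, commented-out rule is an alternative that is not from this thread: as far as the mod_rewrite documentation describes it, the R flag also accepts status codes outside the 3xx range, in which case the substitution is dropped and the status is returned directly. Treat the \.html$ extension and the target path as placeholders for whatever the site actually uses.

```apache
# Assumes mod_rewrite is available and this is the site's root .htaccess file.
RewriteEngine On

# Any non-empty query string triggers the rule (only suitable for sites
# that never use query strings themselves).
RewriteCond %{QUERY_STRING} .+
# Rewrite page requests to a path that does not exist, which forces a 404.
RewriteRule \.html$ /non-existent-internal-path [L]

# Alternative (assumption: the server accepts a non-3xx code on the R flag).
# "-" means "no substitution"; the 404 is returned directly.
# RewriteCond %{QUERY_STRING} .+
# RewriteRule \.html$ - [R=404,L]
```

Either form should end in the site's normal 404 response, including any custom ErrorDocument that is configured.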

asher02

8:49 pm on Feb 21, 2010 (gmt 0)

10+ Year Member



Ty for the prompt response.

I was thinking of eliminating all query strings, but I do use them for the shopping cart and tracking.

The offender is now using ?ref=offendingkeyword, so I'm going to dig into my log files and see whether it's a pattern or not. If it is, I'll block just the ref= parameter.

In the meantime I used your suggestion:

RewriteCond %{QUERY_STRING} offendingkeyword [NC]
RewriteRule ^[^/.]+\.htm$ /no-file-exists-here.xyz [L]


and it's working perfectly. The only problem is that it will not work unless there is an .htm page in the URL. I want it to work on the home page, i.e. http://www.example.com/, as well as on http://www.example.com/folder/

BTW, and off topic: I found the offending link to my site in Google Webmaster Tools. Google told me I have a duplicate meta description issue between the original file and the one with the offending query string. I can smell more trouble coming :(
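
If the log files do show that the ref= parameter is the common factor, a rule keyed on the parameter name rather than on the keyword might look like the sketch below. It assumes that ref is not used by the site's own cart or tracking links, and that the server accepts a 404 code on the R flag; neither assumption is confirmed in this thread.

```apache
# Assumes mod_rewrite is available and this is the site's root .htaccess file.
RewriteEngine On

# Match "ref=" only as a parameter name: at the start of the query string
# or right after an "&", so that e.g. "xref=" is not caught.
RewriteCond %{QUERY_STRING} (^|&)ref= [NC]
# "^" matches every request path; "-" leaves the URL alone and
# R=404 answers with 404 Not Found.
RewriteRule ^ - [R=404,L]
```

Because the rule pattern is just ^, this applies to the home page, folders and pages alike, at the cost of evaluating the condition on every request.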

g1smd

9:29 pm on Feb 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^[^/.]+\.htm$


can be replaced with

^([^/.]+\.htm)?$


in order to work with URLs ending with .htm as well as root index "/" requests.

If you need it to also work with folder "example.com/folder/" requests then something like this might work:

^(([^/]+/)+|[^/.]+\.htm)?$
where | is a 'pipe' symbol.

or something like:

^(([^/]+/)+([^.]+\.htm)?)?$


Some experimentation to find the most efficient pattern for the rule may be in order, since the rule will run once for every page request on your site.

This is also where I point out that defining "exactly" what it needs to do, in terms of all matching URL formats, is the very first vital step (way before thinking about code). As you can see, small changes in code make it do something completely different.

g1smd

11:49 pm on Feb 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Typo in that last example means it doesn't work for .htm pages in the root folder, so

^([^/]+/)*([^.]+\.htm)?$

would probably be better.
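
Put together with the condition used earlier in the thread, the complete rule might look like the sketch below. Here offendingkeyword and /no-file-exists-here.xyz are the placeholders already used above, and the whole block assumes it sits in the root .htaccess file, where the path is matched without its leading slash.

```apache
# Assumes mod_rewrite is available and this is the site's root .htaccess file.
RewriteEngine On

# Case-insensitive match of the unwanted string anywhere in the query string.
RewriteCond %{QUERY_STRING} offendingkeyword [NC]
# Matches the home page (empty path), any folder URL ending in "/",
# and any .htm page in the root or in a folder.
RewriteRule ^([^/]+/)*([^.]+\.htm)?$ /no-file-exists-here.xyz [L]
```

The target path itself does not match the pattern (it neither is empty, ends in a slash, nor ends in .htm), so the rule should not re-trigger on the rewritten request.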

asher02

9:08 am on Feb 22, 2010 (gmt 0)

10+ Year Member



Working perfectly, thank you!