
avoiding htaccess redirects


revrob

9:09 am on Aug 31, 2010 (gmt 0)

10+ Year Member


I have some bot traps and .htaccess RewriteCond statements that send certain requests straight to the bot traps and send me an email to alert me.

I have some modified RewriteCond statements that detect certain IP addresses, direct them to a landing page, and send me an email.

All this works fine for me and for others on other IP addresses who have tested it for me.
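
For context, the IP-address rules are along these general lines (a generic sketch only - a documentation placeholder address range and a made-up filename; the email alert comes from the script that the rule rewrites to):

RewriteCond %{REQUEST_URI} !^/landing\.php$
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule ^ /landing.php [L]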

Yesterday I checked my logs and saw a fairly aggressive four-hour visit from an IP range (and ISP) that I am having trouble with.
This visitor (no user agent disclosed) seemed to have read robots.txt on an earlier date and was working through a (selected) list of the locations in the robots.txt Disallow section - there is no way they could have known about these non-existent directories unless they had read robots.txt previously. This included several "non-existent" locations on the site that were subject to .htaccess rewrite/redirect statements. But every time, this visitor just got a 200 response. He didn't trigger the redirect, he didn't fall into any of the bot traps, I never got any emails, and his IP was never added to a "deny from" line in my .htaccess.

Here is a typical log entry - first of all from ME trying it all out this morning using the Firefox browser. The directory /slides/ is listed as a Disallow directory in my robots.txt, and any requests for /slides/ are redirected to a bot trap. I visited, and triggered the trap.

109.152.xx.xx - - [31/Aug/2010:08:17:04 +0200] "GET /slides/IMG_****.html HTTP/1.1" 200 326756 www.mydomain "-" "Mozilla/5.0 (Windows; U; Windows ************Gecko/20100722 Firefox/3.6.8" "-"
109.152.xx.xx - - [31/Aug/2010:08:17:04 +0200] "GET /favicon.ico HTTP/1.1" 403 - www.mydomain "-" "Mozilla/5.0 (Windows****************Gecko/20100722 Firefox/3.6.8" "-"


Yesterday my aggressive crawler did this:
78.145.xx.#*$! - - [30/Aug/2010:18:52:30 +0200] "GET http://www.mydomain/slides/IMG_4555.html HTTP/1.0" 200 645 - "-" "-" "-"
78.145.xx.#*$! - - [30/Aug/2010:18:52:31 +0200] "GET http://www.mydomain/slides/IMG_4556.html HTTP/1.0" 200 645 - "-" "-" "-"
78.145.xx.#*$! - - [30/Aug/2010:18:52:32 +0200] "GET http://www.mydomain/slides/IMG_4557.html HTTP/1.0" 200 645 - "-" "-" "-"

He carried on with this malarkey for about four hours, including repeated attempted trawls through that (non-existent) photo album and other genuine areas of the site. He regularly requested URLs that should have triggered traps, but they didn't.

The things I notice are:
- that the GET request gives the whole URL (whereas when I go there, the GET request leaves out the domain name)
- that he conceals his user agent (so I am suspicious)
- that his visits never fetch anything other than the parent HTML file, and don't download the other images etc. on the pages he requests
- that he is visiting areas that are non-existent and listed under Disallow in robots.txt - so he is up to no good
- that even when he visits non-existent areas of the site, he gets a 200 response and not a 404
- that when he visits a trap, he still gets a 200 and doesn't trigger the trap.

The four hours' worth of logs from his visit contain not a single HTTP 403 code, whereas if I visit and ask for any of the booby-trapped pages and directories I get an immediate ban etc. Both I and someone else have checked in the last few hours - the traps DO work as designed for our visits - but not for his.

Here is the relevant bit from my .htaccess - from the beginning of the file to the end of the rewrite code. I've munged most of the lines except the one I'm referring to.

****************************************
RewriteEngine On

RewriteRule ^$ /index.html [R,NC,L]
#
RewriteCond %{REQUEST_URI} !/trap/*****warning\.php$
RewriteCond %{REQUEST_URI} !/trap/****st\.php$
RewriteCond %{REQUEST_URI} !^/trap/****st\.php$
# should rewrite everything starting with *****
# except the warning.php

RewriteRule ^****/ /trap/****st.php [L]
RewriteRule ^slides/ /trap/****st.php [L]
RewriteRule ^*******.php /trap/****st.php [L]


*********************************************************

I would be grateful for any advice as to how this is being done. I have adapted most of my traps etc from info on this site. I do not speak either fluent php OR htaccess!

Many thanks.

jdMorgan

1:56 pm on Aug 31, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, try redirecting this client to the correct URL-path if the requested URL-path contains your own domain:

# Redirect if hostname is present in requested URL-path (with several variations) and matches my domain
RewriteCond $2 ^(www\.)?mydomain\.com$ [NC]
RewriteRule ^/?(https?://)?([^.:/]+(\.[^.:/]+)+)\.?(:[0-9]+)?(/.*)?$ http://www.mydomain.com$5 [R=301,L]
# Else return 403 if someone else's domain is in there
RewriteRule ^/?(https?://)?[^.:/]+(\.[^.:/]+)+ - [F]

If that makes this user-agent keep coming back again and again while still prepending the protocol and hostname to the request line, then simply return a 403-Forbidden response when any hostname is present in the requested URL-path.

And if that doesn't work, and you cannot have this guy blocked at the firewall, then return a zero-byte 200 response (empty page) just to keep your bandwidth down.
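
A sketch of that zero-byte option, assuming you create an empty file named /empty.html (the filename is just an example): %{THE_REQUEST} holds the raw request line, so the hostname is still visible there even after any internal rewriting.

# Send any request whose request line contains a full URL to an empty page
RewriteCond %{REQUEST_URI} !^/empty\.html$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ https?:// [NC]
RewriteRule ^ /empty.html [L]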

I should point out that according to the HTTP protocol, it *is* acceptable to include a protocol and the entire hostname in the URL-path, but it is almost never done.

To be clear, your trap rules are set up to detect an HTTP request that looks like this:
GET /trapped-URL-path.html HTTP/1.1
Host: www.mydomain.com

But this user-agent is sending:
GET http://www.mydomain.com/trapped-URL-path.html HTTP/1.1
Host: www.mydomain.com

and so is bypassing your access-control rules because none of the patterns match.

Another set of rules that may be helpful in this case is:

# Ban requests with literal hyphens for either or both user-agent and referrer
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]
RewriteCond %{HTTP_REFERER} ^-$
RewriteRule ^ /trap-script.php [L]
#
# Block requests with blank user-agent and referrer
RewriteCond %{HTTP_USER_AGENT}%{HTTP_REFERER} ^$
RewriteRule ^ - [F]

But I strongly suggest that you get your original trap rules working again first by trying to detect/redirect the client when it adds the protocol and domain to the requested URL-path.

I do not speak either fluent php OR htaccess!
If you want to be able to handle the Web as it exists today, it's time to start getting fluent (or start putting big money into a "consultant account")...

Jim

revrob

5:01 pm on Aug 31, 2010 (gmt 0)

10+ Year Member



Thanks.
I think I see what you are getting at - although, just to add, I've sort of concluded that what this guy is doing is running some URL list through a telnet session, because the logs show an HTTP/1.0 request rather than HTTP/1.1 - and when I try the telnet method, I get exactly the same lack of user agent in the log entry - namely
"-" "-" "-"

I've added the paragraph you suggested to my .htaccess file, modified for my domain. The site still works! I'm not sure how to "test" whether the hack he is using still works, though, as my attempts at telnet commands only generated error messages from the web server, so I may just have to wait for more of that type of request and see what happens.

As for the hyphens etc in the user agents, yes - I understand that one too.

And with regard to the "big money into a consultant account" - ROFL (small community charity websites)! - ha ha - I AM the consultant and my website budget is a nice round one, once the hosting has been paid for.

But thanks for the help and for responding so promptly. I'm in a bit of a running battle with one of our very big ISPs over here, trying to establish my right to control access to my site against their non-compliant, non-permitted, non-identified covert customer tracking and website spidering, so it's all getting a bit unpleasant.

wildbest

5:38 pm on Aug 31, 2010 (gmt 0)

10+ Year Member



the logs show a HTTP/1.0 request rather than HTTP/1.1

I have a lot of those as well. Are there any SEO risks if I serve [F] to all HTTP/1.0 requests?
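
Something like this is what I have in mind - just an untested sketch; %{THE_REQUEST} is the full request line, which ends with the protocol version:

RewriteCond %{THE_REQUEST} \ HTTP/1\.0$
RewriteRule ^ - [F]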

g1smd

7:27 pm on Aug 31, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^$ /index.html [R,NC,L]


I am not liking the 302 redirect from the root to a named index file.

That is marginally better than a 301 redirect, but really "/" should be the canonical URL and the DirectoryIndex directive should set the filename that is served by that URL request.
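
That is, remove the RewriteRule entirely and use something like this instead (assuming index.html is the file that should be served for "/"):

DirectoryIndex index.html

A request for "/" then serves the index file internally, with no redirect and no /index.html URL ever exposed to visitors.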

revrob

8:38 pm on Aug 31, 2010 (gmt 0)

10+ Year Member



The .htaccess stuff and my traps are from other threads at this site (which you, g1smd, were kind enough to participate in back then):
[webmasterworld.com...]

and the original scripts came from these discussions I think
[webmasterworld.com...]
[webmasterworld.com...]

Could you please explain what should be there instead of /index.html, and what you mean by "the DirectoryIndex directive setting the filename that is served by that URL request"? I'm all ears.

I'll have a better idea of how the current arrangements are working after the normal SEO bots (wanted and unwanted) have spent the night doing their usual visits. At the moment all seems to be well.

Whether yesterday's nasty four-hour crawler will be back, of course, I don't know.

Once again, many thanks.