

detecting blank referrer with .htaccess

"-" versus ""

         

abates

12:41 am on May 3, 2006 (gmt 0)

10+ Year Member



Usually in my log files, a blank referrer is represented as "-", and this can be detected in .htaccess thusly:
RewriteCond %{HTTP_REFERER} ^$

However, for some user agents, particularly BecomeBot, the referrer shows in the logs as "".

Does anyone know what this means (the complete absence of a referrer header perhaps?) and how I could differentiate between "-" and "" in a .htaccess condition?

UserFriendly

1:05 am on May 3, 2006 (gmt 0)

10+ Year Member



Does your existing line not detect an empty string? Maybe you're right and it's because the referrer is missing.

What happens if you reverse the logic and use:

RewriteCond %{HTTP_REFERER} !^.+$

(The condition is true if the referrer string does not contain at least one character, i.e. it is empty.)

UserFriendly

1:06 am on May 3, 2006 (gmt 0)

10+ Year Member



There should be a space between the right curly bracket and the "!", but for some reason I can't get it to appear. When I preview or submit, the forum removes the space. No idea why it would do that.

abates

2:09 am on May 3, 2006 (gmt 0)

10+ Year Member



Actually, after checking the documentation some more, I think "-" means that the Referer header was absent, and "" means that the header was present but contained a blank value.

UserFriendly: RewriteCond %{HTTP_REFERER} ^$ appears to match both cases, and I'd like to be able to differentiate between them; that is, write a RewriteCond line that fires when the referrer is present but blank, and not when the referrer is absent.
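For reference, the "-" versus "" difference comes from the access-log format rather than from mod_rewrite. Below is a sketch of the standard "combined" format as it typically appears in the main server config (LogFormat and CustomLog are not allowed in .htaccess); the log path is an assumption. mod_log_config writes - for a request header that was not sent at all, and writes the value as-is (here an empty string) for a header that was sent blank, so the literal quotes around %{Referer}i turn those into "-" and "" respectively.

```apache
# Standard "combined" format: the Referer and User-Agent request headers
# are written inside literal double quotes (escaped here as \").
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Hypothetical log location; adjust for your server.
# Referer header absent          -> logged as "-"
# Referer header sent but blank  -> logged as ""
CustomLog logs/access_log combined
```

mod_rewrite's %{HTTP_REFERER}, by contrast, expands to an empty string in both cases, which is why RewriteCond %{HTTP_REFERER} ^$ matches either.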

abates

3:37 am on May 3, 2006 (gmt 0)

10+ Year Member



After going so far as to look at the source code for mod_rewrite, I believe the answer to my question is "no". There's no way to tell the difference between "referrer header present but blank" and "referrer header not present". Bother.

UserFriendly

3:43 pm on May 3, 2006 (gmt 0)

10+ Year Member



Why do you need to differentiate between the two, if I may ask?

abates

9:58 pm on May 3, 2006 (gmt 0)

10+ Year Member



I have only two people hitting my site using whichever method produces the "" referrer: BecomeBot and a blog comment spammer. If I could detect the "" referrer separately from "-" (hard) and filter out BecomeBot (easy), then I could hand the spammer a 403 and stop them scraping my pages for comment forms.

I have found an alternate way to do this, though it wasn't as effective as I hoped, because their script still has a default file that it tries to post to if it can't find a form.
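A sketch of the rule described above, using only what mod_rewrite can actually test. The restriction to .htm/.html pages is a hypothetical choice mirroring the "static files" mentioned later in the thread. Because %{HTTP_REFERER} is empty for both "-" and "" requests, this version would also return 403 to any ordinary visitor who arrives with no referrer (a typed URL or a bookmark), so treat it as an outline of the idea rather than a safe drop-in rule.

```apache
RewriteEngine On

# True when the Referer is empty. This matches both the "" (sent blank)
# and "-" (not sent) cases; mod_rewrite cannot separate them.
RewriteCond %{HTTP_REFERER} ^$
# Leave BecomeBot alone (case-insensitive match anywhere in the UA string).
RewriteCond %{HTTP_USER_AGENT} !BecomeBot [NC]
# Hypothetical scope: static .htm/.html pages only.
# "-" means no substitution; [F] sends 403 Forbidden.
RewriteRule \.html?$ - [F]
```

The BecomeBot exemption is the "easy" part; the missing piece is a condition that is true only for the "" case, which mod_rewrite does not provide.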

UserFriendly

1:13 am on May 4, 2006 (gmt 0)

10+ Year Member



Do you mean they are submitting a POST request to a file that isn't even a script?

abates

2:09 am on May 4, 2006 (gmt 0)

10+ Year Member



Now that I've blocked their hits on my static files so they can't work out the comment script, they're trying to post to a .php file which doesn't exist on my site.

UserFriendly

1:12 pm on May 4, 2006 (gmt 0)

10+ Year Member



So long as the file doesn't exist, let them keep trying.

What you might want to do is use something like the following to reduce the bandwidth they consume each time they receive a 404 from your server:

Redirect 404 /targetfile.php
<Files targetfile.php>
ErrorDocument 404 "Not found
</Files>

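(Note that there is no closing quote mark for the ErrorDocument string. That unmatched-quote form is the Apache 1.3 syntax; on Apache 2.x the message should be enclosed in a matching pair of double quotes.)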

I found this trick on a page about reducing the damage caused by the URL-squatting favicon.ico requests. Now, instead of a huge great HTML 404 page, the tiny string "Not found" is returned instead. This costs merely 22 bytes each time compared to the 800+ bytes that my HTML 404 page was sucking up.

You should be able to use this to minimise the cost of repeated requests for your non-existent PHP file.

abates

10:16 pm on May 4, 2006 (gmt 0)

10+ Year Member



I don't have any error pages set up for that site, so no data's transferred anyway.

Instead of blocking them altogether, I am considering handing them a dummy form with an action pointing to [nonexistantdomain.com...] Assuming their script falls for this and doesn't just try to post to [mysite.com...], it will save me bandwidth, because they are posting 11 KB of data to my site each time. :)

UserFriendly

1:48 am on May 5, 2006 (gmt 0)

10+ Year Member



A spammer used my mail form to send junk mail to hundreds of addresses. I spotted the error I'd made in the code and fixed it.

But for weeks afterwards, the spammer was sending occasional tests to the old script URL, then the new script URL. They became less and less frequent, so I'm hoping that the human at the other end has realised that it's not worth his time sending the requests to my domain anymore.

Hopefully the requests to your domain will fade away now you've fixed your problem.