GOAL: When a POST for /my/uri/here comes in that does not contain User-Agent and Referer, log to another transaction log, and deny the request.
##########################
## OFFENDER HITTING ME EVERY SECOND BAD LOG ENTRIES
##########################
216.x.x.x - - [03/Oct/2008:11:27:56 -0500] "POST /my/uri/here HTTP/1.1" 200 12417 "-" "-" POST /my/uri/here "" "HTTP/1.1"
216.x.x.x - - [03/Oct/2008:11:27:57 -0500] "POST /my/uri/here HTTP/1.1" 200 12417 "-" "-" POST /my/uri/here "" "HTTP/1.1"
216.x.x.x - - [03/Oct/2008:11:27:58 -0500] "POST /my/uri/here HTTP/1.1" 200 12417 "-" "-" POST /my/uri/here "" "HTTP/1.1"
##########################
## VALID USER TRANSACTION LOG ENTRY - WILL ALWAYS HAVE User-Agent and Referer.
##########################
76.x.x.x - - [03/Oct/2008:00:07:03 -0500] "POST /my/uri/here HTTP/1.1" 200 9714 "https://www.REMOVED.com/Referer" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.2; .NET CLR 1.1.4322)" POST /my/uri/here "" "HTTP/1.1"
##########################
# MY FIRST ATTEMPT
##########################
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %m %U \"%q\" \"%H\"" myFormat
<Location /my/uri/here>
SetEnvIfNoCase User-Agent "-" is_attack
CustomLog logs/attack.log myFormat env=is_attack
# NOT SURE HOW TO DENY THIS. OFFENDING IP CHANGES WITH LEASE
</Location>
##########################
### READ A PREVIOUS THREAD BUT NOT SURE HOW I CAN USE THE CODE BELOW. I DO HAVE MOD_REWRITE LOADED AND AVAILABLE.
##########################
## BLOCK *Faked* blank referer -OR- UA (malicious agents supply a literal hyphen as UA string)
#RewriteCond %{HTTP_REFERER}<->%{HTTP_user_agent} ^-<->|<->-$
#RewriteRule .* - [F]
#
## BLOCK blank referer -AND- UA except for HEAD and favicon requests
#RewriteCond $1 !^favicon\.ico$
#RewriteCond %{REQUEST_METHOD} !^HEAD$
#RewriteCond %{HTTP_REFERER}<->%{HTTP_user_agent} ^<->$
#RewriteRule (.*) - [F]
First, you'll see many HTTP requests with no referrer, because some visitors are behind corporate or ISP caching proxies -- all AOL and EarthLink users, for example. A caching proxy will issue a request to your server on behalf of its user, and then save the response from your server. If the response is not marked as non-cacheable in public caches, the proxy will then serve that same response for any subsequent requests for that same URL for a period of time, regardless of which of its users makes the request. This speeds up the Web and results in far fewer requests to our servers. So in general, it's a very good thing.
Details of this caching can be controlled using Cache-Control, Expires, and Vary headers sent by our servers.
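As a rough illustration of that control, the sketch below marks responses for the URI from the original post as unsuitable for shared caches. It assumes mod_headers is loaded; the directive values are one reasonable choice, not a recommendation for every site.

```apache
# Assumes mod_headers is available; the block is skipped if it is not.
<IfModule mod_headers.c>
    <Location /my/uri/here>
        # Tell shared (proxy) caches not to store or reuse this response
        Header set Cache-Control "private, no-store"
    </Location>
</IfModule>
```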
However, because the caching proxy will serve this response for multiple requests, any User-Agent or Referer header it might send is meaningless: these headers can't contain multiple values, and they certainly can't contain values which won't be determined until some time in the future when a subsequent request is made. Therefore, caching proxies often suppress the HTTP Referer header, and often replace the User-Agent header with a string identifying themselves rather than the end user. In addition, they often set the "Via" or "X-Forwarded-For" header, but as with the Referer header, you can't count on it.
Second, the logged value for a blank HTTP header is usually a hyphen, appearing as "-" in the logs. However, the actual value of a missing header when checked by server-side code will be completely blank or (None) or null. Therefore, checking for a hyphen when you want to find a blank header is incorrect -- you should check for "".
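A minimal sketch of the difference, assuming mod_setenvif is loaded. The variable names no_ua and hyphen_ua are made up for illustration.

```apache
# Matches a missing or empty User-Agent header
# (mod_setenvif compares an absent header as an empty string)
SetEnvIf User-Agent "^$" no_ua

# Matches a User-Agent whose entire value is one literal hyphen
SetEnvIf User-Agent "^-$" hyphen_ua
```

By contrast, an unanchored pattern such as "-" matches any header value that contains a hyphen anywhere, so it does not test for a blank header at all.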
Now here's a twist: A few years ago, some tricky troublemaker (or possibly many) was issuing requests where the HTTP_USER_AGENT and HTTP_REFERER actually contained a literal hyphen. This was intended to get past filters that blocked requests if these headers were blank, but still appear as valid blank headers in server logs. If you get a request where either header contains only a single literal hyphen, you can be sure it's a malicious request. This should explain the two rules in the mod_rewrite code you copied above (I wrote it). However, as I implemented it, the truly-blank header requests get a 403-Forbidden response, while the faked blank (hyphen) header requests get rewritten to a script (see link below) that blocks the requesting IP address from *any* subsequent access.
Dynamic IPs and bad guys using anonymous proxies are always problematic, and the best you can do is to block them behaviorally. The mod_rewrite code snippet is a good start, and you may want to add a similar snippet that blocks POSTs when the User-Agent header is blank. That's about the best you can do, since the presence or absence of the Referer header is totally unreliable as an indicator.
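Such a snippet might look like the following. This is a sketch only, assuming mod_rewrite is loaded and the rules are placed in the server or virtual-host config; test it against your own traffic before relying on it.

```apache
RewriteEngine On

# Only consider POST requests
RewriteCond %{REQUEST_METHOD} ^POST$
# ...whose User-Agent header is missing or empty
RewriteCond %{HTTP_USER_AGENT} ^$
# Respond with 403 Forbidden, leaving the URL unchanged
RewriteRule ^ - [F]
```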
If you want to "permanently" ban requests from IP addresses that use a hyphen for either header, then see this thread [webmasterworld.com]. While the code is intended primarily as a bad-bot trap, it can be called from .htaccess, a config file, or even from SSI or PHP scripts.
However, before you set out to code a solution, I suggest some intense analysis and research. You obviously don't want to block legitimate users' POSTs, and that's the likely result of any deficiency in planning. It's important to define the goal completely before jumping into coding.
I'd suggest:
Missing referrer -- No action
Missing user-agent -- 403-Forbidden response
Both missing -- Add IP to permanent block list unless HEAD or favicon request
Literal hyphen for referrer or user-agent -- Add IP to permanent block list
None of these depend specifically on the HTTP method being a POST.
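The policy above could be sketched in mod_rewrite roughly as follows. Several things here are assumptions rather than anything from the linked thread: the script path /cgi-bin/ban-ip.cgi is hypothetical and stands in for whatever script maintains your block list, the rules are assumed to live in the server or virtual-host config (so URL-paths start with a slash), and /cgi-bin/ is assumed to be mapped by ScriptAlias. Treating a HEAD or favicon request with both headers blank as "no action" is one reading of the list.

```apache
RewriteEngine On

# Literal hyphen in either header: hand off to the (hypothetical) ban script
RewriteCond %{HTTP_REFERER} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule !^/cgi-bin/ban-ip\.cgi$ /cgi-bin/ban-ip.cgi [PT]

# Both headers blank: ban, except for HEAD requests and favicon.ico
RewriteCond %{REQUEST_METHOD} !^HEAD$
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
RewriteCond %{HTTP_REFERER}<->%{HTTP_USER_AGENT} ^<->$
RewriteRule !^/cgi-bin/ban-ip\.cgi$ /cgi-bin/ban-ip.cgi [PT]

# User-Agent blank but Referer present: 403 Forbidden
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteCond %{HTTP_REFERER} !^$
RewriteRule ^ - [F]

# Missing Referer alone: no rule, so the request proceeds normally
```

The [PT] flag passes the rewritten URL back through URL mapping so that ScriptAlias can resolve it; in a .htaccess file the patterns would need the leading slashes removed.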
You will need to clear the "permanent" block list periodically so that it does not grow to a huge size. Also, you need to remove any dynamic IP addresses pretty much daily for dial-up and weekly for DSL, in order to avoid blocking the next (innocent) user who gets assigned those IP addresses. Any IP addresses resolving to hosting or co-location ranges can be left on the block list "more permanently" -- and reviewed once a year. In fact, you can expand the block from a single IP address to a range, unless you expect and want to allow servers in these ranges to access your server.
If you do settle on a mod_rewrite solution, be aware that the [E] flag on RewriteRule can be used to set your conditional-logging-control variable. I didn't see that variable cited in your custom log format, so you'll need to add it to both CustomLog directives: as a positive test (env=variable) for your "special" log, and as a negated test (env=!variable) for your normal access log. See the Apache mod_log_config documentation for details.
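A sketch of how the pieces might fit together, reusing the is_attack variable and the myFormat log format from the original post. It assumes server or virtual-host context; the logs/access.log file name is illustrative.

```apache
RewriteEngine On

# POST to the protected URI with a missing or empty User-Agent:
# set is_attack for logging and return 403 Forbidden
RewriteCond %{REQUEST_METHOD} ^POST$
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule ^/my/uri/here$ - [E=is_attack:1,F]

# Flagged requests go only to the attack log...
CustomLog logs/attack.log myFormat env=is_attack
# ...and everything else goes only to the normal access log
CustomLog logs/access.log myFormat env=!is_attack
```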
This really isn't a simple subject. I hope this helps.
Jim