If you have root access there is a solution using mod_rewrite which allows you to keep a single external file for UAs, IPs and Referrers to block.
How it works
mod_rewrite allows you to use hash tables (maps) to insert or substitute fields in RewriteCond and RewriteRule directives through a key lookup. These tables can be stored in ASCII files or NDBM files. This is the feature that we want: external files.
To block certain UAs, IP addresses or Referrers we try to look them up in the hash table. If they are contained in the table, we block this request. Otherwise we allow it.
Since this approach uses a key lookup we cannot use regular expressions here. So we need to include each and every UA, IP, URL we want to block in the tables. Additionally keys cannot contain spaces.
For the UA string we need to strip the spaces out before passing it to the lookup function. Since only 9 backreference slots are available, we can only differentiate between UA strings up to the 9th space in the UA string. In the implementation I chose to take only the first 4 words of the UA string.
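To see which key a given UA string ends up as, here is a minimal Python sketch of the same four-word stripping. It only mirrors the RewriteCond pattern used in the implementation; the names `UA_WORDS` and `ua_map_key` are made up for illustration and are not part of mod_rewrite.

```python
import re

# Same pattern as the RewriteCond on HTTP_USER_AGENT: up to four
# space-separated words, captured as groups 1 to 4.
UA_WORDS = re.compile(r"^([^ ]+) *([^ ]*) *([^ ]*) *([^ ]*)")


def ua_map_key(user_agent):
    """Return the key that a %1%2%3%4 lookup would use, or "" if no match."""
    match = UA_WORDS.match(user_agent)
    if match is None:
        # The pattern needs at least one leading non-space character.
        return ""
    # Concatenate the four captured words without spaces.
    return "".join(match.groups())


if __name__ == "__main__":
    print(ua_map_key("Bot mailto:craftbot@yahoo.com"))
    print(ua_map_key("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

This can be handy for working out what to put in the map file: the key has to equal the concatenated words exactly, so a UA that carries a version number needs its own entry.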
For the aforementioned reasons one needs to decide whether one wants to block specific URLs or whole domains.
Generally speaking, to account for the lack of regular expressions in the lookup we need to strip off unwanted information ("http://" and the path for the referrer, spaces for UAs) before doing the key lookup. This stripping is done by a RewriteCond that precedes the one doing the actual lookup.
Despite those shortcomings and the extra work involved, this approach is still slightly faster than the ordinary approach used in the A Close to perfect .htaccess ban list [webmasterworld.com] thread. And it adds the additional benefit of having to maintain only one set of data.
Implementation
RewriteMap testmap txt:/var/www/html/testmap
RewriteMap iptestmap txt:/var/www/html/testmap
RewriteMap reftestmap txt:/var/www/html/testmap
or
RewriteMap hashmap dbm:/var/www/html/hashmap
RewriteMap iphashmap dbm:/var/www/html/hashmap
RewriteMap refhashmap dbm:/var/www/html/hashmap
RewriteCond %{HTTP_USER_AGENT} ^([^\ ]+)\ *([^\ ]*)\ *([^\ ]*)\ *([^\ ]*)
RewriteCond ${hashmap:%1%2%3%4} ^b$
RewriteRule !^path/to/error/docs/[0-9]{3}.php - [F,L]
#
RewriteCond ${iphashmap:%{REMOTE_ADDR}} ^b$
RewriteRule !^path/to/error/docs/[0-9]{3}.php - [F,L]
#
RewriteCond %{HTTP_REFERER} ^http://([^/]+)
RewriteCond ${refhashmap:%1} ^b$
RewriteRule !^path/to/error/docs/[0-9]{3}.php - [F,L]
For the referrer we use the domain only. If you want to use more than the domain, add additional parentheses to the RewriteCond %{HTTP_REFERER} ^http://([^/]+) rule. Don't forget to add the appropriate backreferences to the rule doing the lookup.
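As a rough illustration of the referrer stripping, the following Python sketch applies the same pattern as the RewriteCond on HTTP_REFERER and returns what %1 would hold. The names `REFERER_HOST` and `referer_map_key` are hypothetical.

```python
import re

# Same pattern as the RewriteCond on HTTP_REFERER: capture everything
# between "http://" and the first slash.
REFERER_HOST = re.compile(r"^http://([^/]+)")


def referer_map_key(referer):
    """Return the host part used as the map key, or "" if no match."""
    match = REFERER_HOST.match(referer)
    return match.group(1) if match else ""


if __name__ == "__main__":
    print(referer_map_key("http://www.webmasterworld.com/forum92/205.htm"))
```

Note that the pattern as written only matches referrers starting with http://, and that a port number, if present, stays part of the key.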
BlackWidow b
Botmailto:craftbot@yahoo.com b
ChinaClaw b
DISCo b
DownloadDemon b
...
Wget b
Widow b
Xaldon b
WebSpider b
Zeus b
Remove spaces from user agent names.
218.211.127.156 b
154.216.125.110 b
www.webmasterworld.com b
www.aaroncarter.com b
Convert each of the above files to the NDBM format as explained in the Hash File section of the RewriteMap [httpd.apache.org] documentation.
The b is just some arbitrary string (blocked) that is returned by the key lookup and for which we check in the RewriteCond rule.
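The lookup-and-compare step can be pictured with a small Python sketch. It assumes the plain text map format (one whitespace-separated key/value pair per line, blank lines and # comments ignored); `parse_txt_map` and `is_blocked` are illustrative names, not anything Apache provides.

```python
def parse_txt_map(text):
    """Parse a text map into a dict of key -> value."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0]] = parts[1]
    return table


def is_blocked(table, key):
    """Mimic the check against ^b$: a missing key yields an empty string."""
    return table.get(key, "") == "b"


if __name__ == "__main__":
    sample = "BlackWidow b\nZeus b\n"
    table = parse_txt_map(sample)
    print(is_blocked(table, "Zeus"), is_blocked(table, "AaronCarter"))
```

The lookup is an exact, case-sensitive string match, which is why "zeus" would not be caught by an entry for "Zeus".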
Benchmark Results
The setup is the same as here [webmasterworld.com]
AaronCarter needed 1.58516982468692 seconds.
BlackWidow. needed 1.68941572579471 seconds.
Zeus....... needed 1.59178618951277 seconds.
AaronCarter needed 1.54459900205786 seconds.
BlackWidow. needed 1.57292873209173 seconds.
Zeus....... needed 1.5213089206002 seconds.
AaronCarter needed 1.52674262090163 seconds.
BlackWidow. needed 1.56979447061365 seconds.
Zeus....... needed 1.52950318293138 seconds.
AaronCarter: 0.05842720378529 -> not blocked
BlackWidow.: 0.11962125518106 -> blocked
Zeus.......: 0.06228300658139 -> blocked
Again, using the dbm map file is between 0.05 and 0.1 ms faster per request than using the single RewriteCond rule. It would be interesting to know how this solution scales compared to the normal RewriteCond method. While I didn't test this, I would expect the dbm map file to be a lot faster for lots of entries.
Of course, if you have root access it would have been smarter to put the rules in httpd.conf instead of the .htaccess file. This would speed things up slightly, too. As a general rule, avoid .htaccess files altogether if you can. I realize most can't.
Pros/Cons
pros
- single file for all (virtual) servers / all your projects
- slightly faster than the other method
- no mod_perl needed
cons
- no regular expressions available
- no spaces in map keys allowed
- no case insensitive matching
- root access needed
See also
Module mod_rewrite URL Rewriting Engine [httpd.apache.org]
A Close to perfect .htaccess ban list [webmasterworld.com]
How (and Where) best to control access [webmasterworld.com]
Hope this is useful. Comments welcome.
Andreas
Andreas
Great method! And great tutorial, thanks for the write up.
If I may, can I propose an alternate?
My method has some advantages over yours (you can have spaces, match regular expressions, and match over more than 9 places), but also some possible disadvantages (you MUST have mod_perl). It shares the same idea of one centralized list for all servers, and I believe it is as fast as or faster than a similar .htaccess (but I have no way of testing, or at least no knowledge of how to test!)
I use the Apache mod_perl BlockAgent, slightly modified. I used that as a base, and rewrote it to block IPs (and I believe you helped with that, Andreas, thanks!)
Then, in each VH in httpd.conf, call thus:
<Location />
PerlAccessHandler Apache::BlockAgent
PerlSetVar BlockAgentFile /path/to/bad_agents.txt
PerlAccessHandler Apache::BlockIP
PerlSetVar BlockIPFile /path/to/bad_ip.txt
</Location>
Sample of BadAgents.txt:
^\$botname
# ALL-UPPERCASE letters, and nothing but those letters, 6 or more in a row
^([A-L]|[N-Z])[B-Z][A-Z]{4,}$
# Begin with Date
^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\ (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\ [0-3][0-9]\ [0-2][0-9]\:[0-5][0-9]\:[0-5][0-9]\ ..T\ 2[0-9][0-9][0-9]Mozilla\/[0-9]\.[0-9][0-9]
almaden
^Anarchie
^ASPSeek
AspTear
^atSpider
^attach
Sample of BadIP.txt:
^63\.174\.33\.196$
# ICaughtYou.Com
^63\.144\.231\.
# Cyveillance
^63\.148\.99\.2(2[4-9]|[34][0-9]|5[0-5])$
NOTE: Files MUST be uploaded ASCII or it will not work.
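For anyone who wants to check patterns before putting them in the list files, here is a small Python sketch of the matching idea. It is not the Apache::BlockAgent code; it assumes blank lines and lines starting with # are skipped (as the samples suggest) and that matching is case-sensitive. `load_patterns` and `is_listed` are made-up names.

```python
import re


def load_patterns(text):
    """Compile one regular expression per non-comment, non-blank line."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns


def is_listed(patterns, value):
    """True if any pattern matches somewhere in the value."""
    return any(p.search(value) for p in patterns)


# Entries taken from the samples above.
BAD_IP = r"""
^63\.174\.33\.196$
# ICaughtYou.Com
^63\.144\.231\.
# Cyveillance
^63\.148\.99\.2(2[4-9]|[34][0-9]|5[0-5])$
"""

BAD_AGENTS = r"""
^([A-L]|[N-Z])[B-Z][A-Z]{4,}$
almaden
^Anarchie
"""

if __name__ == "__main__":
    ips = load_patterns(BAD_IP)
    print(is_listed(ips, "63.148.99.224"), is_listed(ips, "63.148.99.223"))
```

The Cyveillance entry shows what the regular expressions buy you: one line covers 63.148.99.224 through 63.148.99.255, which the key lookup method would need 32 separate entries for.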
Used in combination with a spider trap that writes to BadIP.txt rather than .htaccess, this is a great system!
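A spider trap along those lines has to turn the caught IP into an anchored pattern before adding it to the list. A minimal Python sketch of just that step, assuming the list format shown in the BadIP.txt sample; `ip_to_pattern` and `add_ip` are hypothetical helpers, and reading and writing the actual file is left to the caller.

```python
import re


def ip_to_pattern(ip):
    """Turn a literal IP address into an anchored, escaped pattern line."""
    return "^" + re.escape(ip) + "$"


def add_ip(list_text, ip):
    """Return the list text with the IP's pattern appended, unless present."""
    pattern = ip_to_pattern(ip)
    if pattern in list_text.splitlines():
        return list_text
    # Make sure the new entry starts on its own line.
    if list_text and not list_text.endswith("\n"):
        list_text += "\n"
    return list_text + pattern + "\n"


if __name__ == "__main__":
    print(add_ip("", "63.174.33.196"), end="")
```

Escaping the dots matters here: an unescaped dot matches any character, so the entry could block addresses other than the one that hit the trap.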
dave