My site is being hammered by comment-spam bots. I have obtained an IP blacklist which contains 9000 entries and is 124KB in size. How can I efficiently use this list in .htaccess to block these bots?
jdMorgan
2:14 pm on Mar 23, 2010 (gmt 0)
1) Don't include IP address ranges that are not a problem for your site.
2) Where possible, combine many smaller IP address ranges into fewer, larger ranges. In some cases, you may ask yourself, "Do I get any meaningful traffic from this non-blacklisted range which is embedded within two blacklisted ranges?" If the answer is "No," then combine the ranges and block the lot.
3) Use CIDR notation in mod_access to block the ranges (if you can't compute CIDRs in your head, you can find on-line tools to help). Be aware that a CIDR block's base address must be aligned to its block size -- the base, size, and modulus must agree -- or multiple CIDRs will be required. A worked example follows this list.
4) Block as little as possible. Some ranges are a constant problem, while others are simply one-off problems for a short period of time. Balance 'problem severity of ranges' against .htaccess filesize and potential legitimate-visitor blocking and the resultant revenue and reputation loss.
5) Make your custom 403-Forbidden page polite and helpful in case you mistakenly block an "innocent" visitor. However, do not gloat or describe the technical methods used to implement your blocks -- Information is power, and you don't want to give any power to malicious visitors... "We're sorry, but our server has denied your request to access our site. Please contact us at help2010Mar (at) example (dot) com or call 1-800-555-1212 for assistance" will suffice. Obviously, "help2010Mar@example.com" should be a throw-away address that can be changed frequently, and the next address *should not* be "help2010Apr" -- Don't follow any predictable pattern.
6) Be sure to exclude both your robots.txt file and your custom 403 error page from the IP address blocks. You can use mod_setenvif, Order Deny,Allow, and the Allow from env=<name> syntax of mod_access to implement this by-pass. Failure to do this can result in "self-inflicted DOS attacks" through two different mechanisms, and that's no fun...
7) When declaring a custom error document, use a local filepath, not a URL; a short example also follows this list. Failure to observe this rule may result in very bad effects on your search engine rankings. Read the ErrorDocument directive's documentation carefully -- the "fine print" at the bottom is very important. This applies to the other modules as well -- mod_access and mod_setenvif. A bit of reading at Apache.org is well worth your time, and may prevent disaster.
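To illustrate point 3, here is a minimal sketch (the ranges are placeholders, not taken from any real blacklist). Four contiguous /24 networks whose base address is aligned to the block size collapse into a single /22, while a span whose base is not aligned must be split:

# 192.168.0.0 through 192.168.3.255: the base sits on a four-network
# boundary, so one /22 covers all four /24s
Deny from 192.168.0.0/22
# 192.168.1.0 through 192.168.2.255: the base is NOT aligned to a
# two-network (/23) boundary, so two separate blocks are required
Deny from 192.168.1.0/24
Deny from 192.168.2.0/24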
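And for point 7, a minimal sketch (the filename is a placeholder): with a local filepath, Apache serves the error page internally and the 403 status is preserved; with a full URL, Apache issues an external redirect instead, so the client receives a 302 rather than the 403 -- that is the "fine print" that can hurt your rankings.

# Correct: local filepath -- served internally, 403 status preserved
ErrorDocument 403 /custom-403-page.html
# Wrong: full URL -- Apache sends a 302 redirect and the 403 status is lost
# ErrorDocument 403 http://www.example.com/custom-403-page.html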
Jim
gosman
3:05 pm on Mar 23, 2010 (gmt 0)
Hi Jim.
And thank you for your detailed and very technical answer :)
I'm new to .htaccess and was wondering if you could provide an example of how to do this.
This is how far I've got:
order allow,deny
deny from %ipaddress1%
deny from %ipaddress2%
deny from %ipaddress3%
etc.....
allow from all
Obviously, doing this for all 9000 addresses will make the .htaccess about 126K. Is this too large? Also, this method won't address the "self-inflicted DOS attacks" you mention.
Any help much appreciated.
jdMorgan
5:40 pm on Mar 23, 2010 (gmt 0)
I was fairly specific about some things, which you've overlooked...
# Exempt robots.txt and the custom 403 page from every IP block below
# (note the leading slash: Request_URI always begins with "/")
SetEnvIf Request_URI "^/(robots\.txt|custom-403-page\.html)$" any-access
#
Order Deny,Allow
#
Deny from 192.168.0.0/24
Deny from 10.10.0.0/16
#
# The bypass: requests flagged above are allowed even from denied addresses
Allow from env=any-access
Again, if you have not read the docs, take the time to do so. Any attempt at coding may otherwise be "suicidal" or at the very least "problematic" for your site. This is server configuration code, and not to be trifled with. We see a lot of code in this forum, almost all of it defective in one way or another... and many of these problems are "revenue-affecting" because they negatively affect search ranking and trust factors.