Forum Moderators: phranque
I'm trying to ban sites by domain name, since there have recently been lots of referrer spammers.
I have, for example, the rule:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com/.*$ [NC]
RewriteRule ^.*$ - [F,L]
which should ban any referring site whose domain contains the word "stuff":
www.stuff.com
www.whatkindofstuff.com
www.some-other-stuff.com
and so on.
However, it is not working, so I am sure I did not set up a proper pattern-match rule. Anyone care to advise?
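For comparison, here is a minimal sketch of a simpler version of this block (illustrative only, not the thread's confirmed fix):
RewriteEngine On
# Forbid any request whose Referer contains "stuff" (case-insensitive);
# matches www.stuff.com, www.whatkindofstuff.com, www.some-other-stuff.com
RewriteCond %{HTTP_REFERER} stuff [NC]
RewriteRule .* - [F]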
I'm not sure I'm actually getting my brain fully wrapped around the banning of IP ranges. Actually, based on some recent 500 errors, I'm downright positive I don't have a good grip on it when I try it with some other ranges.
In the above example, do the lines block these ranges:
209.73.160 through 209.73.191
209.131.32 through 209.131.63
209.237.232 through 209.237.235
If you wanted to block addresses in those ranges, you would remove the exclamation point. Here's a reference on regular expressions:
[etext.lib.virginia.edu]
FYI, in the code you quoted above, you'd have to put a space before each exclamation point, and replace the vertical pipes with the ones on your keyboard because this forum's software modifies the formatting of the code.
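For the three ranges asked about above, here is a sketch of blocking RewriteCond lines (written out for illustration; double-check the patterns against your own logs before relying on them):
# Block 209.73.160-191.*, 209.131.32-63.*, and 209.237.232-235.*
RewriteCond %{REMOTE_ADDR} ^209\.73\.(1[6-8][0-9]|19[01])\. [OR]
RewriteCond %{REMOTE_ADDR} ^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\. [OR]
RewriteCond %{REMOTE_ADDR} ^209\.237\.23[2-5]\.
RewriteRule .* - [F]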
At some point I wonder if the .htaccess file is going to get bogged down with all the banned UAs, IPs, etc. Has anyone measured the impact on server performance?
Secondly, I wonder if using a bot trap isn't more efficient. Yes, there are some risks, like banning the wrong folks, but that risk can be minimized by hiding the trigger. I would think that a simple "deny from" entry per IP address would be much more effective than a long list of UAs?
Then recycle the .htaccess file with a cron job every couple of days to keep it lean and mean?
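As a sketch of that idea (addresses are placeholders; the trap script appends the "deny from" lines, and a cron job restores the file from a clean baseline copy):
order allow,deny
allow from all
# entries below appended by the trap script, cleared periodically by cron
deny from 192.0.2.15
deny from 198.51.100.7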
> Does anybody know how to edit the trap script to only add an IP address once, forever, no matter how many times they land on the ban script?
This should not be necessary. If you get a request from an IP address and it triggers the script, the script adds that IP address to your .htaccess file ban list. If that IP address comes back, it should then get either the standard 403-Forbidden server page, or your custom 403 page if you've defined one. Therefore, it should not be able to re-invoke the script.
The only exception to this is the case where a bad guy has multiple "sessions" open, hitting your server a second or third time before the bot script has had a chance to finish running (banning the IP address) for the first request. In this case, you may see a few duplicate entries in your .htaccess ban list.
It is possible to filter these out, but the script would have to be modified to search through previous ban list entries and compare the IP addresses to the current one. This slows down the script, and therefore you would get even more duplicate entries because the time window for duplicates would be increased by the longer filtering script processing time. So, the filtering technique very soon becomes self-defeating, and I recommend against it.
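So the worst case looks something like this in the ban list (placeholder address), which is harmless apart from the wasted line:
deny from 192.0.2.15
# duplicate written by a near-simultaneous second request:
deny from 192.0.2.15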
On the other hand, you may have a problem with your set-up, so let me qualify the above discussion a little more fully: If an IP address is still listed in your .htaccess ban list, and it hits the trap and is written again to your ban list more than two or three seconds after the initial entry, then there is something wrong with your installation. Once an IP is banned (which should take two seconds at most), it should never appear again in the ban list; instead, you should see that it always gets 403-Forbidden responses after it has been added to the ban list in your .htaccess file. If this is not the case, then something isn't set up correctly.
Hopefully, this explanation is clear enough to be helpful...
Jim
Everybody's .htaccess is likely to be different, and to need to be different. Our sites are all different, and therefore they attract different attackers.
For info on the script that's being mentioned here, start at this recent thread and also read the previous ones: [webmasterworld.com...]
Constantin,
Sometimes it's more efficient to ban by user-agent if that user-agent is being widely used at the time. When a spoofed user-agent is used, then you must ban by IP address, but that ban list can quickly grow too large. So "pruning" the list, either manually or with a cron job, is a good idea. However, some sites may need to prune daily, and others can go for weeks or even months without cleaning up the list, depending on how much bad-bot traffic they are getting.
Personally, I've noticed that attacks slowed down a LOT after I'd had the script in place for several months. The harvesters like to sneak in and grab stuff without being noticed. A 403 response tells them that they've been noticed, and that it's quite possible their IP addresses and proxy addresses are being reported. So, the smart ones don't come back...
Everybody has to figure out the most efficient way to use user-agent blocks, IP address blocks, and bad-bot blocking scripts on their own sites. I doubt there's a good one-size-fits-all recommendation for all sites.
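As a rough sketch of the trade-off (bot name and address are placeholders):
# Ban by user-agent while the bad bot still announces itself:
RewriteCond %{HTTP_USER_AGENT} ^SomeBadBot [NC]
RewriteRule .* - [F]
# Ban by IP address once the user-agent is spoofed:
order allow,deny
allow from all
deny from 192.0.2.44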
Jim
You are correct (as usual)! I was erroneously allowing rule-banned spiders to access the self-banning script, among other files that poison, 403, record, entrap them, etc. I have since taken the trap script out of that allowed list and placed it under its own rewrite conditions that only deal with FormMail Phishers.
-------------------------------------------------------------
Nick asked to see a complete .htaccess that exemplifies what is being discussed here. I will post mine, with a few items renamed to protect the innocent (my domain); items that are specific to my installation are flagged with comments. Also read the note below the code.
<Files *>
order deny,allow
deny from env=ban
allow from all
</Files>
<Files .htaccess>
deny from all
</Files>
XBitHack On
# my filename and path:
ErrorDocument 403 /includes/403.html
# my file:
DirectoryIndex index.html index.htm index.shtml index.php /noread.html
<Files *>
<LimitExcept GET POST>
deny from all
</LimitExcept>
</Files>
order deny,allow
deny from 12.219.232.74
deny from 24.53.200.12
deny from 24.188.211.3
deny from 61.4.64.0/20
deny from 62.253.166.153
deny from 65.33.10.192
deny from 65.57.163.78
deny from 66.36.240.135
deny from 66.36.246.127
deny from 66.72.195.144
deny from 66.76.144.219
deny from 66.119.34.39
deny from 66.250.125.195
deny from 68.42.21.162
deny from 142.177.144.148
deny from 170.224.224.38
deny from 200.176.32.214
deny from 203.194.146.175
deny from 204.234.17.35
deny from 206.135.194.194
deny from 207.134.171.4
deny from 210.192.120.74
deny from 210.192.96.0/17
deny from 212.138.47.18
deny from 213.221.116.114
deny from 216.93.191.2
deny from 217.21.117.121
deny from 217.78.
deny from 220.73.25.68
deny from 220.73.165.
deny from 220.99.112.2
allow from all
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} (.?mail.?form|form|(GM)?form.?.?mail|.?mail)(2|to)?\.?(asp|cgi|exe|php|pl|pm)?$ [NC]
# names have been changed:
RewriteRule .* MyDomain/cgi-bin/MyTrapFileName [L]
# The following are universal rules anybody can use in .htaccess:
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|Crescent|Disco.?|ExtractorPro|HTML.?Works|Franklin.?Locator|HLoader|http.?generic|Industry.?Program|IUPUI.?Research.?Bot|Mac.?Finder|NetZIP|NICErsPRO|NPBot|PlantyNet_WebRobot|Production.?Bot|Program.?Shareware|Teleport.?Pro|TurnitinBot|TE|VoidEYE|WebBandit|WebCopier|WEP.?Search|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cherry.?picker|e?mail.?(collector|extractor|magnet|reaper|siphon|sweeper|harvest|collect|wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Educate.?Search|Full.?Web.?Bot|Indy.?Library|IUFW.?Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} httrack|larbin|NaverRobot|Siphon|SURF [NC,OR]
RewriteCond %{HTTP_USER_AGENT} efp@gmx\.net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.?URL.?Control [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Miss.*g.*.?Locat.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.06\ \(Win95;\ I\) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible\ ;\ MSIE.? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.00;\ Windows\ 98$ [NC,OR]
# The next lines block NPBot by IP
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.175\.0\.(3[2-9]|4[0-7])$ [OR]
RewriteCond %{REMOTE_ADDR} ^(203\.186\.145\.225|218\.6\.10\.113|68\.59\.94\.40|66\.75\.128\.202)$ [OR]
RewriteCond %{REMOTE_ADDR} ^210\.192\.(9[6-9]|1[0-1][0-9]|12[0-7])\. [OR]
RewriteCond %{REMOTE_ADDR} ^211\.(1[0-1][4-9])\. [OR]
RewriteCond %{REMOTE_ADDR} ^218\.([0-2][0-9]|[3][0-1])\. [OR]
RewriteCond %{REMOTE_ADDR} ^218\.(5[6-9]|[6-9][0-9])\. [OR]
# Start Cyveillance blocks
RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^65\.118\.41\.(19[2-9]|2[0-1][0-9]|22[0-3])$ [OR]
# End Cyveillance blocks
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC]
RewriteRule !^(includes/403\.html|cgi-bin/various_filenames\.pl|various_filenames\.html) - [F]
# alternate RewriteRule without allowing access to custom 403 or trap pages, or cgi scripts:
# RewriteRule .* - [F]
Please note that each RewriteCond and RewriteRule above was typed on a single line, ending with its [OR] or [NC,OR] switches. If your display or editor word-wraps any of these lines, rejoin them onto one line before using these rules.
Wiz
Have you verified that the IP addresses in your fixed list are in fact blocked?
Another member on this board has reported problems when there are two "Order" sections - with the contents of the second one being ignored.
If you also see this problem, simply combine the two "deny from" lists under one "Order" directive. For example:
<Files .htaccess>
deny from all
</Files>
<Files *>
<LimitExcept GET POST>
deny from all
</LimitExcept>
</Files>
# Allow everybody to access custom 403 page and robots.txt
SetEnvIf Request_URI "^(/custom403\.html|/robots\.txt)$" allowit
# Deny based on environment variables set above and IP addresses.
<Files *>
Order Deny,Allow
Allow from env=allowit
Deny from env=ban
Deny from 12.219.232.74
Deny from 24.53.200.12
<snip>
Deny from 220.73.165.
Deny from 220.99.112.2
</Files>
Jim
Despite the line RewriteCond %{REMOTE_ADDR} ^210\.192\.(9[6-9]|1[0-1][0-9]|12[0-7])\. [OR], I noticed that a creepy crawler from 210.192.120.74 snuck in, so I created simple "deny from" rules for both that IP and the host's IP range (the "deny from 210.192.120.74" and "deny from 210.192.96.0/17" entries in the file above). I have seen 403s for IPs in the second and third block lists, but I like the idea of grouping common items in the same categories. I'll get to that asap.
Thanks for demonstrating the allow from ruleset below:
# Allow everybody to access custom 403 page and robots.txt
SetEnvIf Request_URI "^(/custom403\.html|/robots\.txt)$" allowit
Wiz
> a creepy crawler from 210.192.120.74 snuck in despite it
I don't see anything wrong with that RewriteCond or the regex in it, but look at the whole RewriteCond list and make sure you don't have a missing [OR], an unwelcome [OR] on the last RewriteCond, or some other kind of syntax or "structural" error.
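For example, a dangling [OR] on the final RewriteCond is the classic structural error to check for (bot names are placeholders):
RewriteCond %{HTTP_USER_AGENT} BadBotOne [NC,OR]
# correct: no [OR] on the last condition; a trailing [OR] here can
# cause the rule to fire for requests matching none of the conditions
RewriteCond %{HTTP_USER_AGENT} BadBotTwo [NC]
RewriteRule .* - [F]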
Jim