I'm trying to ban sites by domain name, since there have recently been lots of referrer spammers.
I have, for example, the rule:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com/.*$ [NC]
RewriteRule ^.*$ - [F,L]
which should ban any site whose domain contains the word "stuff":
www.stuff.com
www.whatkindofstuff.com
www.some-other-stuff.com
and so on.
However, it is not working, so I am sure I did not set up a proper pattern-match rule. Anyone care to advise?
[edited by: jatar_k at 5:06 am (utc) on May 20, 2003]
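One way to narrow this down is to test the pattern on its own, outside Apache. Below is a minimal sketch in Python, assuming its `re` module treats this simple pattern the same way the server does, with `re.IGNORECASE` standing in for `[NC]`. The sample referers and the helper name `is_blocked_referer` are made up for illustration. If the pattern matches the strings you expect here, the problem may lie elsewhere in the .htaccess file rather than in the regex.

```python
import re

# Pattern copied from the RewriteCond in the post; [NC] is modelled with re.IGNORECASE.
REFERER_PATTERN = re.compile(r"^http://(www\.)?.*stuff.*\.com/.*$", re.IGNORECASE)


def is_blocked_referer(referer: str) -> bool:
    """Return True if the referer string matches the ban pattern."""
    return REFERER_PATTERN.search(referer) is not None


# Hypothetical referer values, for illustration only.
samples = [
    "http://www.stuff.com/",
    "http://www.whatkindofstuff.com/page.html",
    "http://some-other-stuff.com/",
    "http://www.stuff.com",    # no trailing slash after .com
    "https://www.stuff.com/",  # https scheme instead of http
    "http://www.example.org/",
]

for referer in samples:
    print(referer, is_blocked_referer(referer))
```

Note that the pattern as written requires the `http://` scheme and a slash right after `.com`, so a referer without that trailing slash would not be caught.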
It's fairly common to see a blank referer, but blank user-agents are rare. Nevertheless, I have elected not to "ban" truly-blank user-agent+referer, partly because I use key_master's bad_bot.pl script to catch them later if they are up to no good.
However, the one case where I've never seen an innocent visitor is when the user agent is a hyphen and the referer is a hyphen. This is an intentional ploy to get past blocks/bans on blank ua+referer. For these guys, I ban them by calling the script, which records their IP address and blocks all subsequent requests.
Note that in most server logs, blank referer and user-agent are displayed as "-" "-" and so these tricky user-agents using hyphens look identical in the logs to a blank referer/ua, because they are also displayed as "-" "-".
RewriteCond %{HTTP_REFERER} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule .* /cgi-local/bad_bot.pl [L]
Jim
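To make the distinction concrete: `^-$` matches a header whose value is literally one hyphen, and does not match an empty value, even though both show up as "-" in most logs. Here is a small sketch in Python under that reading. The function name `is_hyphen_spoof` is hypothetical, and the `or` mirrors the `[OR]` flag joining the two conditions.

```python
import re

# Same pattern as in the two RewriteCond lines: exactly one hyphen, nothing else.
HYPHEN_ONLY = re.compile(r"^-$")


def is_hyphen_spoof(referer: str, user_agent: str) -> bool:
    """True if either header is a literal '-' (mirrors the [OR] between conditions)."""
    return bool(HYPHEN_ONLY.search(referer) or HYPHEN_ONLY.search(user_agent))


# A truly blank pair is left alone; a literal-hyphen pair is caught.
print(is_hyphen_spoof("", ""))
print(is_hyphen_spoof("-", "-"))
```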
- ah, that explains why. I've always thought it was odd. Meaning: if they absolutely wanted something apart from blank, then why use the hyphen when there's a whole character set to choose from?
So, they're actually betting on people banning blank strings and forgetting to ban hyphens. Good to know :)
However, some day they might start thinking that a character other than a hyphen is also worth a try. That was the reason for my "BTW" comment in post #50.
/claus
I don't encounter any problems (that I am aware of). I checked with wannabrowser whether the bad bots are kept out (yes),
however I don't know how to check if the IP banning works.
Also, there is already a RewriteEngine On, so I have it twice. Is it supposed to be like this?
Here is my .htaccess file, in case anyone could check that it all looks OK, as I already had other stuff in it.
DirectoryIndex index.php
php_flag magic_quotes_gpc on
RewriteEngine On
RewriteRule ^news_archive-([0-9][0-9][0-9][0-9][0-9][0-9]*).* index.php?m=$1
# this will make register globals off in b2's directory
# just put a '#' sign before these three lines if you don't want that#
#php_flag register_globals off
## this will set the error_reporting level to remove 'Notices'
#
# php_value error_reporting 247
## this is used to make b2 produce links like [example.com...]
# if you renamed the file 'archives' to another name, please change it here too#
#ForceType application/x-httpd-php
#RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
<long list of more like those>
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F]
thanks
These two comment lines should probably go just after the line: RewriteRule ^news_archive
>> reading like 2 hours straight
And it's getting longer still ;)
>> there is already an RewriteEngine On
You only need one, delete number two and perhaps collect the Rewrite-statements in one block for easy maintenance. As it is now, there's some php-stuff in-between although it's commented out.
>> how to check if the IP banning works
You'll have to be able to spoof the IP address, but they seem quite all right to me. They ban:
12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])
- from 12.148.209.192 to 12.148.209.255
12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])
- from 12.148.196.128 to 12.148.196.255
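Short of spoofing an address, you can at least confirm which addresses the two patterns cover by running them over every possible last octet. This is a rough sketch in Python, assuming the "¦" characters shown in the post stand for the ordinary alternation pipe "|". The helper name `matched_last_octets` is made up for this example.

```python
import re

# The two REMOTE_ADDR patterns, written with the ordinary "|" pipe.
RANGE_209 = re.compile(r"^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$")
RANGE_196 = re.compile(r"^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$")


def matched_last_octets(pattern, prefix):
    """Return every last octet 0..255 for which prefix + '.' + octet matches."""
    return [n for n in range(256) if pattern.search(f"{prefix}.{n}")]


octets_209 = matched_last_octets(RANGE_209, "12.148.209")
octets_196 = matched_last_octets(RANGE_196, "12.148.196")

# Print the first and last matching octet and how many matched in total.
print(octets_209[0], octets_209[-1], len(octets_209))
print(octets_196[0], octets_196[-1], len(octets_196))
```

This only checks the regular expressions themselves. Whether the server actually returns a 403 for those addresses still has to be confirmed from a real request or from the logs.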
Extra:
^news_archive-([0-9][0-9][0-9][0-9][0-9][0-9]*).*
What you are saying here is "news_archive-" followed by any number of digits, as long as there are at least five, followed by any character any number of times, including zero. I suspect that this is not what you want; rather, I think that you would like to catch a filename like this:
news_archive-200209.php
That is: exactly six digits, then a dot and then a php... or htm or asp, etc. Try this instead, and replace "php" with the relevant ending if needed:
^news_archive-(\d{6})\.php$
The six digits are still captured and handed over to $1 by means of the parentheses.
/claus
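The difference between the two patterns is easy to see by running both against a few filenames. Here is a minimal sketch in Python. The filenames other than news_archive-200209.php are invented for the comparison, and `captured` is a hypothetical helper that returns whatever would end up in $1.

```python
import re

# Original pattern from the .htaccess file: five digits, then optional more, then anything.
ORIGINAL = re.compile(r"^news_archive-([0-9][0-9][0-9][0-9][0-9][0-9]*).*")
# Suggested replacement: exactly six digits followed by ".php" and nothing else.
STRICT = re.compile(r"^news_archive-(\d{6})\.php$")


def captured(pattern, path):
    """Return the text captured by the first group (the $1 value), or None if no match."""
    match = pattern.search(path)
    return match.group(1) if match else None


# Made-up filenames, except the first, which is the example from the post.
for name in [
    "news_archive-200209.php",
    "news_archive-20020.php",
    "news_archive-2002091.html",
    "news_archive-2002.php",
]:
    print(name, captured(ORIGINAL, name), captured(STRICT, name))
```

If the server's regex engine does not understand `\d`, `[0-9]{6}` is an equivalent way to write the six digits.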
I copied and modified a big list of bad bots that appeared months ago on this thread.
One of the lines was:
RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage [OR]
I had to change that to:
RewriteCond %{HTTP_USER_AGENT} MS.?FrontPage [NC,OR]
The previous version was letting "MSFrontpage" through. (It was trying to POST. The request 404'd, fortunately, because I don't use Frontpage.)
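For anyone wanting to see why the first version let it through, here is a small Python sketch of the two conditions, with `re.IGNORECASE` standing in for `[NC]`. The full user-agent strings are made up for illustration (only the "MSFrontpage" part comes from my logs), and `blocked` is just a helper name for this example.

```python
import re

# Old condition: literal "MS FrontPage" with an escaped space, case-sensitive.
OLD = re.compile(r"MS\ FrontPage")
# New condition: at most one character of any kind between MS and FrontPage, any case.
NEW = re.compile(r"MS.?FrontPage", re.IGNORECASE)


def blocked(pattern, user_agent):
    """True if the pattern matches anywhere in the user-agent string (unanchored)."""
    return pattern.search(user_agent) is not None


# Hypothetical user-agent strings.
for ua in ["MS FrontPage 4.0", "MSFrontpage/4.0", "Mozilla/4.0 (compatible; MSIE 6.0)"]:
    print(ua, blocked(OLD, ua), blocked(NEW, ua))
```

The old pattern misses "MSFrontpage" for two reasons: there is no space to match, and the lowercase "p" fails the case-sensitive comparison.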
Because the email addresses being used for these virus mass mailings are coming from a web spider... I'm wondering if anyone here knows how that spider identifies itself. Does it look exactly like a legitimate IE browser, or is it catchable?