Forum Moderators: phranque


A Close to perfect .htaccess ban list - Part 2

         

adriaant

11:46 pm on May 14, 2003 (gmt 0)

10+ Year Member



<modnote>continued from [webmasterworld.com...]</modnote>



UGH, bad typo in my original post. Here's the better version (I wasn't able to re-edit the older post?):

I'm trying to ban sites by domain name, since lately there has been a lot of referrer spam.

I have, for example, the rule:

RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com/.*$ [NC]
RewriteRule ^.*$ - [F,L]

which should ban any sites containing the word "stuff"
www.stuff.com
www.whatkindofstuff.com
www.some-other-stuff.com

and so on.

However, it is not working, so I am sure I did not setup a proper pattern match rule. Anyone care to advise?
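(For what it's worth, here is a simpler sketch: RewriteCond patterns are not anchored by default, so a bare substring match should catch every referer containing "stuff", and the [NC] flag makes it case-insensitive.)

RewriteCond %{HTTP_REFERER} stuff [NC]
RewriteRule .* - [F]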

[edited by: jatar_k at 5:06 am (utc) on May 20, 2003]

jdMorgan

7:46 pm on Aug 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Comments on "blank" referer and user-agent:

It's fairly common to see a blank referer, but blank user-agents are rare. Nevertheless, I have elected not to "ban" truly-blank user-agent+referer, partly because I use key_master's bad_bot.pl script to catch them later if they are up to no good.

However, the one case where I've never seen an innocent visitor is when the user-agent is a hyphen and the referer is a hyphen. This is an intentional ploy to get past blocks/bans on blank ua+referer. For these guys, I ban them by calling the script, which records their IP address and blocks all subsequent requests.

Note that in most server logs, a blank referer and user-agent are displayed as "-" "-", so these tricky hyphen user-agents look identical in the logs to a truly blank referer/ua.


RewriteCond %{HTTP_REFERER} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule .* /cgi-local/bad_bot.pl [L]

Jim

Wizcrafts

8:12 pm on Aug 2, 2003 (gmt 0)

10+ Year Member



JD, what about ANDing those two rules instead of ORing them? Wouldn't that make certain that only a bot with a blank or dash Referrer AND UserAgent gets poisoned/banned?

How would you rewrite the code if you want to AND them?

jdMorgan

8:38 pm on Aug 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wizcrafts,

That code blocks anyone who tries to use a hyphen for either request field in order to fake me out. As such, I intentionally [OR]ed them. To make it an AND condition, just omit the [OR] at the end of the first RewriteCond.
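Spelled out, the ANDed version would look like this (RewriteConds with no flag are ANDed by default):

RewriteCond %{HTTP_REFERER} ^-$
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule .* /cgi-local/bad_bot.pl [L]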

Jim

Wizcrafts

8:41 pm on Aug 2, 2003 (gmt 0)

10+ Year Member



Thanks Jim, that's what I thought, but I needed to know for sure. I'm used to JavaScript statements, where AND is && and OR is ||. Now I know the mod_rewrite method.

claus

5:37 pm on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> tricky user-agents using hyphens look identical in the logs

- ah, that explains why. I've always thought it was odd. Meaning: if they absolutely wanted something other than blank, then why use the hyphen when there's a whole character set to choose from?

So, they're actually betting on people banning blank strings and forgetting to ban hyphens. Good to know :)

However, some day they might start thinking that a character other than a hyphen may also be worth a try; that was the reason for my "BTW" comment in post #50.

/claus

jdMorgan

8:11 pm on Aug 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They are counting on the common use of "-" in log files to represent a blank ua. No other character would look like a logged blank referer, so we need not be concerned about other characters.

This ploy was first reported by WebmasterWorld member guabito some time last year, IIRC.

Jim

viggen

6:17 pm on Aug 11, 2003 (gmt 0)

10+ Year Member



After reading this thread for like 2 hours straight, I implemented an .htaccess file.

I don't encounter any problems (that I am aware of). I checked with Wannabrowser that the bad bots are kept out (yes); however, I don't know how to check whether the IP banning works. Also, there is already a RewriteEngine On in the file, so I have it twice. Is it supposed to be like that?

Here is my .htaccess file. Could anyone check that it all looks OK? I already had other stuff in it.


DirectoryIndex index.php

php_flag magic_quotes_gpc on

RewriteEngine On

RewriteRule ^news_archive-([0-9][0-9][0-9][0-9][0-9][0-9]*).* index.php?m=$1

# this will make register globals off in b2's directory
# just put a '#' sign before these three lines if you don't want that

#
#php_flag register_globals off
#

# this will set the error_reporting level to remove 'Notices'
#
# php_value error_reporting 247
#

# this is used to make b2 produce links like [example.com...]
# if you renamed the file 'archives' to another name, please change it here too

#
#ForceType application/x-httpd-php
#

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
<long list of more like those>
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F]

thanks

claus

9:18 pm on Aug 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> # this is used to make b2 produce links like ...

these two comment lines should probably go just after the RewriteRule ^news_archive line

>> reading like 2 hours straight

And it's getting longer still ;)

>> there is already an RewriteEngine On

You only need one; delete the second, and perhaps collect the Rewrite statements in one block for easy maintenance. As it is now, there's some PHP stuff in between, although it's commented out.

>> how to check if the IP banning works

You'll have to be able to spoof the IP address to test them, but they seem quite all right to me. They ban:

12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])

- from 12.148.209.192 to 12.148.209.255

12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])

- from 12.148.196.128 to 12.148.196.255

Extra:

^news_archive-([0-9][0-9][0-9][0-9][0-9][0-9]*).*

What you are saying here is "news_archive-" followed by any number of digits, as long as there are at least five, followed by any character any number of times (including zero). I suspect that this is not what you want; rather, I think you would like to catch a filename like this:

news_archive-200209.php

That is: exactly six digits, then a dot, and then "php"... or htm, asp, etc. Try this instead, and replace "php" with the relevant extension if needed:

^news_archive-(\d{6})\.php$

The six digits are still getting caught and turned over to $1 by means of the parenthesis.
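Put together with the original rewrite target, the full rule would be something like this (a sketch; the [L] flag is an assumption, added so no later rules touch the rewritten URL):

RewriteRule ^news_archive-(\d{6})\.php$ index.php?m=$1 [L]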

/claus

berli

8:54 pm on Aug 16, 2003 (gmt 0)

10+ Year Member



Just wanted to share a problem I ran into:

I copied and modified a big list of bad bots that appeared months ago on this thread.

One of the lines was:

RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage [OR]

I had to change that to:

RewriteCond %{HTTP_USER_AGENT} MS.?FrontPage [NC,OR]

The previous version was letting "MSFrontpage" through. (It was trying to POST. Fortunately the request 404'd, because I don't use FrontPage.)

IanKelley

4:42 am on Aug 31, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm sure everyone here has been enjoying all of the virus spam lately.

Because the email addresses being used for these virus mass mailings are coming from a web spider... I'm wondering if anyone here knows how that spider identifies itself. Does it look exactly like a legitimate IE browser, or is it catchable?
