A Close to perfect .htaccess ban list - Part 2

         

adriaant

11:46 pm on May 14, 2003 (gmt 0)

10+ Year Member



<modnote>
continued from [webmasterworld.com...]



UGH, bad typo in my original post. Here's the better version (I wasn't able to re-edit the older post?):

I'm trying to ban sites by domain name, since there has been a lot of referrer spam recently.

I have, for example, the rule:

RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com/.*$ [NC]
RewriteRule ^.*$ - [F,L]

which should ban any site whose domain name contains the word "stuff", such as:
www.stuff.com
www.whatkindofstuff.com
www.some-other-stuff.com

and so on.

However, it is not working, so I am sure I did not set up a proper pattern-match rule. Can anyone advise?
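One way to narrow this down is to test the pattern outside Apache. The sketch below is Python, not mod_rewrite, and assumes Python's `re` module treats this particular pattern the same way Apache's regex engine does; the sample referrer URLs are made up for illustration.

```python
import re

# The pattern from the RewriteCond above; re.IGNORECASE stands in for [NC].
pattern = re.compile(r"^http://(www\.)?.*stuff.*\.com/.*$", re.IGNORECASE)

# Hypothetical referrer values, chosen only to exercise the pattern.
referrers = [
    "http://www.stuff.com/",
    "http://www.whatkindofstuff.com/page.html",
    "http://www.some-other-stuff.com/",
    "http://www.stuff.com",        # no trailing slash after .com
    "http://www.example.org/",     # unrelated domain
]

# Map each referrer to whether the pattern matches it.
results = {ref: bool(pattern.match(ref)) for ref in referrers}
for ref, hit in results.items():
    print(f"{hit!s:5}  {ref}")
```

In this sketch the first three referrers match and the last two do not, so the pattern accepts the intended domains whenever the referrer has a slash after `.com`. If the same holds on the server, that would suggest looking beyond the regex itself.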

[edited by: jatar_k at 5:06 am (utc) on May 20, 2003]

nancyb

7:36 pm on Sep 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have the following in my .htaccess:

# Block libwww-perl except from AltaVista, Inktomi, and IA Archiver
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC]
RewriteCond %{REMOTE_ADDR} !^209\.73\.(1[6-8][0-9]|19[01])\.
RewriteCond %{REMOTE_ADDR} !^209\.131\.(3[2-9]|[45][0-9]|6[0-3])\.
RewriteCond %{REMOTE_ADDR} !^209\.237\.23[2-5]\.
RewriteRule !^err403\.htm$ - [F]
# Block Java and Python URLlib except from Google
RewriteCond %{HTTP_USER_AGENT} ^(Python.urllib|Java/?[1-9]\.[0-9]) [NC]
RewriteCond %{REMOTE_ADDR} !^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.

Can anyone tell me why the first hit gets a 200 and the second a 404, and what I need to do to correct it so both are 404?

65.49.178.17 - - [17/Sep/2003:10:40:52 -0400] "GET /xxx.htm HTTP/1.1" 200 14724 "-" "xxxxxxxxx_xxxxxxxx/0.1 libwww-perl/5.65"
65.49.178.17 - - [17/Sep/2003:10:34:02 -0400] "GET /xxxxxx/- HTTP/1.1" 404 7550 "-" "xxxxxxxxx_xxxxxxxx/0.1 libwww-perl/5.65"

thanks

closed

5:00 am on Sep 18, 2003 (gmt 0)

10+ Year Member



It looks to me like the code you posted has nothing to do with the log file entries, because the code checks to see if the UA begins with libwww... or Python.... or Java..., and there is no UA like that in your log file snippet.

I'd guess that there are no restrictions imposed on 65.49.178.17. The 404 was due to the fact that the document being retrieved was /xxxxxx/-, which is a malformed address. The 200 was due to the fact that /xxx.htm existed, and there were no access restrictions on 65.49.178.17.

I'm guessing you want to correct both log file entries so that a 403 status code (Forbidden) is returned. In that case, I'd change this line:


RewriteCond %{HTTP_USER_AGENT} ^libwww-perl/[0-9] [NC]

to this:

RewriteCond %{HTTP_USER_AGENT} libwww-perl/[0-9] [NC]

I removed the ^ from the first RewriteCond so that it checks for libwww-perl/[0-9] anywhere in the UA, which is where it appears in your log entries.

If you really do want to send back a 404, you'd have to modify the RewriteRule to use the R flag with a status code of 404. Since you want to block unwanted users, though, my guess was that you actually wanted to send back a 403.
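The effect of the leading ^ can be checked against the user agent from the log lines. This is a Python sketch rather than mod_rewrite itself, and it assumes Python's `re` module handles these two simple patterns the same way Apache does; `re.IGNORECASE` stands in for [NC].

```python
import re

# User agent copied from the log lines quoted earlier in the thread.
user_agent = "xxxxxxxxx_xxxxxxxx/0.1 libwww-perl/5.65"

# Original condition: anchored to the start of the user agent.
anchored = re.search(r"^libwww-perl/[0-9]", user_agent, re.IGNORECASE)

# Suggested condition: the token may appear anywhere in the user agent.
unanchored = re.search(r"libwww-perl/[0-9]", user_agent, re.IGNORECASE)

print("anchored:", bool(anchored))
print("unanchored:", bool(unanchored))
```

Only the unanchored pattern finds the token, because the user agent starts with another product name before libwww-perl.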

nancyb

3:06 pm on Sep 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



thank you, closed!

I looked and looked at the code and it just didn't sink in and I completely missed the "-". Could be all the problems with my site/host over the last weeks and I'm brain dead from looking for problems - or - it could be I'm just as blind as a bat.

Either case, thank you so much for the clear explanation! :)

Wizcrafts

6:22 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



Speaking from within a mental block, I post:

In my .htaccess file I have applied all of the suggestions found throughout this thread and everything is working fine.

One of the thorns in my side has been FormMail Phishers, so I have taken the now-famous Trap.pl, customized it and renamed it "formmail.pl," allowed that file in my RewriteRule and it works fine to ban Phishers. Most Phishers come at me about 8 to 10 times in a row, with various spellings, extensions, and directory names, but have always been caught in my trap when they type in "formmail.pl"

However, while reading yesterday's web log I found a FormMail phisher that evaded my trap by only looking for variations of this exact spelling: cgi-bin/FormMail.pl (and .cgi). My ban-bad-bots trap is named formmail.pl and was not triggered because it is all lowercase, but he did get 403s from my RewriteCond for

form.?mail [nc,or]

I tried adding this line to my .htaccess but it does not redirect the request to formmail.pl:

RedirectMatch permanant cgi-bin/FormMail\.pl cgi-bin/formmail.pl
I also commented out the other conditions that would have caught this request and 403'd it. Every Wannabrowser attempt was met with a 403, but no redirects. Wannabrowser is not blocked in my .htaccess.

Can anybody help me straighten out the error so I can forward requests for "FormMail.pl" to "formmail.pl"? If I figure it out first I will post the working code-line later.

TIA, Wiz

closed

7:25 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



nancyb: You're welcome so much for the clear explanation! :P Luckily, you edited the lines from the log file correctly so that the file and directory names remained anonymous, keeping the malformed address in place.

Wizcrafts: You should replace permanant with permanent. You could also use 301 instead.

Wizcrafts

7:36 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



Closed;
I rectified the spelling error but I still get a 403 when I try to GET FormMail.pl. Any other ideas?

Here is the applicable RewriteCond and RewriteRule affecting FormMail:


RedirectMatch 301 cgi-bin/FormMail\.pl cgi-bin/formmail.pl

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_URI} formmail\.(cgi|php)$ [NC]

RewriteRule !^(includes/403\.html|cgi-bin/MKCounter\.cgi|robots\.txt|contact-info\.html|kissthis\.html|cgi-bin/contact-info\.cgi|cgi-bin/contact-list\.pl|cgi-bin/banbadbots\.cgi|cgi-bin/formmail\.pl|cgi-bin/FormMail\.pl|bait/honeypot\.html|bait/\w*\.html|bait/contact-info\.cgi) - [F]

I figured it out myself!

When a request came in for a file in my cgi-bin and I tried to redirect it to cgi-bin/formmail.pl, I was actually telling the client to look in cgi-bin/cgi-bin/ for formmail.pl. I got it to work by dropping the cgi-bin/ from the destination!
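The doubled directory is what relative URL resolution would produce, assuming the server hands the relative target back to the client unchanged and the client resolves it against the URL it requested. A minimal Python sketch of that resolution follows; the host name is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical host; only the path matters for this illustration.
requested = "http://www.example.com/cgi-bin/FormMail.pl"

# Relative target that repeats the directory name.
with_dir = urljoin(requested, "cgi-bin/formmail.pl")

# Relative target with the directory dropped.
without_dir = urljoin(requested, "formmail.pl")

# Target that starts with a slash, resolved from the site root.
rooted = urljoin(requested, "/cgi-bin/formmail.pl")

print(with_dir)
print(without_dir)
print(rooted)
```

The first result lands in cgi-bin/cgi-bin/, while the other two both point at cgi-bin/formmail.pl. A target that starts with a slash does not depend on which directory the request came from.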

Wiz

[edited by: Wizcrafts at 7:59 pm (utc) on Sep. 18, 2003]

closed

7:59 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



So it actually works?

From [httpd.apache.org]:

RedirectMatch
Syntax: RedirectMatch [status] regex URL

I was going to say that you should change your third input to something that starts with http://.

Added: Never mind. I got confused between URL and URI.

[edited by: closed at 8:08 pm (utc) on Sep. 18, 2003]

claus

8:03 pm on Sep 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm just wondering why you don't do this instead:

Options +FollowSymLinks 
RewriteEngine On
RewriteCond %{REQUEST_URI} (.?mail.?form|form|(GM)?form.?.?mail|.?mail)(2|to)?\.?(asp|cgi|exe|php|pl|pm)?$ [NC]
RewriteRule .* /path-to/bad-bot-script.pl [L]

If your bad-bot script bans them anyway, it seems odd to have an additional level of banning. Then it would seem more efficient to just dump them directly in the trap, thus getting them banned instantly.

My condition for formmail catches a few more than the one you posted, it's documented by balam here (msg #6): [webmasterworld.com...]
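To get a feel for what this condition catches, the pattern can be tried against a few request paths. This is a Python sketch, not mod_rewrite, and it assumes Python's `re` module handles the pattern the same way Apache does; the paths are hypothetical examples.

```python
import re

# The pattern from the RewriteCond above; re.IGNORECASE stands in for [NC].
pattern = re.compile(
    r"(.?mail.?form|form|(GM)?form.?.?mail|.?mail)(2|to)?\.?(asp|cgi|exe|php|pl|pm)?$",
    re.IGNORECASE,
)

# Hypothetical request paths, chosen only to exercise the pattern.
paths = [
    "/cgi-bin/FormMail.pl",
    "/cgi-bin/formmail.cgi",
    "/cgi-bin/mailto.cgi",
    "/contact-form.php",
    "/index.html",
    "/robots.txt",
]

# Map each path to whether the condition would fire for it.
caught = {path: bool(pattern.search(path)) for path in paths}
for path, hit in caught.items():
    print(f"{hit!s:5}  {path}")
```

In this sketch a path like /contact-form.php is caught too, so a legitimate page whose name ends in "form" or "mail" would presumably need renaming or an exemption before the rule goes live.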

/claus


BTW: this link is really valuable. It's Engelschall's guide to URL rewriting, and it's better than the official Apache docs imho: [engelschall.com...]

[edited by: claus at 8:05 pm (utc) on Sep. 18, 2003]

Wizcrafts

8:03 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



Here is how I got it to redirect correctly:

RedirectMatch cgi-bin/FormMail.pl formmail.pl

No 301 needed; it generates a 302 Found response while the trespasser views my trap text and gets banned.

This stuff can drive you nuts trying to get exact paths and syntax. I read the Apache docs and still had to figure it out by trial and error (heavy on the error side)

Thanks Claus, I'll try to implement that
Wiz

Wizcrafts

9:04 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



Well, I have implemented the ban-all-phishers rule that Balam and Claus posted and it works.

I have found that there are times when the same IP address gets added to my ban list multiple times and I anticipate that this is going to happen more, since the FormMail phishers hit you 6 to 10 times at once, looking for different file names and paths, all of which are now banned by the all-inclusive rule.

Does anybody know how to edit the trap script to only add an IP address once, forever, no matter how many times they land on the ban script?

Here is the banning script section in question:


# trap.pl: upload in ASCII mode and CHMOD 755.

# This is the only variable that needs to be modified. Replace it with the absolute path to your root directory.
$rootdir = "$ENV{DOCUMENT_ROOT}";

# Grab the IP of the bad bot
$visitor_ip = $ENV{'REMOTE_ADDR'};
$visitor_ip =~ s/\./\\\./gi;

# Open .htaccess file
open(HTACCESS,"".$rootdir."/\.htaccess") || die $!;
@htaccess = <HTACCESS>;
close(HTACCESS);

# Write banned IP to .htaccess file
open(HTACCESS,">".$rootdir."/\.htaccess") || die $!;
print HTACCESS "SetEnvIf Remote_Addr \^".$visitor_ip."\$ ban\n";
foreach $deny_ip (@htaccess) {
    print HTACCESS $deny_ip;
}
close(HTACCESS);

I think the multiple listings occur because I have allowed access to formmail.pl and my trap script in my master rewrite rule. Maybe I can now remove that allowance since the formmail rule is totally separate. I'll see.
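One way to avoid duplicate entries is to check whether the ban line already exists before writing. The following is a Python sketch of that check, not a drop-in change to trap.pl; the function names `ban_line` and `add_ban_once` are hypothetical, and it assumes the ban line has exactly the format the script above writes.

```python
import tempfile
from pathlib import Path


def ban_line(ip: str) -> str:
    """Build the same SetEnvIf line trap.pl writes, with the dots escaped."""
    return "SetEnvIf Remote_Addr ^" + ip.replace(".", "\\.") + "$ ban\n"


def add_ban_once(htaccess: Path, ip: str) -> bool:
    """Prepend a ban line for ip unless an identical line is already present.

    Returns True if the file was changed, False if the IP was already listed.
    """
    line = ban_line(ip)
    existing = htaccess.read_text().splitlines(keepends=True)
    if any(old.strip() == line.strip() for old in existing):
        return False
    htaccess.write_text(line + "".join(existing))
    return True


# Demo against a throwaway file; 192.0.2.7 is a documentation-range address.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / ".htaccess"
    demo.write_text("RewriteEngine On\n")
    first = add_ban_once(demo, "192.0.2.7")
    second = add_ban_once(demo, "192.0.2.7")
    final_text = demo.read_text()

print(first, second)
print(final_text, end="")
```

The same test could be expressed in Perl by scanning @htaccess for the new line before reopening the file for writing. Several hits arriving at the same moment could still race with each other unless the file is locked, which this sketch does not attempt.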
