A Close to perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
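For anyone who finds the long run of [OR] lines hard to maintain, the user-agent part of the list above can probably be folded into a few alternation patterns. This is only a sketch of that idea, not toolman's own file: the grouping is arbitrary, and the escaped dots in the last condition are a small tightening that the original pattern does not have.

```apache
RewriteEngine on

# Plain name prefixes from the list above, folded into alternations
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|EmailCollector|ExtractorPro|CherryPicker|Crescent) [OR]
RewriteCond %{HTTP_USER_AGENT} ^(WebEMailExtrac|NICErsPRO|Teleport|Wget|LinkWalker|ia_archiver|DIIbot|psbot) [OR]

# Entries that carry their own sub-patterns
RewriteCond %{HTTP_USER_AGENT} ^(Mozilla.*NEWT|[Ww]eb[Bb]andit|Zeus.*Webster|Microsoft.URL) [OR]

# Dots escaped here so they match only a literal "." (assumption: that was the intent)
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com

# Last condition has no [OR]; any match above returns 403 Forbidden
RewriteRule .* - [F]
```

The behaviour should be the same as the one-name-per-line version; which form is easier to keep up to date is a matter of taste.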

Edge

2:09 pm on Sep 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When Frontpage first accesses a web site, the file _vti_inf.hmtl is requested. I set up a trap script via SSI in that HTML file (_vti_inf.hmtl); search for trap.pl on WebmasterWorld.

The trap.pl script blocks their IP address from further access to your website. This is very safe since "_vti_inf.hmtl" is only requested by Frontpage.

Works great!

mundonet

8:36 pm on Sep 21, 2002 (gmt 0)

10+ Year Member



Edge: what if we are using FP to upload (Publish)? Doesn't FP request the file to determine what needs updating? Will we ban ourselves?

stapel

11:40 pm on Sep 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Edge" said:
When Frontpage first accesses a web site, the file _vti_inf.hmtl is requested.

I'm still waiting to hear from my host (this being the weekend) about whether "mod_rewrite" is available to me, but, in the meantime, I know that "Redirect" works. So could I do a Redirect, something like:

Redirect /_vti_inf.hmtl [purplemath.com...]

...to get rid of the FrontPage bums?

-----ten minutes later-----

I just tried the above line in my .htaccess file, and FrontPage was still able to download whatever it wanted from Purplemath into one of my other "webs". *sigh*

So tell me more about this script thingy...?

Edge

12:04 am on Sep 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops, did I say "_vti_inf.hmtl"? I really meant "_vti_inf.html".

Sorry about that.

stapel

12:49 am on Sep 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Duh! I didn't even notice the misspelling when I did the cut-n-paste.

But I just tried again, using the proper spelling, and it still didn't work.

Oh, well. About that script you mentioned...?

carfac

4:28 pm on Sep 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For those of you who have multiple domains and want this in httpd.conf instead of .htaccess (and have root access!), I have a solution!

First, install Apache::BlockAgents for each VH (virtual host), and have them all point to the same bad_agent.txt, so you only have one file to update for all hosts. (Note that all copies of this I have found on the web have Perl errors in them; you will have to tweak that code to make it work at all.)

Then, make a copy of BlockAgents, modify the code a bit to handle IPs instead of agents, rename it BlockIP (or something!) and make a master bad_ip.txt file.

Third, get that trap.pl script, and modify that to write to bad_ip.txt rather than .htaccess. I further modified trap so that it date/time stamps each entry, so I can clean it out every week.

This method is REALLY fast, and painless once set up (although set-up is a B***H!). It will work across all your VHs, and if someone gets to one VH, they get locked out of all of them!

dave

bull

12:32 pm on Sep 25, 2002 (gmt 0)

10+ Year Member



RewriteCond %{HTTP_USER_AGENT} httrack [OR]

won't always work. I had this one today; it grabbed some hundred pages from my beloved site:

p5084d1b1.dip.t-dialin.net - - [25/Sep/2002:13:34:40 +0200] "GET /_omitted.htm HTTP/1.0" 200 2373 www.mydomain.net "_omitted.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" "-"

So, this might be better as far as I can see:
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
Besides, HTTrack seems to respect robots.txt.

1host

3:06 am on Sep 26, 2002 (gmt 0)

10+ Year Member




I guess none of the code discussed here will work without mod_rewrite. What alternative is there if my server doesn't have mod_rewrite installed?

I'd really like to block these fiends as well :)

thx
tom
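One commonly used alternative when mod_rewrite is missing is mod_setenvif combined with the access-control directives. The following is a minimal sketch, not something posted in this thread. It assumes Apache 1.3/2.0/2.2-style Order/Allow/Deny, that mod_setenvif is loaded, and that the host permits these directives in .htaccess (AllowOverride FileInfo Limit). The variable name bad_bot is made up for illustration.

```apache
# Flag unwanted user agents; "bad_bot" is an arbitrary variable name.
# SetEnvIfNoCase matches case-insensitively, so "HTTrack" is caught too.
SetEnvIfNoCase User-Agent "^EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "^Wget" bad_bot
SetEnvIfNoCase User-Agent "httrack" bad_bot

# Refuse any request that carries the flag
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

A flagged request should get a 403 just as with the [F] rule. On Apache 2.4 the Order/Allow/Deny lines would need the newer Require syntax instead.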

andreasfriedrich

1:25 pm on Sep 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So, this might be better as far as I can see:
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]

You are right in mentioning that the matching should be case insensitive (the NC flag). The '.*', however, is not necessary, since in

RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
the pattern is not anchored anywhere (start or end of string). The engine will try to match the pattern anywhere in the string.

With '^httrack' the pattern is anchored at the beginning, with 'httrack$' at the end of the string. If you anchored your pattern at both the start and the end, you would need the '.*' to match httrack in a string that is not just 'httrack'. Your pattern would need to look like this: '^.*httrack.*$'. Note that this pattern does not make sense unless you want to capture the substrings before and after the httrack.

To sum up, here is a chart of the four options mentioned above. NA = not anchored; BA = anchored at beginning; EA = anchored at end; MS = modified suggestion.

1. achttrackac (NA: +; BA: -; EA: -; MS: +;)
2. htTracKacac (NA: +; BA: +; EA: -; MS: +;)
3. aacaHttrack (NA: +; BA: -; EA: +; MS: +;)

Note that '.*' will also match the empty string '', since the quantifier * greedily (rather more than less) matches 0 or more times.
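The chart above can be restated as the four RewriteCond variants it describes. This is only a sketch: it assumes the [NC] flag on every variant, as the mixed-case sample strings imply, and leaves a single variant active.

```apache
RewriteEngine on

# NA: not anchored; matches achttrackac, htTracKacac and aacaHttrack
RewriteCond %{HTTP_USER_AGENT} httrack [NC]
RewriteRule .* - [F]

# BA: anchored at the beginning; of the three samples only htTracKacac matches
# RewriteCond %{HTTP_USER_AGENT} ^httrack [NC]

# EA: anchored at the end; of the three samples only aacaHttrack matches
# RewriteCond %{HTTP_USER_AGENT} httrack$ [NC]

# MS: the modified suggestion; matches the same strings as NA,
# the leading and trailing .* add nothing
# RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC]
```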

Andreas

58sniper

1:44 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



Okay, this brings up a somewhat related question...

I'm trying to ban one site from getting to me. I want to redirect to a page called /robots.php

So I tried this:

rewriteEngine On
rewriteCond {HTTP_REFERER} ^http://(www\.)?domain.com [NC,OR]
RewriteRule ^.*$ /robots.php [L]

but that seems to block everyone. What am I doing wrong?
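Two things in the snippet above look like the likely cause. First, {HTTP_REFERER} is written without the leading %, so it is compared as a literal string rather than as the Referer header. Second, the only RewriteCond carries [OR] with no condition after it. As far as I understand mod_rewrite's evaluation, a failed condition flagged [OR] just defers to the next one, and with none left the rule fires for every request. A hedged sketch of a corrected version follows; domain.com is the placeholder from the post, and the extra condition excluding /robots.php is my own addition to keep the target page from being rewritten onto itself.

```apache
RewriteEngine On

# %{...} is required for the variable to be expanded;
# no [OR] because this is not followed by an alternative condition
RewriteCond %{HTTP_REFERER} ^http://(www\.)?domain\.com [NC]

# Assumption: leave the target page itself alone
RewriteCond %{REQUEST_URI} !^/robots\.php$

RewriteRule .* /robots.php [L]
```

With this, only requests whose Referer begins with that site's address should be sent to /robots.php; everyone else passes through untouched.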
