A Close to perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
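# Refuse any request referred by iaea.org. A RewriteRule pattern is tested against the URL path, not the full URL, so match every path here.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule .* - [F]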

Macguru

2:12 am on Mar 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>why I would wish to lay these pests on some one else?

Because spammers spamming each other is funny.

Edge

6:01 pm on Mar 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And I thought I had a huge .htaccess file Superman. Looks like mine will be as long as yours...

I have been thinking for some time now that I should implement a script that limits the number of page views for a unique visitor. This script would help stop all those folks who change their user agent name to grab an offline copy of my website, as well as other unidentified email bots etc. The script would still allow all the good spiders and similar folks all the access they want. Currently, my average visitor views about 5.9 pages per visit. With this knowledge I could limit a visitor to, say, 20 page views a day before I redirect them. The script would redirect to a "Become a Member" page that would require registration. All the registered folks would be allowed unlimited access.

What am I missing?
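
For anyone wondering what such a limiter might look like, here is a minimal Python sketch of the idea described above. It is only an illustration under assumptions: the class name `PageViewLimiter`, the `/become-a-member` path and the in-memory counter are all made up for the example, and how a "unique visitor" is identified (IP address, cookie, login) is left to the caller. A real site would also need persistent storage and a trustworthy way to recognise good spiders.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 20                   # the figure suggested in the post
MEMBER_PAGE = "/become-a-member"   # hypothetical redirect target


class PageViewLimiter:
    """Counts page views per visitor per day, in memory only."""

    def __init__(self, daily_limit=DAILY_LIMIT):
        self.daily_limit = daily_limit
        self._counts = defaultdict(int)  # (visitor_id, day) -> views so far

    def check(self, visitor_id, unlimited=False, today=None):
        """Return None to serve the page, or a redirect path once over the limit.

        Pass unlimited=True for registered members and allowlisted spiders.
        """
        if unlimited:
            return None
        day = today or date.today()
        self._counts[(visitor_id, day)] += 1
        if self._counts[(visitor_id, day)] > self.daily_limit:
            return MEMBER_PAGE
        return None
```

The counter is keyed on the day as well as the visitor, so the allowance resets each day without any cleanup job; old entries would still need pruning on a busy site.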

Superman

9:57 pm on Mar 21, 2002 (gmt 0)

10+ Year Member



Crazy_Fool,

The script works perfectly as is ... I'm certainly no expert on htaccess, but I've tested them extensively while implementing others' ideas into mine. I've honestly never seen the RewriteBase / anywhere but here ... maybe it is technically correct, I don't know. It works fine without it though.

I have learned that there are multiple ways to do these things, and also that the slightest error in the file can screw up everything. For example, I once left out the space before the [OR] on one of the lines and the script did not block anything.
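
A mistake like the missing space before [OR] is easy to miss by eye, so a small check can help. The Python sketch below is a hypothetical helper (the name `find_glued_or_flags` is invented here) that scans the text of an .htaccess file and reports RewriteCond lines where `[OR]` is stuck directly onto the pattern. It only looks for that one slip and can give a false alarm if a pattern legitimately ends in the character class `[OR]`.

```python
import re

# "[OR]" at the end of the line with a non-space character right before it,
# e.g. "^LinkWalker[OR]" instead of "^LinkWalker [OR]".
GLUED_OR = re.compile(r"\S\[OR\]\s*$")


def find_glued_or_flags(htaccess_text):
    """Return 1-based line numbers of RewriteCond lines with a glued-on [OR]."""
    bad_lines = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.lower().startswith("rewritecond") and GLUED_OR.search(stripped):
            bad_lines.append(number)
    return bad_lines
```

Without the space, the server reads `[OR]` as part of the pattern rather than as a flag, which is consistent with the chain no longer blocking anything.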

Edge,

That is actually only a small portion of my htaccess ... I have another one in my images folder to prevent people hotlinking my pics, another one in my members directory for password authentication and that blocks many IP addresses hackers have used to try to bust in ... all proxy servers.

I like your idea, but I would not know how to implement it ... it's a good idea though.

Bogglesworld

12:15 pm on Mar 22, 2002 (gmt 0)



Superman, tool, et al. Thanks a bunch. I went ahead and did everything you recommended here. Final question:
How can I check whether I did it correctly?
So far nothing seems wrong anyway?

To webmasterworld: Thanks. I have made good use of your glossary and the forums. What do you think of a "dictionary of spiders"? Or is that too much work for something that is not really necessary? I used superman's list myself but I wonder if there are any that I shouldn't have blocked?

Edge

1:58 pm on Mar 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bogglesworld,

Try Teleport Pro, you can change the user agent to anything you want to test your site. I suggest that you first test your site with a successful download before you try a blocked download.

toolman

3:31 pm on Mar 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>>How can I check whether I did it correctly?

I like Sam Spade.org...a common tool lots of us use here. It will let you change your ua so you can test and also do head requests to see what kind of server someone is running on. A handy tool to use for diagnosing server troubles as well.
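
The same kind of check can be scripted. The Python sketch below uses only the standard library to send a HEAD request under a chosen User-Agent and report the status code that comes back. The function names are made up for this example, and the URL you pass in is your own site; nothing here is specific to Sam Spade or Teleport Pro.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a HEAD request that announces itself with the given User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )


def status_for_agent(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends back to this User-Agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; a banned agent should end up here.
        return error.code
```

If the ban list is working, a call such as `status_for_agent("http://www.your-site.com/", "Wget/1.8")` should come back as 403, while an ordinary browser string should still get 200. As suggested earlier in the thread, try the allowed agent first so you know the test itself works.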

WOW. This has really bloomed. I'm glad to see so many people finding the rewrite script handy. I can't really take credit for it, as I just pieced together what littleman, Air, Gorufu and others posted for individual situations. I'm not so concerned with the email bots as I have no email addresses in my sites...except for that IndyLibrary one. That thing will shred your site in 2 seconds flat.

I'm more concerned with things like Front Page and other "theft" bots. One of the really cool aspects of that script is the blocking of that annoying iaea.org screen scraper or whatever it is. I don't think we've really figured out precisely what it is doing (it's certainly raised my awareness of atomic issues ;) ).

I'm just like the rest of you...learning regex as I go. It's a good time for some of the *nix geeks to shine. This has really brought out one of the strengths of WMW....the collective experience of webmasters pitching in to achieve a common goal.

Visi

4:26 am on Mar 24, 2002 (gmt 0)

10+ Year Member



New to this, but have a "website quester" hitting my site a lot, always same time. Is this a bot, or someone downloading the site? Any advice appreciated, and also some direction on a good reference site on the robots file, if one exists, so I can learn about it.

Thanks

Superman

5:20 am on Mar 24, 2002 (gmt 0)

10+ Year Member


That's the offline browser Website Extractor.

http://www.esalesbiz.com/extra/

I'd add it to my htaccess above to block it. It usually shows up in my logs as Website eXtractor, but I see others get it as Website Quester ... simply blocking all agents beginning with Website will take care of it.

RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
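
To see why one prefix rule covers both names, here is a rough Python model of the matching. It is a sketch, not how Apache does it internally: the pattern list is just a handful copied from this thread, `is_blocked` is an invented name, and Python's `re` module is assumed to behave the same as the server's regex engine for patterns this simple.

```python
import re

# A few patterns from the ban list in this thread, plus the "^Website" rule.
BLOCKED_AGENT_PATTERNS = [
    r"^EmailSiphon",
    r"^[Ww]eb[Bb]andit",
    r"^Zeus.*Webster",
    r"^Wget",
    r"^Website",
]


def is_blocked(user_agent, patterns=BLOCKED_AGENT_PATTERNS):
    """True if any pattern matches, like RewriteCond lines chained with [OR]."""
    return any(re.search(pattern, user_agent) for pattern in patterns)
```

Here `is_blocked("Website eXtractor")` and `is_blocked("Website Quester")` are both true because `^Website` only anchors the start of the string. Matching is case-sensitive, as RewriteCond is unless the [NC] flag is added, so an agent calling itself `wget` in lower case would slip past `^Wget`.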

knight

8:33 pm on Apr 23, 2002 (gmt 0)



What is Microsoft.URL and how is it harmful?

Brendan

richlowe

10:45 pm on Apr 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone know how to do this with IIS?