Forum Moderators: coopster & phranque


A close-to-perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
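# Forbid requests referred from www.iaea.org. RewriteRule patterns are matched
# against the URL path, never against "http://...", so the referrer is tested here.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]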

andreasfriedrich

2:02 pm on Sep 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



what alternative is there if my server doesn't have mod_rewrite installed?

You could use mod_access and mod_setenvif, which are compiled and loaded into the server by default. They should be available unless you or your hosting company removed them.

Deny [httpd.apache.org] is used to restrict access to the server based on hostname, IP address, or environment variables. Hostname and IP won't help when what we want to match is the User-Agent, so we need a way to set an environment variable depending on the User-Agent. SetEnvIf [httpd.apache.org] allows us to do just that. Preferably we would like the matching to be case-insensitive. Luckily the Apache developers provided a directive for that too: SetEnvIfNoCase [httpd.apache.org].

Now we need to put those pieces together.

SetEnvIfNoCase User-Agent EmailSiphon AC_FORBIDDEN
SetEnvIfNoCase User-Agent EmailWolf AC_FORBIDDEN
SetEnvIfNoCase User-Agent Crescent AC_FORBIDDEN
SetEnvIfNoCase User-Agent LinkWalker AC_FORBIDDEN
SetEnvIfNoCase User-Agent EmailCollector AC_FORBIDDEN
Order Allow,Deny
Allow from all
Deny from env=AC_FORBIDDEN

As with the regular expression in the RewriteCond directive, you could combine the patterns with | (alternation) in a single SetEnvIfNoCase [httpd.apache.org], like this:

SetEnvIfNoCase User-Agent EmailSiphon|EmailWolf|Crescent|LinkWalker|EmailCollector AC_FORBIDDEN
Order Allow,Deny
Allow from all
Deny from env=AC_FORBIDDEN

where everything from SetEnvIfNoCase to AC_FORBIDDEN must be on a single line.

Andreas
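On Apache 2.4 and later, Order/Allow/Deny live in mod_access_compat and are deprecated in favor of Require from mod_authz_core. A sketch of the same block in that syntax, assuming Apache 2.4 with mod_setenvif loaded and reusing the AC_FORBIDDEN variable from above:

```apache
# Tag unwanted clients, as before (case-insensitive, unanchored regex).
SetEnvIfNoCase User-Agent "EmailSiphon|EmailWolf|Crescent|LinkWalker|EmailCollector" AC_FORBIDDEN

# Let everyone in except requests that carry AC_FORBIDDEN.
<RequireAll>
    Require all granted
    Require not env AC_FORBIDDEN
</RequireAll>
```

Avoid mixing the old Order/Allow/Deny directives and Require in the same context; the Apache docs advise against combining the two styles.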

andreasfriedrich

2:20 pm on Sep 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What am I doing wrong?

Lose the [OR] flag and add a % sign in front of {HTTP_REFERER}.
You don't need to anchor the pattern in the RewriteRule.

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?domain\.com [NC]
RewriteRule .* /robots.php [L]

Andreas
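Without [OR], consecutive RewriteCond lines are ANDed and the rule fires only if all of them match; [OR] links a condition to the next one, which is why the last condition in a chain carries no [OR]. A minimal sketch extending the rule above with a second condition, assuming you also want requests for /robots.php itself left alone (the REQUEST_URI guard is an assumption added here, not part of the fix above):

```apache
RewriteEngine On
# No [OR] flags: both conditions must match for the rule to fire.
# Leave requests that already target /robots.php untouched.
RewriteCond %{REQUEST_URI} !^/robots\.php$
# Case-insensitive match on the referring site; dots escaped so they match literally.
RewriteCond %{HTTP_REFERER} ^http://(www\.)?domain\.com [NC]
RewriteRule .* /robots.php [L]
```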

58sniper

2:45 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



OK. I had figured out the OR issue myself, but adding the % is what got it working.

Thanks!

58sniper

3:56 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



On to the next issue:

Can anyone tell me what "RewriteCond: bad flag delimiters" means (other than the obvious)? As soon as I add the following to my .htaccess, I get 500 errors, and "RewriteCond: bad flag delimiters" shows up in the error_log.

RewriteCond %{HTTP_USER_AGENT} ^Mozilla* [OR]
RewriteCond %{HTTP_USER_AGENT} .*almaden.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go\!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy.*Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH.*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LexiBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link.*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MS\ FrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^URLSpiderPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]

58sniper

4:38 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



I believe part of the problem is with:
RewriteCond %{HTTP_USER_AGENT} .*almaden.* [OR]

So I changed it to:
RewriteCond %{HTTP_USER_AGENT} almaden [OR]

I also determined that some of the problem was with:
RewriteCond %{HTTP_USER_AGENT} ^Web Downloader [OR]
The space wasn't escaped; the pattern needs to be ^Web\ Downloader.

This appears to have resolved the problems.
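For anyone else hitting this error: RewriteCond expects a test string, a pattern and an optional [flags] field, separated by whitespace. In ^Web Downloader [OR] the unescaped space splits the pattern, so Downloader lands in the flags position, and because it isn't wrapped in [ ] mod_rewrite reports "bad flag delimiters". The almaden line was most likely harmless: an unanchored pattern already matches anywhere in the string, so .*almaden.* and almaden behave the same. One more line in that list deserves a look. In a regex, * repeats the preceding character rather than acting as a wildcard, so ^Mozilla* matches every User-Agent that starts with "Mozil", which is nearly every browser. A fragment meant to slot into the existing chain (the commented-out lines are the problem versions):

```apache
# Broken: the space splits the pattern, so "Downloader" is parsed as the flags field.
# RewriteCond %{HTTP_USER_AGENT} ^Web Downloader [OR]

# Fixed: escape the space so the pattern stays a single argument.
RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]

# Pitfall: "a*" means "zero or more a", so this matches any User-Agent
# beginning with "Mozil" and would send ordinary browsers to /robots.php.
# RewriteCond %{HTTP_USER_AGENT} ^Mozilla* [OR]
```

Whether to drop the Mozilla line or replace it with a more specific pattern depends on which client it was meant to catch; the post doesn't say.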

bull

5:25 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]

can, IMHO, be reduced to

RewriteCond %{HTTP_USER_AGENT} email [NC,OR]

since I don't know of any legitimate UA with "email" in its name.

58sniper

5:58 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



Yeah, I'm going to consolidate. I can probably do the same with "download", "grab", "bot" and "spider".

andreasfriedrich

6:09 pm on Sep 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



GoogleBot :o

Andreas
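Andreas's point in config form: a bare bot or spider pattern would also catch Googlebot. A sketch of the consolidated keywords with an explicit exception, assuming Googlebot is the only crawler you want to exempt; in practice you'd need a negated condition for every crawler you want to keep:

```apache
RewriteEngine On
# Conditions without [OR] are ANDed: first make sure this isn't Googlebot...
RewriteCond %{HTTP_USER_AGENT} !googlebot [NC]
# ...then match any of the broad keywords anywhere in the User-Agent, ignoring case.
RewriteCond %{HTTP_USER_AGENT} (email|download|grab|bot|spider) [NC]
RewriteRule .* - [F]
```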

58sniper

8:05 pm on Sep 26, 2002 (gmt 0)

10+ Year Member



Ya know, this has got me thinking....

Wouldn't it be easier to write the .htaccess around what to allow, instead of what to deny?

stapel

8:30 pm on Sep 26, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think this would require far too many allow entries, and what would happen when a new browsing product came out that you didn't know about yet?

As depressingly long as these "deny" lists can get, I think they're still the better way to go.

Eliz.
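For comparison, roughly what an allow list would look like with the mod_setenvif approach from earlier in the thread. The patterns are hypothetical examples, not a recommended set. It illustrates Eliz.'s concern: any client not listed, including a new browser or a legitimate crawler, is refused, while grabbers that send a Mozilla-style User-Agent still get through.

```apache
# Hypothetical allow list: only User-Agents tagged here are let in.
SetEnvIfNoCase User-Agent "^Mozilla/" AC_ALLOWED
SetEnvIfNoCase User-Agent "^Opera" AC_ALLOWED
SetEnvIfNoCase User-Agent "Googlebot" AC_ALLOWED

# Deny directives are evaluated first, then Allow; a client matching an
# Allow line gets in even though "Deny from all" matches everyone.
Order Deny,Allow
Deny from all
Allow from env=AC_ALLOWED
```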

This 243-message thread spans 25 pages.