Forum Moderators: coopster & phranque

A Close to perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
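# Also refuse requests referred from this one site. In .htaccess the rule
# pattern only ever sees the URL path, so a catch-all pattern is used.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule ^.* - [F]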
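
The same kind of user-agent list can be written more compactly with alternation. A minimal sketch, showing only a handful of the names above; note the assumption that [NC] is acceptable here, which makes the match case-insensitive and therefore looser than the case-sensitive patterns in the list:

```apache
RewriteEngine on
# One condition instead of many [OR] lines; extend the alternation as needed.
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|ExtractorPro|CherryPicker|WebBandit|Teleport|Wget|LinkWalker) [NC]
# Refuse the request with 403 Forbidden.
RewriteRule ^ - [F]
```

Whether one long line is easier to maintain than one line per bot is a matter of taste; the per-line form makes it simpler to add or remove a single entry.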

pmkpmk

10:04 am on Oct 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Dave,

no, I have root access on my own server, which physically resides about 7m from where I am right now :-)

Excerpts from httpd.conf:

LoadModule rewrite_module /usr/lib/apache/mod_rewrite.so
AddModule mod_access.c
<VirtualHost a.b.c.d>
Options +FollowSymLinks
</VirtualHost>

Excerpts of .htaccess

XBitHack on
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

Error messages in Logfile:

error.log:[Wed Sep 11 11:23:27 2002] [error] [client x.y.z.z] Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /usr/local/httpd/virtual/....

carfac

3:16 pm on Oct 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pmkpmk:

Yep, looks like it is not enabled. Ask your ISP to add "AllowOverride All" for that directory and you should be OK!

dave

Crazy_Fool

9:09 am on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Actually my primary goal is to block address harvesters. I don't
>>care (yet) for people downloading the whole site. But we really
>>need to get a lid on this SPAM.

Same here. I'm using this file to redirect bad bots and email harvesters to a page with a list of spammers' email addresses (their real email addresses, not the Yahoo or Hotmail addresses they send spam from). The harvesters will pick these up and the spammers will end up spamming each other. If enough people do this, then eventually we could stop a lot of spam.
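
One way this kind of redirect might look, as a sketch only: the page name /spammers.html and the two user-agent patterns are placeholders chosen for illustration, not taken from the poster's actual file.

```apache
RewriteEngine on
# Leave the target page itself alone so the redirect cannot loop.
RewriteCond %{REQUEST_URI} !^/spammers\.html$
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
# Send matching clients to the hypothetical page with an external redirect.
RewriteRule ^ /spammers.html [R,L]
```

The first condition is combined with the [OR] pair that follows it, so the rule fires only for a listed user agent requesting anything other than the target page.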

andreasfriedrich

10:49 pm on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you have root access you might want to check out the alternative approach described in the thread on How to centralize administration of things to block [webmasterworld.com].

Andreas

Superman

6:10 am on Oct 14, 2002 (gmt 0)

10+ Year Member



Found a new site downloader tonight:

Irvine/0.4.5a

Japanese offline browser ... multiple versions.

RewriteCond %{HTTP_USER_AGENT} ^Irvine [OR]

That'll take care of it!

-Superman-

pmkpmk

7:31 am on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



carfac:

I *AM* in this case the ISP - where is the "allowoverride" directive to be placed?

dingman

7:49 am on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



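Stick it in the <Directory> block for the site you're working with. AllowOverride is only accepted inside <Directory> sections, so it can't go directly in the <VirtualHost>. In the case of your posted snippet, try something like this (the directory path is a placeholder for your site's document root):

<VirtualHost a.b.c.d>
Options +FollowSymLinks
<Directory /path/to/site>
AllowOverride All
</Directory>
</VirtualHost>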

andreasfriedrich

1:17 pm on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You don't need the AllowOverride [httpd.apache.org] directive if you specify Options +FollowSymLinks in the configuration file itself. AllowOverride is only used to specify which settings are allowed to be made in .htaccess files.

This is a different situation than the one in Msg #83 [webmasterworld.com] where FollowSymLinks needed to be enabled in the .htaccess file. For that to work one needs to have at least AllowOverride Options privileges.

If you have root access I would opt for AllowOverride None to turn .htaccess files off entirely. You can do the configuration in the main configuration file. This saves Apache lots of stat calls to check for .htaccess files. And you won't need the FollowSymLinks option at all, since it is only necessary in the per-directory context.
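
A minimal sketch of that arrangement in httpd.conf, assuming a placeholder document root of /path/to/site; the address and the user-agent patterns are reused from the earlier posts:

```apache
<VirtualHost a.b.c.d>
    # /path/to/site is a placeholder; use the real document root.
    <Directory /path/to/site>
        # Apache no longer looks for .htaccess files under this directory.
        AllowOverride None
    </Directory>

    # Rules in virtual-host context do not require FollowSymLinks.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus
    RewriteRule ^ - [F]
</VirtualHost>
```

Changes made this way only take effect after Apache is restarted or reloaded, which is the trade-off against editing .htaccess.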

Andreas

andreasfriedrich

1:34 pm on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BTW there is something strange going on in the configuration described in Msg #153 [webmasterworld.com].

Given a config like this:

<VirtualHost a.b.c.d>
Options +FollowSymLinks
</VirtualHost>

and assuming the requested URI resides on the virtual host a.b.c.d I find it rather strange that Apache would complain that Options FollowSymLinks is off since it is clearly enabled.

Could it be that the requested URI is not on this virtual server but somewhere else on your server?

Andreas

58sniper

3:59 pm on Oct 16, 2002 (gmt 0)

10+ Year Member


I have a question about the order in which things should appear in .htaccess....

I have:
=====================================================
# Error docs
ErrorDocument 401 /error.php?eid=401
....
ErrorDocument 500 /error.php?eid=500

# RedirectPermanent for the old format to the current format (probably to be removed in favor of the search engine friendly URLs)
RedirectPermanent /divisions/comet http://www.mydomain.com/article.php?aid=25
....
RedirectPermanent /wanted http://www.mydomain.com/section.php?sid=wanted

# stop the image thieves
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(dev\.)?mydomain.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://localhost/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://12.34.5.(6*|7*)$ [NC]
RewriteRule \.(gif|jpg|zip|pdf)$ http://www.mydomain.com/apology.gif [R,L]

# Search engine friendly URLs
RewriteRule ^articles/([0-9]*) /article.php?aid=$1 [L]
....
RewriteRule ^sheriff /article.php?aid=22 [L]

# RewriteCond for those annoying UAs
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
....
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]
=====================================================
I'm curious whether this is the best order.
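
One possible ordering, offered as a sketch rather than a definitive answer. ErrorDocument and RedirectPermanent are not handled by mod_rewrite, so their position relative to the rewrite rules should not matter, while RewriteRule directives run top to bottom, so the user-agent block is moved ahead of the other rules here. The directives are abbreviated from the listing above, and the extra REQUEST_URI condition is an addition of mine, assumed necessary so a hotlinked apology.gif is not redirected to itself:

```apache
RewriteEngine on

# 1. Deal with unwanted user agents before any other rewriting.
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]

# 2. Hotlink protection; skip the replacement image itself (added condition).
RewriteCond %{REQUEST_URI} !apology\.gif$
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
RewriteRule \.(gif|jpg|zip|pdf)$ http://www.mydomain.com/apology.gif [R,L]

# 3. Search engine friendly URLs.
RewriteRule ^articles/([0-9]*) /article.php?aid=$1 [L]
```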