
Forum Moderators: coopster & jatar k & phranque


A Close to perfect .htaccess ban list



3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this... anybody got a bot to add... before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]
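A quick way to sanity-check patterns like these offline, before sticking them in every site (a Python sketch for illustration - only a few patterns from the list are shown, and the sample user-agent strings are made up):

```python
import re

# A few patterns lifted from the .htaccess list above, anchored at the
# start exactly as mod_rewrite applies them to %{HTTP_USER_AGENT}.
patterns = [r"^EmailSiphon", r"^Mozilla.*NEWT", r"^[Ww]eb[Bb]andit", r"^Wget"]

def is_banned(user_agent):
    # RewriteCond does an unanchored regex search, but these patterns
    # carry their own ^ anchor, so re.search behaves identically.
    return any(re.search(p, user_agent) for p in patterns)

print(is_banned("Wget/1.5.3"))                          # True
print(is_banned("Mozilla/4.0 (compatible; NEWT)"))      # True
print(is_banned("Mozilla/4.0 (compatible; MSIE 5.0)"))  # False
```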


10:04 am on Oct 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Hi Dave,

no, I have root access on my own server, which physically resides about 7m from where I am right now :-)

Excerpts from httpd.conf:

LoadModule rewrite_module /usr/lib/apache/mod_rewrite.so
AddModule mod_access.c
<VirtualHost a.b.c.d>
Options +FollowSymLinks

Excerpts of .htaccess

XBitHack on
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

Error messages in Logfile:

error.log:[Wed Sep 11 11:23:27 2002] [error] [client x.y.z.z] Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /usr/local/httpd/virtual/....


3:16 pm on Oct 11, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


Yep - looks like it is not enabled. Ask your ISP to add "AllowOverride All" for that directory and you should be OK!



9:09 am on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

>>Actually my primary goal is to block adress harvesters. I don't
>>care (yet) for people downloading the whole site. But we really
>>need to get a lid on this SPAM.

Same here. I'm using this file to redirect bad bots and email harvesters to a page with a list of spammers' email addresses (their real email addresses, not the yahoo or hotmail addresses they send spam from). The harvesters will pick these up and the spammers will end up spamming each other. If enough people do this, then eventually we could stop a lot of spam.
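A sketch of that idea (the pattern list is abbreviated and /spamtrap.html is a hypothetical honeypot page, not from the post above):

```apache
RewriteEngine on
# Known address harvesters (extend with the list earlier in the thread)
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC]
# Feed them the honeypot page full of spammers' own addresses
RewriteRule .* /spamtrap.html [L]
```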


10:49 pm on Oct 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

If you have root access you might want to check out the alternative approach described in the thread on How to centralize administration of things to block [webmasterworld.com].



6:10 am on Oct 14, 2002 (gmt 0)

5+ Year Member

Found a new site downloader tonight:


Japanese offline browser ... multiple versions.

RewriteCond %{HTTP_USER_AGENT} ^Irvine [OR]

That'll take care of it!



7:31 am on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


I *AM* the ISP in this case - where is the "AllowOverride" directive to be placed?


7:49 am on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Stick it in the <Directory> container that defines the site you're working with - note that AllowOverride is only valid in <Directory> context, not directly inside <VirtualHost>. In the case of your posted snippet, try:

<VirtualHost a.b.c.d>
<Directory /path/to/docroot>
AllowOverride All
Options +FollowSymLinks
</Directory>


1:17 pm on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

You don't need the AllowOverride [httpd.apache.org] directive if you specify Options +FollowSymLinks in the configuration file itself. AllowOverride is only used to specify which settings are allowed to be made in .htaccess files.

This is a different situation than the one in Msg #83 [webmasterworld.com], where FollowSymLinks needed to be enabled in the .htaccess file. For that to work one needs at least AllowOverride Options privileges.

If you have root access I would opt for AllowOverride None to turn .htaccess files off entirely. You can do the configuration in the main configuration file. This saves Apache lots of stat() calls to check for .htaccess files. And you won't need the FollowSymLinks option at all, since it is only necessary in the per-directory context.
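In httpd.conf that could look like this (a sketch - the document root path is an assumption):

```apache
<Directory /usr/local/httpd/htdocs>
    # No .htaccess lookups anywhere under this tree -
    # Apache skips the per-directory stat() checks entirely
    AllowOverride None
    Options FollowSymLinks
</Directory>
```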



1:34 pm on Oct 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

BTW, there is something strange going on in the configuration described in Msg #153 [webmasterworld.com].

Given a config like this:

<VirtualHost a.b.c.d>
Options +FollowSymLinks

and assuming the requested URI resides on the virtual host a.b.c.d, I find it rather strange that Apache would complain that Options FollowSymLinks is off, since it is clearly enabled.

Could it be that the requested URI is not on this virtual server but somewhere else on your server?



3:59 pm on Oct 16, 2002 (gmt 0)

10+ Year Member

I have a question about the order in which things should appear in .htaccess....

I have:
[b]# Error docs[/b]
ErrorDocument 401 /error.php?eid=401
ErrorDocument 500 /error.php?eid=500

[b]# RedirectPermanent for the old format to the current format (probably to be removed in favor of the search engine friendly URLs)[/b]
RedirectPermanent /divisions/comet http://www.mydomain.com/article.php?aid=25
RedirectPermanent /wanted http://www.mydomain.com/section.php?sid=wanted

[b]# stop the image thieves[/b]
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(dev\.)?mydomain\.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://localhost/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://12\.34\.5\.(6|7) [NC]
RewriteRule \.(gif|jpg|zip|pdf)$ http://www.mydomain.com/apology.gif [R,L]

[b]# Search engine friendly URLs[/b]
RewriteRule ^articles/([0-9]*) /article.php?aid=$1 [L]
RewriteRule ^sheriff /article.php?aid=22 [L]

[b]# RewriteCond for those annoying UAs[/b]
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]
I'm curious as to whether this is the best order?


7:48 pm on Oct 19, 2002 (gmt 0)

10+ Year Member

After visits from a s*x-spambot abusing Google
(acb12246.ipt.aol.com - - [11/Oct/2002:14:08:51 +0200] "GET /mypoorpage.htm HTTP/1.1" 200 9075 www.mydomain.net "http://www.google.de/search?q=Guestbook+Jewel&num=100...start=400&sa=N" "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)" "-" )
I inserted the following 2 lines. There's IMHO no real reason to search for "guestbook", except for spambots.

RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
RewriteCond %{HTTP_REFERER} q=g%E4stebuch [NC,OR]

Works for MSN search also.


8:58 pm on Oct 19, 2002 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member


Since your rewrite rules all end with the [L] flag, it doesn't really matter what order you put them in. You can choose to do the "good guys" rewrites first, or the "bad guys" first (depending on how many of each you get) in order to speed up (slightly) the majority of requests.

Only when the output from one rewrite needs to be processed by subsequent rules does the order matter all that much. In that case, you wouldn't be using the [L] flag on each ruleset.
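A contrived sketch of that second case - the first rule deliberately omits [L], so its output is re-examined by the rule below it (the paths here are made up):

```apache
RewriteEngine on
# Step 1: map the legacy path onto the new scheme (no [L] - keep processing)
RewriteRule ^old/(.*)$ new/$1
# Step 2: sees "new/..." - the output of step 1 - as its input,
# so swapping the order of these two rules would change the result
RewriteRule ^new/(.*)$ /current/$1 [L]
```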


That's an interesting exploit - I haven't seen that one yet, but I'll keep an eye out!



5:58 pm on Oct 24, 2002 (gmt 0)

10+ Year Member

I have an issue....

Seems the site flipdog.com has been snatching my content. Additionally, they've been doing a pretty crappy job of displaying it (and the fact that many things on my site have changed since they archived things only makes it worse). I'd like to prevent any requests from flipdog.com. This includes requests for files from the content they've already archived, as well as any attempts to get new content.

Can someone tell me if this would work:

RewriteCond %{HTTP_REFERER} ^http://(www\.)?flipdog.com/*$ [NC]
RewriteRule ^.* /robots.php [R,L]

I already have a RewriteRule in place to protect images and other files, and that's working fine. But I'd like to prevent them from getting anything in the future. I don't see what UA they are using, so I tend to believe that they are masking that.

Should I be looking at something else to block them as well?


6:55 pm on Oct 24, 2002 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member


The "/*$" at the end of your RewriteCond pattern isn't quite right. Just leave the "$" end anchor off to match anything that starts with the pattern. Escape the 2nd period with "\" too.

I don't see any point in redirecting to your robots.txt. How about just a 403 response? Also, neither anchor is needed in the RewriteRule pattern - just ".*" will do.

RewriteCond %{HTTP_REFERER} ^http://(www\.)?flipdog\.com [NC]
RewriteRule .* - [F,L]

Returns a 403-Forbidden response and no content.



7:28 pm on Oct 24, 2002 (gmt 0)

10+ Year Member

Actually, I'm not redirecting to my robots.txt file, but to a file robots.php, which has some content on it.

I'll try your suggestions...



2:01 pm on Oct 25, 2002 (gmt 0)

10+ Year Member

Okay, I've determined that the UA for FlipDog is

"Mozilla/4.7 (compatible; FlipDog; http://www.whizbang.com/crawler)"

in case anyone else wants to block this as well.

RewriteCond %{HTTP_USER_AGENT} ^FlipDog [OR]

should work......


11:52 pm on Oct 25, 2002 (gmt 0)

10+ Year Member


The "^" at the beginning of your RewriteCond pattern isn't quite right - the FlipDog UA string begins with "Mozilla/4.7", so an anchored pattern will never match. You might try removing the "^":

RewriteCond %{HTTP_USER_AGENT} FlipDog [OR]

You can test any new RewriteCond using WannaBrowser - see Message #63 [webmasterworld.com].
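The difference the anchor makes is easy to demonstrate against the UA string posted above (a Python sketch of the two regex behaviors):

```python
import re

ua = "Mozilla/4.7 (compatible; FlipDog; http://www.whizbang.com/crawler)"

# With the leading anchor, the pattern must match at the very start of
# the UA string - which it can't, since the string begins "Mozilla/4.7".
print(bool(re.search(r"^FlipDog", ua)))  # False

# Without the anchor, the substring is found anywhere in the UA string.
print(bool(re.search(r"FlipDog", ua)))   # True
```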


9:29 pm on Oct 26, 2002 (gmt 0)

10+ Year Member

Here is the UA list from my site - who do I need to worry about here?

sitecheck.internetseer.com (For more info see: http:¦x
FAST WebCrawler¦3
Openfind data gatherer, Openbot¦3
Mercator 2.0
Scooter 3.2.FNR
Scooter 3.2.EX
PingALink Monitoring Services 1.0 (http:¦x
libwww perl¦5
ah ha.com crawler (crawler@ah ha.com)
NationalDirectory WebSpider¦1
curl¦7 4
oBot 4
Internet Explore 5.x
Microsoft URL Control 6.00.8862
Scooter 3.2
FreeFind.com SiteSearchEngine¦1
OneStop Webmaster; http:¦x
appie 1.1 (www.walhello.com)
our agentlibwww perl¦5
Scooter ARS 1.1
Snoopy v0.1
Xenu Link Sleuth 1.2a
(Teradex Mapper; mapper@teradex.com; http:¦x
Microsoft URL Control 6.00.8169
Teleport Pro¦1
IE 5.5 Compatible Browser
Scooter 3.2.QA
metacarta (crawler@metacarta.com)
Scooter 3.2.SF0
Python urllib¦1
Mewsoft Search Engine wwWebmasterWorldsoft.com¦4
Xenu_s Link Sleuth 1.1c
Rex Swain_s HTTP Viewer (http:¦x
lwp request¦2
Snoopy v0.94
Scooter 3.2.SB
Scooter 3.2.BT
COAST WebMaster (Windows NT)
DISCo Pump 3.2
Zeus 2895 Webster Pro V2.9 Win32
AbachoBOT (Mozilla compatible)
IP*Works! V5 HTTP¦x
A WinHTTP Example Program¦1
antibot V1.1.9¦x
NetMechanic Page Primer
Robot: NutchCrawler, Owner: wdavies@acm.org
rabaz (rabaz at gigabaz dot com)
Linkbot 3.0
Net Probe

Listed here in order of hits.


4:31 am on Nov 13, 2002 (gmt 0)


I doubt that rewrite rules are the adequate solution to keep robots away from your files. They consume a lot of CPU power on the server (especially with these large lists) and still don't do the job very well. I have downloaded entire sites myself for several reasons and know how easy it is to fake the UA, or to avoid per-user traffic/max-connection limits by using a couple of proxies.

Why don't you use JavaScript for this? Most sites require it anyway. This way you can keep all the bots away from your files if you use a JavaScript function to generate the URLs instead of using static URLs for links or images.

In fact I've been using this approach on my site for some time now and it works perfectly. Since then I haven't found a single download-bot access pattern in my logs. Apparently none of these tools are able to process JavaScript.


5:01 am on Nov 13, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Welcome to WebmasterWorld [webmasterworld.com], garwk.

Well, I guess it's a tradeoff between security on the one hand and usability on the other.

If you use a JavaScript function to generate URLs on the client side, you will shut out all users with JavaScript turned off. And you would still need a way to make your site accessible to the SE spiders. Unless you use IP-based cloaking for that, a user would still be able to pose as some SE spider.

While the .htaccess approach has its shortcomings, I'm still not convinced that a client-side JavaScript solution would be any better.

I guess just like you can't prevent people from copying a book, you cannot prevent them from copying your website. The only thing you can do is make it harder for them.



11:35 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Unfortunately I still haven't got it working (see earlier posts), but I haven't had the time yet to look into it further (fate of a part-time webmaster).

But there's a new issue: anybody ever heard of a user agent calling itself "GraphicBrain.com"?

This special agent seems to download the whole site (which - in theory - I don't mind), but it produces such long logfile entries that my logfile analyzer crashes :-(

Example of ONE(!) logfile line:

[edited by: jatar_k at 5:07 pm (utc) on Nov. 15, 2002]
[edit reason] fixed side scroll [/edit]


11:51 am on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Hi Andreas,

in reply to message number 161 [webmasterworld.com...]

Well, as soon as I have the .htaccess in the subdirectory of the virtual server, Apache won't reload the config or restart - it exits with an error.

The requested URI should be on the same virtual server. Actually, config-wise I've taken the default config of Apache, and my modifications were mostly in the virtual hosts section.

I'm a bit reluctant to post uncensored config files and logfile excerpts here in this public space, but to the best of my knowledge (which may not be much) I think I got it right.

I guess it's only one little configuration routine which is faulty or missing.

Since I have full root privileges, I'm not limited to .htaccess but can make changes to other parts of the config as well. As I mentioned in another post, I'm only trying to block email harvesters.

So what would be your recommendation?

[edited by: jatar_k at 5:10 pm (utc) on Nov. 15, 2002]
[edit reason] fixed link and sidescroll [/edit]


1:46 pm on Nov 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Hi pmkpmk,

I'm in a hurry right now and need to catch a train in half an hour. I'll get back to you tomorrow unless somebody else has already helped you solve your problem.



10:22 am on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Thanks to Andreas Friedrich, we solved my mysterious problem!

Andreas found out that I had a directive:

<Files index.html>
Options -FollowSymLinks +Includes
</Files>

in my httpd.conf. Even though, according to the documentation, the "Options" line should be ignored there, it actually isn't.

After removing the "-FollowSymLinks" from the statement, everything works as intended.


11:34 am on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Aren't you harassed by the "e-mail spyder" (www.emailspyder.com)?

It has the user agent "Microsoft URL Control" and - well - spiders for email addresses.

RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]


1:15 pm on Nov 19, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

The second line of defense (slightly off topic, but worth mentioning):

If you're lucky enough to run your own mailserver under your own control, you can add a second line of defense: using realtime blacklists (sometimes also called realtime blocklists, or RBLs) in your mailserver allows you to block potential spam when the spammer tries to deliver it to you. On EACH incoming email, the mail server checks at least one of these RBLs. If the sender's IP address tests positive on such a list, email delivery is instantly cancelled even BEFORE the mail data is transferred to your server. There's a multitude of RBLs out there. Our server checks EACH incoming message against 5 different RBLs. Some of our users - including myself - post-check their messages again against other RBLs. I - for example - have all messages coming from Russia/China/Korea/Malaysia etc. tagged with the prefix "**SPAM**". This second (and third) line of defense makes life a lot easier!
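In Postfix, for example, an RBL check is a one-line restriction (a sketch, not from the post above; zen.spamhaus.org is just one widely used list):

```text
# /etc/postfix/main.cf excerpt
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    reject_rbl_client zen.spamhaus.org
```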


11:52 am on Nov 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

As mentioned above, thanks to Andreas Friedrich everything works fine now, and I "catch" typically 1-2 mail harvesters per day that way. I downloaded a few of those bugs myself to get a feeling for how they work.

And now the $1,000,000 prize question is: what prevents the programmer of one of these bugs from "stealing" the user-agent string of - say - IE 5.0?

Am I right in thinking that a bot camouflaging itself as IE 5.0 would be COMPLETELY invisible to .htaccess rewrite rules?


4:42 pm on Nov 28, 2002 (gmt 0)

10+ Year Member


I've been reading through this thread, and having used .htaccess to secure areas of other websites, I thought I'd test out the concepts on a dormant web site on my server.

But when I add the file which contains :-

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]

I find I can't get into any of the pages.

The site is a virtual site on a server I have root access to, and I've checked the httpd.conf to see that RewriteEngine is on in each of the virtual sites.

Can anybody suggest what I'm doing wrong?


Later update

I've checked my server logs and I'm getting the message:-
RewriteEngine not allowed here

So now I'm really confused

Later still update

Solved it - I needed to amend the access.conf to allow overrides with "AllowOverride FileInfo".


6:06 pm on Nov 29, 2002 (gmt 0)

Hi All,

I'm very impressed with the knowledge shown in this thread! I've read it at least once but I still have a question.

What if you want to ban certain countries using RewriteCond? How do I do that?

Right now I'm using:

deny from .at
deny from .bg

The problem with that is that it denies even my error pages, so I'd like to switch over to RewriteCond instead, so that I can give them a page with a reason why they can't reach my site.

Another question... does anyone know how I can test whether the country ban is working correctly? wannabrowser.com works great for referrers, but has no provisions for testing from offshore or from a specific IP location.

Thanks for the help.
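One way to keep the deny rules but still serve an explanation page, without mod_rewrite, is an ErrorDocument plus a <Files> exemption (a sketch; /whyblocked.html is a hypothetical page):

```apache
ErrorDocument 403 /whyblocked.html

deny from .at
deny from .bg

# The error page itself must stay reachable, or denied
# clients get Apache's bare 403 instead of the explanation
<Files whyblocked.html>
order deny,allow
allow from all
</Files>
```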



10:34 pm on Dec 3, 2002 (gmt 0)

10+ Year Member

okidata, I've been wondering something similar. My ban list takes the form of:

SetEnvIf Remote_Addr ^12\.40\.85\. getout
SetEnvIfNoCase User-Agent ^Microsoft.URL getout

<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>

This is working fine, but how can I show a custom error message without reimplementing all of this using mod_rewrite? Also, how can I do a redirect if getout is set? Thanks.
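For the custom message, an ErrorDocument plus a <Files> exemption works with this SetEnvIf setup too (a sketch; /banned.html is a hypothetical page). A redirect, though, really does call for mod_rewrite - a plain deny can only answer 403:

```apache
SetEnvIf Remote_Addr ^12\.40\.85\. getout
ErrorDocument 403 /banned.html

<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>

# Exempt the error page so the 403 handler can actually serve it
<Files banned.html>
order deny,allow
allow from all
</Files>
```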

This 243-message thread spans 9 pages.
