
Forum Moderators: coopster & jatar k & phranque


A Close to perfect .htaccess ban list

     
3:30 am on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this... anybody got a bot to add... before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
# Forbid requests referred from www.iaea.org. The RewriteRule pattern is
# tested against the requested path, not the referer, so it should simply
# match everything; the original "!^http://..." pattern was a mistake.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]
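For anyone unsure how the [OR]-chained conditions combine: mod_rewrite forbids the request as soon as any one pattern matches the User-Agent, and the ^ anchor pins each pattern to the start of the header. Here's a rough Python illustration of that logic using a handful of patterns from the list above (this is just a model of the matching behavior, not how Apache implements it):

```python
import re

# The [OR] flag chains conditions so the request is forbidden as soon as ANY
# pattern matches the User-Agent. The ^ anchor pins each pattern to the start
# of the header, and matching is case-sensitive unless [NC] is added.
BANNED_PATTERNS = [
    r"^EmailSiphon",
    r"^Mozilla.*NEWT",
    r"^[Ww]eb[Bb]andit",
    r"^Wget",
]

def is_banned(user_agent: str) -> bool:
    # re.match already anchors at the start, matching mod_rewrite's ^ usage.
    return any(re.match(p, user_agent) for p in BANNED_PATTERNS)

print(is_banned("Wget/1.8.2"))                # True
print(is_banned("Mozilla/4.0 (compatible)"))  # False
```

Note that because of the ^ anchor, a harvester that prepends "Mozilla/4.0" to its real name slips past every anchored pattern, which is why a few entries (like the NEWT one) deliberately match mid-string.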

10:09 am on Sept 11, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


The correct directive to enable the FollowSymLinks feature in your .htaccess file would be
Options [httpd.apache.org] +FollowSymLinks

For that to work you need at least

AllowOverride [httpd.apache.org] Options
privileges. Those are set in the server config or virtual host context.
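In the server configuration, granting that privilege might look something like this (the directory path is only a placeholder for your own document root):

```apache
# Illustrative server-config / virtual-host context; the path is a placeholder.
<Directory "/home/yoursite/public_html">
    # Allow .htaccess files in this tree to use Options directives
    # such as +FollowSymLinks.
    AllowOverride Options
</Directory>
```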
3:08 pm on Sept 12, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 26, 2001
posts:1076
votes: 0


I've added the .htaccess list to a new site I've just built, but I've set it to redirect bad bots to a PHP page that lists all the email addresses of companies that have sent me spam. Hopefully the email harvesters will pick up those addresses and add the spammers to other spam lists. Maybe they'll end up spamming each other into submission?

When spam comes in, I check the actual company site to get their genuine email addresses, then manually add them to a MySQL database. This means only genuine mailboxes get listed on the page, not the Yahoo or Hotmail addresses the spam is often sent from.

I've also added a simple browser check to the PHP page so that if an IE / Netscape / Opera user visits the page they will only see a normal forbidden message... well, that's the theory, but I've not been able to test it yet. I just need a "browser" or something that will let me set the UA to whatever I want.
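Any HTTP client that lets you override the User-Agent header will do for that kind of test. As one illustration, here is a Python sketch that builds such a request (the URL is a placeholder for your trap page; calling urllib.request.urlopen(req) would actually send it):

```python
import urllib.request

# Build a request that masquerades as a known harvester UA.
# The URL is a placeholder; pass the Request to urlopen() to really fetch it.
req = urllib.request.Request(
    "http://www.example.com/trap.php",
    headers={"User-Agent": "EmailSiphon"},
)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # EmailSiphon
```

Swap the header value for a browser-like string to confirm that ordinary visitors get the plain forbidden message instead of the trap page.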

11:56 pm on Sept 12, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 31, 2002
posts:43
votes: 0


This question is for Pushycat:

Thanks for the browscap.ini & sample IIS code, which I got from your site! I want to make sure I understand their use. I implement the browscap.ini (or whatever parts of it I want), and then I implement the code in global.asa for each robot I want to ban? Or just the one block of code gets revised to include each robot to be banned? (Or -- is every robot in the browscap.ini banned, so I should only include those which I wish to ban?) My partner is much more of a web programmer than I am and would take care of this, but I want to make sure I understand what needs to be done first!

Thanks a lot,
Snark

10:50 am on Sept 13, 2002 (gmt 0)

New User

10+ Year Member

joined:July 3, 2002
posts:39
votes: 0


Hi all,
Thanks so much for the info in this thread. I use .htaccess and have edited the list here a bit. I want to include it in my existing .htaccess file, which has a couple of extra rules in it. Will the following work OK? Is it in the right order, etc.?

ErrorDocument 404 /404.htm
ErrorDocument 400 /404.htm
ErrorDocument 403 /404.htm
ErrorDocument 501 /404.htm
ErrorDocument 502 /404.htm
ErrorDocument 503 /404.htm

<FilesMatch "\.html?$">
ForceType application/x-httpd-php
</FilesMatch>

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
# Forbid requests referred from www.iaea.org. The RewriteRule pattern is
# tested against the requested path, not the referer, so it should simply
# match everything; the original "!^http://..." pattern was a mistake.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F]

Thanks

Anni

Arcie

2:29 am on Sept 15, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


Hi Toolman, nice compilation of nasty bots! Have you tried sticking the rewrite block in httpd.conf? It should run fastest there, although you noted there was no noticeable speed difference as it is.

First post!

Since I run a virtual server with a number of different domains, it seems to me it would make more sense to put my list of forbidden UAs in the httpd.conf file, rather than try to replicate them in .htaccess on each domain's document root. Are there any caveats or special directions I should follow before I proceed?

Thanks!

Randy

[edited by: jatar_k at 12:04 am (utc) on Sep. 16, 2002]
[edit reason] no sigs please [/edit]

11:39 am on Sept 15, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


Hi Randy and welcome to Webmaster World [webmasterworld.com].

As shown in post #77 [webmasterworld.com] putting your RewriteRules in httpd.conf is indeed faster and the way to go when you have access to it.

However, this will not solve the problem of applying those rules to all virtual servers. You cannot just put the rewriting code in the main section and expect it to work for all virtual servers. For an explanation on this see API Phases [httpd.apache.org] in the mod_rewrite URL Rewriting Engine documentation.

So, after [...] Apache has determined the corresponding server (or virtual server) the rewriting engine starts processing of all mod_rewrite directives from the per-server configuration in the URL-to-filename phase.

my emphasis
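To make that concrete, a sketch of the httpd.conf layout (server names are illustrative): rules defined once in the main server section are not applied inside a virtual host unless that host opts in with RewriteOptions inherit.

```apache
# Main server config: define the ban list once.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget
RewriteRule .* - [F]

# Each virtual host must explicitly inherit the main-server rewrite setup.
<VirtualHost *:80>
    ServerName www.example.com
    RewriteEngine On
    RewriteOptions inherit
</VirtualHost>
```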

There's also a thread, How (and Where) best to control access [webmasterworld.com], that you might want to read on this topic. If you have mod_perl you might want to use the solution mentioned in that thread. Ask carfac [webmasterworld.com] for the modified version of BlockAgent.

And as a sidenote: do not drop any URLs, and do not use a signature.

1:52 am on Sept 16, 2002 (gmt 0)

New User

joined:Sept 16, 2002
posts:39
votes: 0


Hi,
First, thank you for having this great place, where I was able to learn more in the last two weeks than in the six months since I decided to have my first site.
I am not yet very familiar with .htaccess, and when I try to modify it, it always gives me an error.
There is text in it that was left there by my host:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

DirectoryIndex index.html index.htm index.php index.phtml index.php3

# AddType application/x-httpd-php .phtml
# AddType application/x-httpd-php .php3
# AddType application/x-httpd-php .php
#
# Action application/x-httpd-php "/php/php.exe"
# Action application/x-httpd-php-source "/php/php.exe"
# AddType application/x-httpd-php-source .phps

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName www.XXXXXX.com
AuthUserFile /www/XXXXXX/_vti_pvt/service.pwd
AuthGroupFile /www/XXXXXX/_vti_pvt/service.grp

Should I remove this before pasting in the bans, or simply add to it?

Thank you

1:07 pm on Sept 20, 2002 (gmt 0)

New User

10+ Year Member

joined:Sept 20, 2002
posts:28
votes: 0


Dunno if this helps, but I've found this list: [psychedelix.com...]
3:09 pm on Sept 20, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


I keep having my stuff stolen (by teachers, not students), and when I can tell what they used, they downloaded my site using FrontPage.

Can I use this Rewrite stuff to block FrontPage from downloading my site? (I know the educators can still get my stuff from their browser's cache, etc, etc, but it would be nice to make them work at stealing, rather than having it be so easy, ya know?)

Thanks!

[edited by: jatar_k at 4:44 pm (utc) on Mar. 13, 2003]

4:30 pm on Sept 20, 2002 (gmt 0)

New User

10+ Year Member

joined:Sept 20, 2002
posts:28
votes: 0


Hmmm...PHPINFO shows FrontPage 2002 (XP) as
Mozilla/2.0 (compatible; MS FrontPage 5.0)

Dunno if that helps.
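Given that UA string, a mod_rewrite rule along these lines should refuse FrontPage's requests. This is an untested sketch; matching mid-string (no ^ anchor) is deliberate, since the UA starts with "Mozilla/2.0", and FrontPage could change its UA between versions:

```apache
RewriteEngine On
# Match the "MS FrontPage" token anywhere in the User-Agent header.
RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage [NC]
RewriteRule .* - [F]
```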

This 243 message thread spans 25 pages.