

A Close to perfect .htaccess ban list

         

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it allows me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Has anybody got a bot to add before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
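# Refuse requests referred from iaea.org. (In .htaccess, a RewriteRule
# pattern only ever sees the URL-path, never a full http:// URL.)
RewriteCond %{HTTP_REFERER} ^http://(www\.)?iaea\.org [NC]
RewriteRule .* - [F]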
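As an aside, and not part of toolman's post: a sketch of how several of the prefix tests above could be folded into a single RewriteCond using regex alternation. The [NC] flag makes the match case-insensitive, which covers what ^[Ww]eb[Bb]andit does by hand. Only a few of the agents are shown; the grouping is the point.

```apache
RewriteEngine on
# One condition, one regex: the user agent must start with one of these names
# (case-insensitive because of NC).
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|ExtractorPro|EmailCollector|WebBandit|CherryPicker) [NC,OR]
# Patterns that need more than a simple prefix can stay on their own lines.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster
# Any request matching one of the conditions gets 403 Forbidden.
RewriteRule .* - [F]
```

Whether this is an improvement is a matter of taste: it saves lines, but one very long condition is harder to edit than one agent per line.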

ratboy

9:16 pm on Apr 9, 2003 (gmt 0)



Oh, rather than clutter up this section with more samples, I put what I gather is more or less the version that includes most of the additions people have posted into a text file: tech.ratmachines.com/downloads/sample_wbmw.txt

If there are more things that should be added, please post them.
Thanks

notsleepy

7:44 pm on Apr 10, 2003 (gmt 0)

10+ Year Member



ratboy: Good idea on the central location for the file.

I think I have one more for you to add:

RewriteCond %{HTTP_USER_AGENT} ^GornKer [OR]

I couldn't find any information on it, but it never touched my robots.txt.

ratboy

12:39 am on Apr 11, 2003 (gmt 0)



Thanks, I'll keep it as up to date as I can. The thing that became quickly obvious from reading this really educational discussion forum was that the technique I had been wanting to use, a robots.txt exclusion, was a complete waste of time, since the only thing any self-respecting spider/crawler programmer would do with that information would be to seek out the areas that were explicitly denied.

The .htaccess file idea seems like a much better stopgap measure: more versatile and easier to implement. I'll stop in now and then and see if there is anything more to add to it. Kudos to webmasterworld for having forums and contributors that actually can teach you something and not waste your time.

Oaf357

1:48 am on Apr 11, 2003 (gmt 0)

10+ Year Member



Okay. I tried to implement the central .htaccess file but got some unusual errors. Any idea whether something is missing from that file that would keep it from working?

ratboy

6:44 am on Apr 11, 2003 (gmt 0)



Oaf357, I don't claim any expertise in this stuff; all I can say is that this is what I cut and pasted directly out of this forum, with a few spider additions, which shouldn't change how the script runs. You might try cutting out the first lines of

RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]

(just the referer lines with guestbook) and see if that makes a difference. Also cut out the source comment on the first line, just to be on the safe side, then see if you get the same errors.
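For a test like this, commenting the lines out can be easier than deleting them, since they can be restored just by removing the "#". A minimal sketch, using the guestbook referer condition quoted above:

```apache
# Temporarily disabled while tracking down the errors. Apache ignores any
# line whose first non-blank character is "#".
# RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
```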

I've been running it for a few days without any errors, but that's just one server on one web host, so I can't promise there's nothing wrong with it. Maybe some of the other people who have contributed can take a look at it here, tech.ratmachines.com/downloads/sample_wbmw.txt, and let us know.

Here are the first and last lines of the script, in case someone can spot an error (the dots represent the part I cut out):
===============================

RewriteEngine On
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
.....................
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_USER_AGENT} ^ZyBorg

RewriteRule ^.* - [F,L]

===============================

You might want to post exactly what errors you got; then somebody might be able to help you. I'm not very good at this stuff, but some of the people on this forum are.

ratboy

8:07 pm on Apr 11, 2003 (gmt 0)



Here is a useful excerpt from [apache-server.com...]
on basic .htaccess troubleshooting. It might help.
==============================================
==============================================

Troubleshooting

Here are some of the most common problems I've seen people have (or have had myself) with .htaccess files. One thing I should stress first, though: the server error log is your friend. You should always consult the error log when things don't seem to be functioning correctly. If it doesn't say anything about your problem, try boosting the message detail by changing your LogLevel directive to debug (or adding a LogLevel debug line if you don't have a LogLevel already).
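A sketch of that logging change, assuming you can edit the main server configuration: LogLevel is not permitted in .htaccess files, only in httpd.conf or a <VirtualHost> block, so on shared hosting you may have to ask the host. On Apache 1.3 through 2.2, mod_rewrite also keeps its own trace log; the log path below is a placeholder.

```apache
# httpd.conf or inside <VirtualHost>; not allowed in .htaccess.
LogLevel debug

# Apache 1.3 / 2.0 / 2.2 only: trace of what mod_rewrite did per request.
# Placeholder path. Levels run from 0 (off) up to 9 (very verbose).
RewriteLog /var/log/apache/rewrite.log
RewriteLogLevel 3
```

On Apache 2.4, RewriteLog and RewriteLogLevel were removed; the rough equivalent is a per-module level such as LogLevel warn rewrite:trace3. Either way, turn the level back down afterwards, since debug output grows quickly.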

'Internal Server Error' page is displayed when a document is requested
This indicates a problem with your configuration. Check the Apache error log file for a more detailed explanation of what went wrong. You have probably used a directive that isn't allowed in .htaccess files, or a directive with incorrect syntax.

.htaccess file doesn't seem to change anything
It's possible that the directory is within the scope of an AllowOverride None directive. Try putting a line of gibberish in the .htaccess file and force a reload of the page. If you still get the same page instead of an 'Internal Server Error' display, then this is probably the cause of the problem. Another slight possibility is that the document you're requesting isn't actually controlled by the .htaccess file you're editing; this can sometimes happen if you're accessing a document with a common name, such as index.html. If there's any chance of this, try changing the actual document and requesting it again to make sure you can see the change.
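If AllowOverride None turns out to be the cause, the change has to be made in the server configuration, not in .htaccess. A sketch for a hypothetical document root of /home/example/public_html: rewrite directives in .htaccess need the FileInfo override, Deny/Allow need Limit, authentication directives need AuthConfig, and mod_rewrite in a per-directory context also requires FollowSymLinks (or SymLinksIfOwnerMatch).

```apache
# httpd.conf: let .htaccess files under this (hypothetical) directory use
# mod_rewrite (FileInfo), Order/Allow/Deny (Limit) and auth (AuthConfig).
<Directory "/home/example/public_html">
    Options FollowSymLinks
    AllowOverride FileInfo Limit AuthConfig
</Directory>
```

AllowOverride All is the blunt alternative, at the cost of letting .htaccess files override every directive class.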

I've added some security directives to my .htaccess file, but I'm not getting challenged for a username and password
The most common cause of this is having the .htaccess directives within the scope of a Satisfy Any directive. Explicitly disable this by adding a Satisfy All to the .htaccess file, and try again.
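A minimal password-protection sketch with Satisfy All stated explicitly. The realm name and the AuthUserFile path are placeholders; the password file would be created with Apache's htpasswd utility. On Apache 2.4, Satisfy survives only in mod_access_compat, and <RequireAll> is the newer way to express the same thing.

```apache
# .htaccess: protect this directory with HTTP Basic authentication.
AuthType Basic
AuthName "Members only"
# Placeholder path; keep the password file outside the web root.
AuthUserFile /home/example/.htpasswd
Require valid-user
# Require both the host-based (Allow/Deny) rules and a valid login,
# overriding any inherited "Satisfy Any".
Satisfy All
```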

jdMorgan

5:38 am on Apr 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Syntax errors in the list posted here:

Don't use quotes for mod_rewrite patterns. That's for RedirectMatch syntax.
Comments should be on their own line; otherwise, you will get warnings if you have that log level set.

So,

RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR] # spambot
should be

# spambot
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [OR]

In the rule,
RewriteRule !err_¦robots\.txt - [F,L]
the underscore in "err_" doesn't strictly need escaping, since "_" has no special meaning in a regular expression, but preceding it with a "\" does no harm.
Also, the alternates in the pattern probably need to be delimited with parentheses:
RewriteRule !(err\_¦robots\.txt) - [F,L] 

Also, the broken vertical pipe "¦" character above must be changed to a solid vertical pipe before it can be used in .htaccess.

HTH,
Jim
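For context, a sketch of how a rule like the corrected one above is commonly used, assuming a hypothetical custom error page named err_403.html: banned agents are refused everything except robots.txt and the error page itself, so the 403 page can still be served instead of being forbidden in turn. ErrorDocument in .htaccess needs the FileInfo override, and the pipe here is the solid "|" character.

```apache
# Hypothetical custom 403 page; its name contains "err_", so the rule
# below lets it through.
ErrorDocument 403 /err_403.html
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf
# Forbid matching agents every URL except robots.txt and the err_ pages.
RewriteRule !(err\_|robots\.txt) - [F,L]
```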

jdMorgan

5:41 am on Apr 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pmkpnk,

Bad bot script: [webmasterworld.com...]

(See the links at the top of that thread for even more "historical" information on the subject.)

Jim

ladymindy

12:17 am on May 11, 2003 (gmt 0)

10+ Year Member



I added an .htaccess script as described in this forum. It stopped all instances of the offline browsers except one. I just noticed an entry for teleportpro/ in my logs
(Agent: Teleport Pro/1.29.1718) for my message board. How did this get through when I used this statement?
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]

Did I write this wrong?
Thanks

ladymindy
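A hedged note on the Teleport question: as written, ^Teleport\ Pro should match "Teleport Pro/1.29.1718", so the pattern itself looks fine. One possibility worth checking, offered as an assumption rather than a diagnosis: if the message board's directory has its own .htaccess with rewrite rules, those replace the parent's rules there unless RewriteOptions Inherit is set. The sketch below also loosens the match to any agent starting with "Teleport" in any letter case.

```apache
# Top-level .htaccess: block any user agent beginning with "Teleport",
# in any letter case ("Teleport Pro/1.29.1718", "TeleportPro", ...).
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
RewriteRule .* - [F]

# Message board's own .htaccess (hypothetical): if it defines its own
# rewrite rules, pull in the parent directory's rules as well.
RewriteEngine on
RewriteOptions Inherit
```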

boxturt

3:57 am on May 11, 2003 (gmt 0)



I've been poring over this forum for hours: trying, tweaking, etc. I have learned so much!

Blocking Teleport Pro was no problem, except that I then discovered it can be set to disguise itself as
(compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Hotbar 4.0), as well as a few other things.

Now I'm really confused. I can't very well block that, right?!
Suggestions?

Ty
