Forum Moderators: coopster & phranque

A Close to perfect .htaccess ban list

toolman

3:30 am on Oct 23, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

mivox

10:09 pm on Jan 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dredging this thread out of the depths of time.

Could someone please translate this line:

RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Just wondering exactly what's happening there....

idiotgirl

12:46 am on Jan 9, 2002 (gmt 0)

10+ Year Member Top Contributors Of The Month



RewriteRule !^http://[^/.]\.your-site.com.* - [F]

is shorthand for "Get the hell out and don't come back 'cuz you aren't viewing a darned thing from this (my domain) today and as far as I'm concerned you get the big 'F' meaning - I (my domain) does not exist to you."

At least, that's my understanding. Apache has all that neat stuff posted. I forget most of it - always have to refer back.

"I'm not a smart man, Jenny" - Forrest Gump aka idiotgirl
<added>not a sig - just how I feel today,</added>

toolman

1:43 am on Jan 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mivox, that's blocking that screen scraper from iaea.org.

pshea

5:57 pm on Jan 9, 2002 (gmt 0)

10+ Year Member



Thank you Toolman for the list.

I have added these to my htaccess which I have never really fooled around with before. Having now added these, can you tell me what I can expect?

Will the effect be a "lack of" data? Meaning, if these bots are excluded, will I see (a) smaller logs, (b) fewer email harvesters, leading to less junk email, and (c) less load on the server? Have I got its benefits right?

toolman

6:07 pm on Jan 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can expect a slight performance hit on your server...nothing major.

I really don't worry too much about email harvesters as I don't put email addresses on my site. The ones that irritate me are the site rippers. This is the latest version.

I know it could be shortened, so if you're a Unix geek, please quit snickering and help us with the regex stuff. Thanks for your support ;)

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.yo-do-main.net.* - [F]
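
One way the user-agent list above might be shortened is by grouping prefixes with alternation, which mod_rewrite's regular expressions support. This is only a sketch: the grouping is arbitrary, and escaping the dots in the InternetSeer hostnames is an added tightening that is not in the original list. "Microsoft.URL" is left unchanged because the thread does not say what its dot is meant to match.

```apache
RewriteEngine on
# Agents that start with "Mozilla" and contain NEWT or Indy later in the string
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(NEWT|Indy) [OR]
# Remaining patterns from the list, grouped by alternation
RewriteCond %{HTTP_USER_AGENT} ^(MSFrontPage|[Ww]eb[Bb]andit|Zeus.*Webster|Microsoft.URL) [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Wget|Ping|Link|ia_archiver|DIIbot|psbot|EmailCollector) [OR]
# Dots escaped here (an assumption about intent); the last condition carries no [OR]
RewriteCond %{HTTP_USER_AGENT} ^(sitecheck\.internetseer\.com|InternetSeer\.com)
# Leave the URL untouched and answer 403 Forbidden
RewriteRule ^.* - [F]
```

Splitting the alternation across a few lines rather than one long pattern keeps the list easy to scan when adding or removing a bot.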

rcjordan

6:22 pm on Jan 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>expect

I installed TM's htaccess about 2 months ago, along with a trial run of a script to email me when one of these tripped an error code. Luckily, I decided to run it on a single site rather than 40 of them. I was deluged by error notifications and had to repoint it to an error form to save my inbox. Expect to be surprised.

BTW, I now have it on all sites and server performance does seem to be slightly improved.

bird

6:58 pm on Jan 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteRule !^http://[^/.]\.your-site.com.* - [F]

  • ! If the requested URL is NOT of the following form:

    1. ^ directly at the beginning of the string
    2. http:// this string literally
    3. [^/.] one character that is not a slash or a dot (probably meant to read [^/.]+ for "one or more of those")
    4. \. a literal dot (escaped)
    5. your-site.com this string literally (almost, as the unescaped dot will match any arbitrary character)
    6. .* any trailing characters (or none)

  • - don't rewrite the URL
  • [F] return a "403 forbidden" to the client

This means that the rule would theoretically be applied to all requests that ask your server for a page from a different domain than "your-site.com", given that they show the www.iaea.org referrer. In other words, the pattern probably doesn't do what its author had in mind.

Reality, however, is slightly different. ;) The string passed to the RewriteRule only contains the path component of the URL without the hostname. The rule will by definition never see a string that starts with "http://", but only the path: starting with a "/" in the main server configuration, or with the leading directory prefix stripped in a .htaccess file. Because the pattern can never match, its negation always succeeds. This is the reason why the technically pointless pattern still gives the desired result and simply denies any request where the RewriteCond matches.

If in doubt, I'd simply lump the RewriteCond for iaea together with the others in the upper list and get rid of the second RewriteRule. The "^.*" of the first RewriteRule achieves the same result in a much simpler way, by saying "apply this rule to URLs that contain any sequence of characters, or none".
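
The consolidation suggested above could look roughly like this. It is a sketch, not a drop-in file: the user-agent list is cut down to three entries from the thread for brevity, and escaping the dots in the referrer pattern is an addition that the original rule does not have.

```apache
RewriteEngine on
# User-agent conditions, OR-ed together (abbreviated list)
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
# Referrer condition folded into the same chain; as the last one it has no [OR]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
# One rule for everything: leave the URL untouched and answer 403 Forbidden
RewriteRule ^.* - [F]
```

With a single rule there is no second pattern to puzzle over, and the your-site.com placeholder disappears entirely.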

Crazy_Fool

12:00 am on Jan 12, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



toolman
i found a few new UAs in my logs for the last couple of months. don't know much about them but you might like to keep an eye on them in case they are pests. i've posted the list in the spider identification forum at [webmasterworld.com...]

DrOliver

11:00 am on Mar 5, 2002 (gmt 0)

10+ Year Member



Hi all, this is my first post, and it is a question...

I still don't get it. Do I have to replace "your-site.com" and/or "http://www.iaea.org" with my actual URL or do I leave this as it is?

This is a snippet of the code:
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

I hope I will be able to deliver some solutions to other topics in return soon, as I am mostly a designer and quite good at X/HTML and CSS, rather than programming and server technologies.

So I'd be happy if anyone could blow away the fog.

Brett_Tabke

9:21 am on Mar 6, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Leave it as is. That iaea referrer is part of some abusive bot that we've all banned. It uses iaea as a referrer. You will find it coming in from all kinds of IPs in Southeast Asia - easiest to ban the referrer.
This 243-message thread spans 25 pages.