Forum Moderators: coopster & jatar k & phranque

A Close to perfect .htaccess ban list

     
3:30 am on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

10:09 pm on Jan 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member mivox is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 6, 2000
posts:3928
votes: 0


Dredging this thread out of the depths of time.

Could someone please translate this line:

RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Just wondering exactly what's happening there....

12:46 am on Jan 9, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:July 6, 2001
posts:410
votes: 0


RewriteRule !^http://[^/.]\.your-site.com.* - [F]

is shorthand for "Get the hell out and don't come back 'cuz you aren't viewing a darned thing from this (my domain) today and as far as I'm concerned you get the big 'F' meaning - I (my domain) does not exist to you."

At least, that's my understanding. Apache has all that neat stuff posted. I forget most of it - always have to refer back.

"I'm not a smart man, Jenny" - Forrest Gump aka idiotgirl
<added>not a sig - just how I feel today,</added>

1:43 am on Jan 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


mivox, that's blocking that screen scraper from iaea.org.

5:57 pm on Jan 9, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Jan 31, 2001
posts:283
votes: 0


Thank you Toolman for the list.

I have added these to my htaccess which I have never really fooled around with before. Having now added these, can you tell me what I can expect?

Will its effect be a "lack of" data? That is, if these bots are excluded, will (a) my logs be smaller, (b) fewer email harvesters get through, leading to less junk email, and (c) usage on the server drop? Have I got its benefits right?

6:07 pm on Jan 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


You can expect a slight performance hit on your server...nothing major.

I really don't worry too much about email harvesters as I don't put email addresses on my site. The ones that irritate me are the site rippers. This is the latest version.

I know it could be shortened so if you're a unix geek please quit snickering and help us on the regex stuff. Thanks for your support ;)

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.yo-do-main.net.* - [F]
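On the request to shorten the regex: one possible condensed form of the user-agent part is sketched below. It is untested against real traffic and only folds the fifteen [OR]-chained conditions into a single alternation. The dots are left unescaped exactly as posted so the patterns keep matching what the original list matches, and the referrer rule that follows the list in the post is left out here because it is unchanged.

```apache
RewriteEngine on
RewriteBase /
# One alternation replaces the fifteen [OR]-chained user-agent conditions.
# ^Mozilla.*NEWT and ^Mozilla.*Indy are merged into Mozilla.*(NEWT|Indy).
RewriteCond %{HTTP_USER_AGENT} ^(Mozilla.*(NEWT|Indy)|MSFrontPage|[Ww]eb[Bb]andit|Zeus.*Webster|Microsoft.URL|Wget|sitecheck.internetseer.com|InternetSeer.com|Ping|Link|ia_archiver|DIIbot|psbot|EmailCollector)
# Any request whose user agent matched gets a 403 Forbidden.
RewriteRule ^.* - [F]
```

Adding the [NC] flag to the RewriteCond would make the whole match case-insensitive, so [Ww]eb[Bb]andit could become plain WebBandit, but it would also widen every other pattern in the list, so that is a judgment call.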

6:22 pm on Jan 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member rcjordan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 22, 2000
posts:9138
votes: 0


>expect

I installed TM's htaccess about 2 months ago, along with a trial run of a script to email me when one of these tripped an error code. Luckily, I decided to run it on a single site rather than 40 of them. I was deluged by error notifications and had to repoint it to an error form to save my inbox. Expect to be surprised.
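For anyone wanting to try the notification idea, here is a rough sketch of how a script might be hooked to the 403 responses. It is not rcjordan's actual setup: the path /cgi-bin/ban-notify.cgi is a made-up name, and the two user-agent lines stand in for the full list. It assumes the host allows ErrorDocument in .htaccess.

```apache
# Hypothetical script path; it would receive every 403 this site returns.
ErrorDocument 403 /cgi-bin/ban-notify.cgi

RewriteEngine on
RewriteBase /
# Let the error handler itself through. Without this, the internal
# redirect to the handler carries the same banned user agent and is
# refused again by the rules below.
RewriteRule ^cgi-bin/ban-notify\.cgi$ - [L]
# Stand-ins for the full ban list.
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
```

Given the volume described above, having the script write to a log or form rather than send mail is probably the safer default.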

BTW, I now have it on all sites and server performance does seem to be slightly improved.

6:58 pm on Jan 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 10, 2001
posts:1551
votes: 10


RewriteRule !^http://[^/.]\.your-site.com.* - [F]

  • ! If the requested URL is NOT of the following form:

    1. ^ directly at the beginning of the string
    2. http:// this string literally
    3. [^/.] one character that is not a slash or a dot (probably meant to read [^/.]+ for "one or more of those")
    4. \. a literal dot (escaped)
    5. your-site.com this string literally (almost, as the unescaped dot will match any arbitrary character)
    6. .* any trailing characters (or none)

  • - don't rewrite the URL
  • [F] return a "403 forbidden" to the client

This means that the rule would theoretically be applied to all requests that ask your server for a page from a different domain than "your-site.com", given that they show the www.iaea.org referrer. In other words, the pattern probably doesn't do what its author had in mind.

Reality, however, is slightly different. ;) The string passed to the RewriteRule only contains the path component of the URL without the hostname. This is the reason why the technically pointless pattern still gives the desired result and simply denies any request where the RewriteCond matches. The rule will by definition never see a string that starts with "http://", only the path itself (starting with a "/" in the main server config; in a .htaccess file even that leading slash is stripped).

If in doubt, I'd simply lump the RewriteCond for iaea together with the others in the upper list and get rid of the second RewriteRule. The "^.*" of the first RewriteRule achieves the same result in a much simpler way, by saying "apply this rule to URLs that contain any sequence of characters, or none".
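A sketch of what that merge might look like, using conditions from the first list posted in this thread. Only a few of the user-agent lines are shown; the rest would sit in the same place, each with [OR]. The one thing to watch is that the last condition before the RewriteRule carries no [OR].

```apache
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
# (the remaining user-agent conditions from the list go here, each with [OR])
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
# The referrer check becomes the final alternative in the same chain.
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
# A single rule now covers both the user-agent bans and the referrer ban.
RewriteRule ^.* - [F]
```

The referrer pattern is kept exactly as posted, so it only matches the bare string "http://www.iaea.org" with nothing after it.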

12:00 am on Jan 12, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 26, 2001
posts:1076
votes: 0


toolman
I found a few new UAs in my logs for the last couple of months. Don't know much about them, but you might like to keep an eye on them in case they are pests. I've posted the list in the spider identification forum at [webmasterworld.com...]

11:00 am on Mar 5, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 5, 2002
posts:142
votes: 0


Hi all, this is my first post, and it is a question...

I still don't get it. Do I have to replace "your-site.com" and/or "http://www.iaea.org" with my actual URL or do I leave this as it is?

This is a snippet of the code:
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

I hope I will be able to deliver some solutions to other topics in return soon, as I am mostly a designer and quite good at X/HTML and CSS, rather than at programming and server technologies.

So I'd be happy if anyone could blow away the fog.

9:21 am on Mar 6, 2002 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38079
votes: 28


Leave it as is. That iaea referrer is part of some abusive bot that we've all banned. It uses iaea as a referrer. You will find it coming in from all kinds of IPs in Southeast Asia - easiest to ban the referrer.
This 243 message thread spans 25 pages: 243