A Close to perfect .htaccess ban list

     
3:30 am on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

# Keep visitors from reading the .htaccess file itself
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
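# Any user agent matched above gets a 403 Forbidden
RewriteRule ^.* - [F]
# Requests referred from iaea.org get a 403 as well
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]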
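
To try user-agent strings against the list before it goes live, here is a minimal Python sketch. It assumes Python's `re` module behaves closely enough to Apache's regex engine for these simple anchored patterns. The name `is_blocked` and the sample user agents are made up for illustration, and only a subset of the patterns is copied over.

```python
import re

# A subset of the patterns from the RewriteCond lines above.
BLOCKED_UA_PATTERNS = [
    r"^EmailSiphon",
    r"^Mozilla.*NEWT",
    r"^[Ww]eb[Bb]andit",
    r"^Zeus.*Webster",
    r"^Microsoft.URL",
    r"^Wget",
    r"^ia_archiver",
    r"^EmailCollector",
]


def is_blocked(user_agent: str) -> bool:
    """Return True if any pattern matches, mirroring the [OR] chain."""
    return any(re.search(p, user_agent) for p in BLOCKED_UA_PATTERNS)


if __name__ == "__main__":
    # Hypothetical user-agent strings, purely for illustration.
    for ua in ["Wget/1.7", "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"]:
        print(ua, "->", "403" if is_blocked(ua) else "allowed")
```

Because every pattern starts with `^`, a user agent is only caught when the bot name is at the very beginning of the string.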

TheLynxEffect

3:51 am on Oct 23, 2001 (gmt 0)

Inactive Member
Account Expired

 
 


Nice! Thanks for sharing that really cool info toolman. I can't spot any other bots at the moment.

Sticky

7:24 pm on Oct 23, 2001 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38070
votes: 16


Very nice TM. How much speed difference can you notice on each page view?
8:14 pm on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


>>>How much speed difference can you notice on each page view.

Couldn't say I notice any at all. The part above this though could determine that...if I run everything through the php parser I expect a hit. Usually I run AddHandlers for SSIs and have never noticed a slowdown.

BTW I pieced this together from snippets others posted here on the board.

8:26 pm on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 6, 2000
posts:904
votes: 0


Another one might be

RewriteCond %{HTTP_USER_AGENT} almaden [OR]
8:51 pm on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 4, 2001
posts:997
votes: 0


I use .htaccess to remap third level domains to various directories based on HTTP_HOST. What happens if two RewriteConds apply to two separate rewrite rules (i.e., I place some of these blocking lines above my third level domain remaps in my .htaccess file)?
11:21 am on Oct 24, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 8, 2005
posts:833
votes: 0


Hi Toolman nice compilation of nasty bots! Have you tried sticking the re-writer in httpd.conf? It would run fastest there, although you noted that there was no noticeable speed difference as it is.

Thanks again for sharing it with us!

2:37 pm on Oct 24, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


I found another UA for InternetSeer

RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com [OR]

Not sure what the difference is but this one is the one that comes by every fifteen minutes as my competition tries to fool me into thinking I have more traffic than I do. Now it's easily filtered as a 403.

Long live mod_rewrite :)

3:24 pm on Oct 24, 2001 (gmt 0)

Preferred Member

10+ Year Member

joined:July 6, 2001
posts:410
votes: 0


toolman- have you been looking over my shoulder at 2 am? I thought *I* had some kind of unhealthy fixation with .htaccess. Guess not. And it may even be healthy, after all.

I've been going back and forth from a kind of banbot.cgi that reads a banned.txt file, to just drawing a line in the sand and doing the full-on mod_rewrite at the top level to initiate a trickle down effect on the sub domains I host.

What I've been toying with is a combination of my banned.txt file automatically updating my .htaccess file - using grep to insert/add/delete lines depending on what is in banned.txt. It's pretty easy to update my banned.txt file either by hand or with a little interface program I wrote - but I'm 'grappling with grep' to insert my lines in the correct place in the .htaccess file. I'm in the dark with grep. Grep vexes me. Grep makes my stomach hurt.

Has anyone else considered this, or is it too much work? I thought it would give me some flexibility, and kill two birds with one stone. In fact, at 2 am I think it's a brilliant idea. Then again, I don't get out much.
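
One way to sidestep grep entirely is to regenerate a marked section of the file from banned.txt. What follows is only a sketch under several assumptions: banned.txt holds one user-agent pattern per line with no spaces in it, the `# BEGIN banned-bots` / `# END banned-bots` marker comments are a made-up convention (not an Apache feature), and `RewriteEngine on` already appears earlier in the .htaccess file. The function names are hypothetical too.

```python
import re

BEGIN = "# BEGIN banned-bots"  # hypothetical marker comments
END = "# END banned-bots"


def build_block(banned_text: str) -> str:
    """Turn one-pattern-per-line text into a RewriteCond chain plus one rule."""
    patterns = [
        line.strip()
        for line in banned_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    lines = [BEGIN]
    for i, pattern in enumerate(patterns):
        # Every condition except the last one carries [OR].
        flag = " [OR]" if i < len(patterns) - 1 else ""
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} {pattern}{flag}")
    if patterns:
        # Only emit the rule when there are conditions in front of it;
        # a bare rule would forbid every request.
        lines.append("RewriteRule ^.* - [F]")
    lines.append(END)
    return "\n".join(lines)


def update_htaccess(htaccess_text: str, banned_text: str) -> str:
    """Replace the marked block, or append one if the markers are absent."""
    block = build_block(banned_text)
    marked = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)
    if marked.search(htaccess_text):
        # A function replacement keeps backslashes in the block literal.
        return marked.sub(lambda m: block, htaccess_text, count=1)
    return htaccess_text.rstrip("\n") + "\n" + block + "\n"
```

In use, the two inputs would be read from disk with `open()`, and the result written to a temporary file and renamed over .htaccess, so a half-written file is never served.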

6:04 am on Oct 30, 2001 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 24, 2001
posts:117
votes: 0


toolman, mind translating that for those of us who are mod_rewrite impaired?
10:09 pm on Jan 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member mivox is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 6, 2000
posts:3928
votes: 0


Dredging this thread out of the depths of time.

Could someone please translate this line:

RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Just wondering exactly what's happening there....

12:46 am on Jan 9, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:July 6, 2001
posts:410
votes: 0


RewriteRule !^http://[^/.]\.your-site.com.* - [F]

is shorthand for "Get the hell out and don't come back 'cuz you aren't viewing a darned thing from this (my domain) today and as far as I'm concerned you get the big 'F' meaning - I (my domain) does not exist to you."

At least, that's my understanding. Apache has all that neat stuff posted. I forget most of it - always have to refer back.

"I'm not a smart man, Jenny" - Forrest Gump aka idiotgirl
<added>not a sig - just how I feel today,</added>

1:43 am on Jan 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


mivox, that's blocking that screen scraper from iaea.org.
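
To spell out why that pair of lines has that effect: in .htaccess, a RewriteRule pattern is tested against the requested path, never against the referer, and a leading `!` inverts the result. A path never starts with `http://`, so the negated pattern is always true and the rule fires whenever the RewriteCond on the referer matches. Here is a rough Python sketch of that logic. The function name `is_forbidden` and the sample values are invented, and Python's `re` is assumed to stand in adequately for Apache's regex engine.

```python
import re

# From the RewriteCond line.
REFERER_COND = r"^http://www.iaea.org$"
# From the RewriteRule line, without the leading "!".
RULE_PATTERN = r"^http://[^/.]\.your-site.com.*"


def is_forbidden(url_path: str, referer: str) -> bool:
    """Mimic the RewriteCond/RewriteRule pair for one request."""
    # The rule pattern is tested against the requested path; "!" inverts it.
    rule_matches = re.search(RULE_PATTERN, url_path) is None
    # The condition is tested against the Referer header.
    cond_matches = re.search(REFERER_COND, referer) is not None
    return rule_matches and cond_matches


if __name__ == "__main__":
    print(is_forbidden("index.html", "http://www.iaea.org"))
    print(is_forbidden("index.html", "http://www.iaea.org/page.html"))
```

Note that the `$` at the end of the referer pattern means a referer with a trailing slash or a path after the hostname does not match, which may or may not be what was intended.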
5:57 pm on Jan 9, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Jan 31, 2001
posts:282
votes: 0


Thank you Toolman for the list.

I have added these to my htaccess which I have never really fooled around with before. Having now added these, can you tell me what I can expect?

Will its effect be a "lack of" data, meaning that if these bots are excluded, I'll get (a) smaller logs, (b) fewer email harvesters, leading to less junk email, and (c) less usage on the server? Have I got its benefits right?

6:07 pm on Jan 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


You can expect a slight performance hit on your server...nothing major.

I really don't worry too much about email harvesters as I don't put email addresses on my site. The ones that irritate me are the site rippers. This is the latest version.

I know it could be shortened so if you're a unix geek please quit snickering and help us on the regex stuff. Thanks for your support ;)

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.yo-do-main.net.* - [F]
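
On the shortening question: one option, offered here as an assumption rather than anything posted in this thread, is to fold the user-agent prefixes into a single anchored alternation of the form `^(a|b|c)`, so that one RewriteCond replaces the whole [OR] chain. The Python sketch below builds that line from the list above. The helper names are hypothetical, and the result is worth testing on a scratch site before trusting it.

```python
import re

# The user-agent prefixes from the listing above, without the leading "^".
PREFIXES = [
    "Mozilla.*NEWT", "MSFrontPage", "[Ww]eb[Bb]andit", "Mozilla.*Indy",
    "Zeus.*Webster", "Microsoft.URL", "Wget", "sitecheck.internetseer.com",
    "InternetSeer.com", "Ping", "Link", "ia_archiver", "DIIbot", "psbot",
    "EmailCollector",
]


def combined_pattern(prefixes):
    """Join the alternatives into one anchored group: ^(a|b|c)."""
    return "^(" + "|".join(prefixes) + ")"


def as_rewrite_cond(prefixes):
    """Format the combined pattern as a single RewriteCond line."""
    return "RewriteCond %{HTTP_USER_AGENT} " + combined_pattern(prefixes)


def matches_any(prefixes, user_agent):
    """The original behaviour: one anchored test per prefix, OR'ed together."""
    return any(re.search("^" + p, user_agent) for p in prefixes)


if __name__ == "__main__":
    print(as_rewrite_cond(PREFIXES))
    print("RewriteRule ^.* - [F]")
```

The trade-off is readability: one long line is harder to scan and edit than fifteen short ones, so the [OR] chain may still be the better choice for a list that changes often.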

This 243 message thread spans 17 pages.
 
