
Forum Moderators: coopster & jatar k & phranque


Banning bots

     
8:05 am on Jan 19, 2011 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 26, 2004
posts: 405
votes: 0


Some excellent info here. This list below seems to be the latest - any others to add? Especially for email grabbers and scrapers please?
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?iaea\.org [NC]
RewriteRule .* - [F]
8:32 am on Jan 19, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 16, 2007
posts:846
votes: 0


Not so sure that this topic fits in the Perl forum...

You should take a look in the Search Engine Spider and User Agent Identification [webmasterworld.com] forum. One of the approaches that is mentioned there is whitelisting, which boils down to banning anything that does not pass the smell test.
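A minimal whitelist along those lines might look like this in .htaccess (the user-agent substrings here are illustrative, not a complete list — and since user-agent strings are trivially forged, verify crawlers that matter to you by reverse DNS as well):

```apache
RewriteEngine on
# Let through anything identifying as a mainstream browser or a major
# search engine crawler; forbid (403) everything else.
RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Opera) [NC]
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot|Slurp) [NC]
RewriteRule .* - [F]
```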

In addition to banning certain 'suspicious' user-agent strings, you might want to take a look at the header information that is supplied, and also the IP range that a supposed visitor is accessing your site from. YMMV ;)
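For example, to catch a bot that forges a well-known crawler's user-agent but connects from the wrong network (the 66.249 range is commonly published for Googlebot, but treat that as an assumption and confirm with a reverse-then-forward DNS lookup):

```apache
RewriteEngine on
# Anything claiming to be Googlebot from outside Google's published
# address range is almost certainly a fake.
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]
```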
10:28 pm on Jan 19, 2011 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 26, 2004
posts: 405
votes: 0


Sorry, yep, I'd better post this in another forum. Do you use your .htaccess file for any of this?
10:39 pm on Jan 19, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:May 31, 2008
posts:661
votes: 0


The problem with those lists is that they only block bots which are bad (ia_archiver is not, if that's archive.org's bot) and stupid; they don't block the ones that really want the information. Blocking harvesters is pretty much the same as blocking spam bots. You might want to look at "Bad Behavior": it's a client-fingerprinting-based solution that tries to identify bots posing as regular browsers and denies them access.
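A crude .htaccess version of that fingerprinting idea (just a sketch — Bad Behavior itself performs many more checks than this one): every real browser sends an Accept header, so a client calling itself "Mozilla" while omitting one is very likely a script in disguise.

```apache
RewriteEngine on
# Claims to be a browser, but sends no Accept header: likely a bot.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla [NC]
RewriteCond %{HTTP:Accept} ^$
RewriteRule .* - [F]
```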
7:24 am on Jan 20, 2011 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11080
votes: 106


Whitelisting just the good bots - Search Engine Spider and User Agent Identification forum:
http://www.webmasterworld.com/search_engine_spiders/4255036.htm [webmasterworld.com]
 
