Stopping scrapers from the get-go


frontpage - 12:59 am on Mar 1, 2011 (gmt 0)


We use ModSecurity combined with honey traps. Works pretty well.
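
For anyone who has not set one up: a common form of honey trap is a hidden link to a URL that robots.txt disallows, so well-behaved crawlers never request it and anything that does is almost certainly a scraper. Below is a minimal sketch of that idea in ModSecurity 2.x. The trap path /trap/, the rule IDs, and the 24-hour block window are all hypothetical choices for illustration, and it assumes SecDataDir is already configured so the per-IP collection can persist. It is one possible setup, not necessarily the one described in this post.

```
# Open a persistent per-client collection keyed on the remote address.
# (Assumes SecDataDir is set; rule IDs here are arbitrary local IDs.)
SecAction "phase:1,id:10001,nolog,pass,initcol:ip=%{REMOTE_ADDR}"

# Any request for the hidden trap URL flags the client for 24 hours
# and is denied immediately. /trap/ is a hypothetical path.
SecRule REQUEST_URI "@beginsWith /trap/" \
    "phase:1,id:10002,log,deny,status:403,msg:'Honey trap hit',setvar:ip.trapped=1,expirevar:ip.trapped=86400"

# Clients flagged earlier are denied on every later request
# until the flag expires.
SecRule IP:TRAPPED "@eq 1" \
    "phase:1,id:10003,log,deny,status:403,msg:'Client previously hit honey trap'"
```

The trap URL has to be listed under Disallow in robots.txt before the hidden link goes live, otherwise legitimate search engine spiders will get caught too.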

Common user agents used by scrapers:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)

ModSecurity 2.x rules:

SecRule REQUEST_HEADERS:User-Agent "Indy Library" "deny,log,status:403"

SecRule REQUEST_HEADERS:User-Agent "Nutch" "deny,log,status:403"

And these geniuses set their crawler to send a malformed user agent that is literally the string 'user-agent'.

SecRule REQUEST_HEADERS:User-Agent "User-Agent" "deny,log,status:403"


Also, we block known spam/hacker server farms at Leaseweb, Singlehop, Limestone Networks, Calpop, Softlayer/ThePlanet, etc.
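
Network-level blocks like this can live in the firewall, but they can also be sketched in ModSecurity itself. The example below assumes a 2.x release recent enough to include the @ipMatch operator. The CIDR ranges are RFC 5737 documentation placeholders, not real ranges for any of the hosts named above, and the rule ID is arbitrary. The actual ranges would have to be looked up for each provider.

```
# Deny requests from listed networks before any other processing.
# The ranges below are placeholders (RFC 5737), not real provider ranges.
SecRule REMOTE_ADDR "@ipMatch 192.0.2.0/24,198.51.100.0/24,203.0.113.0/24" \
    "phase:1,id:10010,log,deny,status:403,msg:'Blocked hosting network'"
```

Running the check in phase 1 rejects the request as soon as the headers are read, before the body is processed.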


Thread source: http://www.webmasterworld.com/search_engine_spiders/4267704.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com