frontpage - 12:59 am on Mar 1, 2011 (gmt 0)
We use ModSecurity combined with honey traps. Works pretty well.
Common user agents used by scrapers:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322)
ModSecurity 2.x rules:
SecRule HTTP_User-Agent "Indy Library" "deny,log,status:403"
SecRule HTTP_User-Agent "Nutch" "deny,log,status:403"
And these geniuses set their crawler to send a malformed user agent that is literally the string 'user-agent'.
SecRule HTTP_User-Agent "User-Agent" "deny,log,status:403"
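For anyone who prefers the variable name used in the ModSecurity 2.x reference manual, the three rules above can probably be folded into one. This is only a sketch: the rule id, the msg text and the explicit phase are my own assumptions, not part of the setup described above.

```apache
# Sketch only: id 900010 and the msg text are made-up placeholders.
# REQUEST_HEADERS:User-Agent is the 2.x name for the User-Agent request header.
# With no operator given, the pattern is treated as a regular expression.
SecRule REQUEST_HEADERS:User-Agent "(?:Indy Library|Nutch|User-Agent)" \
    "phase:1,id:900010,deny,log,status:403,msg:'Blocked scraper user agent'"
```

Running it in phase 1 means the request is rejected as soon as the headers are read, before any request body is processed.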
Also, we block known spam/hacker server farms at Leaseweb, Singlehop, Limestone Networks, Calpop, Softlayer/ThePlanet, etc.
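On the honey trap side, one common way to wire it up in ModSecurity 2.x looks roughly like the following. Treat it as a sketch under assumptions, not the exact rules described above: the /trap/ path, the rule ids, the variable name and the 24 hour expiry are all hypothetical, and it assumes SecDataDir is already configured so persistent collections work. The idea is to disallow /trap/ in robots.txt and link to it only from markup that human visitors never see, so anything requesting it is almost certainly a misbehaving bot.

```apache
# Sketch only: /trap/, the ids and the expiry are hypothetical values.
# Open a persistent per-IP collection for every request.
SecAction "phase:1,id:900001,nolog,pass,initcol:ip=%{REMOTE_ADDR}"

# A client that requests the trap URL gets flagged for 24 hours and denied.
SecRule REQUEST_URI "@beginsWith /trap/" \
    "phase:1,id:900002,deny,log,status:403,msg:'Honey trap hit',setvar:ip.trapped=1,expirevar:ip.trapped=86400"

# Any later request from a flagged address is denied as well.
SecRule IP:TRAPPED "@eq 1" \
    "phase:1,id:900003,deny,log,status:403,msg:'Client previously hit honey trap'"
```

Keeping the flag in the IP collection rather than in a static blocklist lets it expire on its own, which helps when the address is later reassigned to a legitimate visitor.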