dstiles - 10:07 pm on Jan 13, 2012 (gmt 0)
There are three primary sources of site scrapers:
1. server farms and clouds - these farms inhabit known IP ranges which can be permanently blocked, perhaps drilling the occasional hole for a known good bot.
2. botnets - these can inhabit server farms (see above) or ADSL (broadband) IP ranges (see below).
3. home/business IPs - basically dynamic/static broadband IPs that are still under the control of their owners (i.e. not compromised by trojans). Requests from these usually carry faulty "credentials" (odd or missing headers, fake user agents) which can be detected.
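For the first case (blocking known server-farm IP ranges while drilling a hole for a good bot), a minimal .htaccess sketch for Apache 2.4 might look like this. The ranges shown are documentation-only examples, not real farm ranges - substitute the ranges you have actually identified:

```apache
# Deny a known server-farm range, but drill a hole for one known-good IP.
# 192.0.2.0/24 and 198.51.100.7 are placeholder (RFC 5737) addresses.
<RequireAll>
    Require all granted
    <RequireNone>
        Require ip 192.0.2.0/24
    </RequireNone>
</RequireAll>

# Alternatively, on Apache 2.2 the older syntax applies:
# Order Allow,Deny
# Allow from all
# Allow from 198.51.100.7
# Deny from 192.0.2.0/24
```

Exact directives differ between Apache 2.2 and 2.4, so check which version your host runs before copying either form.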
Most bots, particularly the high-speed scraper types, CAN be dealt with. It just takes a bit of dedication and time.
If you do not want scrapers you have two choices:
1. learn to run your web site properly, installing blocking rules as relevant (Apache on Linux/Unix, for example, supports htaccess rules, and later versions of IIS offer equivalent request filtering: learn to use them).
2. buy in expertise from someone who knows how to manage blocking properly - it's cheaper than losing revenue to scrapers.
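As a starting point for the do-it-yourself route, faulty "credentials" such as scripted user agents can be filtered with a few mod_rewrite lines in .htaccess. The user-agent strings below are illustrative examples of common scraping tools, not a complete or authoritative blocklist:

```apache
# Reject requests whose User-Agent matches common scraping tools.
# Tune the pattern to the bots you actually see in your logs.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (curl|wget|python-requests|libwww-perl) [NC]
RewriteRule .* - [F,L]

# Requests with an empty User-Agent are also suspect on most sites.
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F,L]
```

Blocking on user agent alone only stops the lazy bots, since the header is trivial to forge, but it is a cheap first layer before moving on to IP-range and behavioural blocking.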
And, of course, check out WebmasterWorld's own "Search Engine Spider and User Agent Identification" and Apache (htaccess) forums.