incrediBILL - 11:46 pm on Aug 7, 2013 (gmt 0)
1. How can you block all of the bots? None of them honor robots.txt.
2. It's impossible to block all IP addresses from scraping.
3. Not all scrapers use bots.
YES, you can stop almost all of the bots, and the following 5 steps will get rid of nearly all of the scraping. I've been doing this for years, and although a little still slips through the cracks, I can deal with one or two problems instead of hundreds or thousands of incidents.
It's simple: 5 steps and most of the scraping is gone.
1. Whitelist robots.txt: allow the good bots by name and disallow everything else, so the unwanted bots that do honor robots.txt go away.
2. Whitelist in .htaccess: allow the same bot names you allowed in robots.txt, plus browser user agents (UAs) like Firefox, MSIE, Opera, etc.
3. Install a bot blocker script to catch everything that slips through the cracks.
4. Block all data center IP ranges.
5. Put NOARCHIVE in all pages (<meta name="robots" content="noarchive">) to stop scraping from search engine caches.
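For step 1, here's a minimal Python sketch of what a whitelist robots.txt looks like. The bot names are only examples (swap in whichever crawlers you actually want), and the `can_crawl` helper is a hypothetical name that uses the standard library's `urllib.robotparser` to confirm a named bot is allowed while every other bot is told to stay out. This only affects bots that honor robots.txt, which is why the remaining steps exist.

```python
from urllib.robotparser import RobotFileParser

def whitelist_robots_txt(allowed_bots):
    """Build a whitelist-style robots.txt.

    Each named bot gets an empty Disallow (i.e. it may crawl everything);
    the final catch-all group disallows the whole site for every other bot.
    """
    groups = [f"User-agent: {bot}\nDisallow:\n" for bot in allowed_bots]
    groups.append("User-agent: *\nDisallow: /\n")
    # Joining with "\n" leaves a blank line between groups.
    return "\n".join(groups)

def can_crawl(robots_txt, user_agent, url):
    """Ask the stdlib parser what a robots.txt-honoring bot would do."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Example whitelist; the names are illustrative, pick your own.
robots = whitelist_robots_txt(["Googlebot", "bingbot"])
print(robots)
```

The catch-all `User-agent: *` group goes last so it only applies to bots that didn't match a named group.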
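Step 2 is normally done with mod_rewrite rules in .htaccess. To keep the examples in one language, here's the same whitelist logic as a small Python function you could call from a front controller or middleware. The token lists are assumptions: the crawler names should mirror whatever you whitelisted in robots.txt, and the browser tokens are just the ones named in step 2, so extend the list for the browsers your visitors actually use.

```python
# Hypothetical whitelist: crawler names should mirror robots.txt,
# browser tokens are the ones named in step 2 (extend as needed).
ALLOWED_BOTS = ("Googlebot", "bingbot")
ALLOWED_BROWSERS = ("Firefox", "MSIE", "Opera")

def ua_whitelisted(user_agent):
    """Return True only if the User-Agent contains a whitelisted token.

    Anything else, including a missing or empty UA, is treated as not
    allowed: in a whitelist, unknown means blocked.
    """
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token.lower() in ua for token in ALLOWED_BOTS + ALLOWED_BROWSERS)

print(ua_whitelisted("Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0"))  # True
print(ua_whitelisted("libwww-perl/6.05"))  # False
```

Anyone can fake a UA string, so this only stops the lazy scrapers; the bot blocker and data center steps are there for the rest.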
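The post doesn't say what the bot blocker script in step 3 checks, so this is only one common ingredient, not the script itself: a per-IP sliding-window rate limit that blocks anything requesting pages faster than a person would. The class name and thresholds are made up for illustration, and timestamps are passed in explicitly (in a real deployment you'd pass `time.time()`).

```python
from collections import defaultdict, deque

class SlidingWindowBlocker:
    """Hypothetical bot-blocker ingredient: block an IP that makes more than
    max_requests requests within window_seconds. Thresholds are illustrative."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # ip -> timestamps inside the window
        self.blocked = set()              # IPs caught so far

    def allow(self, ip, now):
        """Record a request from ip at time now (seconds); return False if blocked."""
        if ip in self.blocked:
            return False
        hits = self.recent[ip]
        hits.append(now)
        # Forget requests that have slid out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.blocked.add(ip)  # once caught, stay blocked
            del self.recent[ip]
            return False
        return True

# Usage: 3 requests per 10 seconds allowed, the 4th in a burst trips the block.
blocker = SlidingWindowBlocker(max_requests=3, window_seconds=10)
print([blocker.allow("198.51.100.7", t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

Blocking permanently once tripped is a deliberate choice here; a real script might expire blocks or send a challenge instead.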
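For step 4, the core check is whether a visitor's IP falls inside a known data center range. Python's `ipaddress` module handles the CIDR math. The ranges below are RFC 5737 documentation blocks used as placeholders, not real data center allocations, so you'd load your own list of hosting-provider ranges instead; `from_data_center` is a hypothetical helper name.

```python
import ipaddress

# PLACEHOLDER ranges: RFC 5737 documentation blocks, not real data centers.
# Replace with the hosting-provider ranges you actually want to block.
DATA_CENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]

def from_data_center(ip):
    """True if ip (an IPv4 or IPv6 string) falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network; that test is just False.
    return any(addr in net for net in DATA_CENTER_RANGES)

print(from_data_center("203.0.113.45"))  # True
print(from_data_center("192.0.3.1"))     # False
```

Order matters: the crawlers you whitelisted in step 2, such as Googlebot, crawl from their operators' own data centers, so presumably they need to be verified and exempted before this check runs.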
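Step 5 is usually just `<meta name="robots" content="noarchive">` in each page's `<head>`. If editing every template is a pain, the `X-Robots-Tag` HTTP header carries the same directive. Here's a sketch of a WSGI middleware that adds it to every response; it assumes a Python/WSGI site, which the post doesn't say, and `hello_app` is just a stand-in for your real application.

```python
def noarchive_middleware(app):
    """Wrap a WSGI app so every response carries 'X-Robots-Tag: noarchive',
    the header form of <meta name="robots" content="noarchive">."""
    def wrapped(environ, start_response):
        def start_response_with_noarchive(status, headers, exc_info=None):
            headers = list(headers) + [("X-Robots-Tag", "noarchive")]
            return start_response(status, headers, exc_info)
        return app(environ, start_response_with_noarchive)
    return wrapped

# Hypothetical stand-in for a real site.
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>Hello</body></html>"]

# Serve `app` with any WSGI server.
app = noarchive_middleware(hello_app)
```

The header approach also covers non-HTML files such as PDFs, where a meta tag isn't possible.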