bumpski - 2:34 pm on Jan 14, 2012 (gmt 0)
One should really differentiate between a crawler and a scraper. For example, Google is not a scraper.
What's the difference?
A scraper "crawls" the web and then republishes what it finds as new pages that other crawlers can crawl. This is where the harm comes in. Google crawls the web, BUT Google does not publish new web pages of stolen content for other crawlers to crawl and index.
I put a nonsense string in all the titles on one of my sites, something like "kkljghik". When I search for my unique nonsense string, many, many scraper pages/sites pop up, BUT none of them are on the domain www.google.com, because Google does not publish what it crawls! (Well, except for groups.google.com, where some scraper idiot keeps inventing new group names and publishing copies of my content! It appears Google takes those down fairly quickly.)
Anyway, the unique string trick ("kkljghik") at least makes it easy to find all the scrapers! For a 150-page site, Google returns 58,000-plus pages; perhaps 2 to 3 percent "might" be considered legitimate. Virtually all of these results have clearly extracted and republished some content from my site.
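For anyone who wants to automate the trick, here is a rough Python sketch, not a finished tool. It assumes you have already saved the result URLs from a search for your marker string into a list by hand; it does not query Google itself. The host www.example.com and the function names tag_title and foreign_hosts are made up for illustration.

```python
from urllib.parse import urlparse

MARKER = "kkljghik"              # the nonsense string from the post
OWN_HOSTS = {"www.example.com"}  # hypothetical: replace with your own host(s)


def tag_title(title, marker=MARKER):
    """Append the marker to a page title unless it is already present."""
    return title if marker in title else f"{title} {marker}"


def foreign_hosts(result_urls, own_hosts=OWN_HOSTS):
    """Count search-result URLs per host, skipping our own hosts."""
    counts = {}
    for url in result_urls:
        # hostname is None for malformed URLs, so fall back to "".
        host = (urlparse(url).hostname or "").lower()
        if host and host not in own_hosts:
            counts[host] = counts.get(host, 0) + 1
    return counts
```

Every host that foreign_hosts returns is a page carrying your marker outside your own site, so it is a candidate scraper; the handful of legitimate ones still have to be weeded out by eye.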