blend27 - 5:40 pm on Dec 14, 2012 (gmt 0)
I don't think there is a complete list of those, but what I started doing on a weekly basis several years back was to collect the IP addresses of the sites that the info is hosted on and add the IP ranges to a block list.
When the bots find themselves ineffective because of the number of entities blocking them, they move.
Reputable ones seem to stick with reputable hosting companies. Scrapers seem to move around but are easier to catch based on headers. Catch an IP scraping, block the range and all the ranges that belong to the hosting company. It is a never-ending task till one starts understanding how things work on the darker side.
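A rough sketch of the "block the range" idea, assuming Python and its standard ipaddress module. The class name, method names and addresses are made up for illustration (the addresses come from the reserved documentation ranges). In practice the ranges would come from a whois lookup on the hosting company, and the check would more likely live in the web server or firewall config.

```python
import ipaddress


class RangeBlockList:
    """Hypothetical in-memory block list of IP ranges (CIDR notation)."""

    def __init__(self):
        self._networks = []

    def add_range(self, cidr):
        # strict=False lets us pass the offending IP plus a prefix length,
        # e.g. "192.0.2.77/24", and still get the enclosing network.
        self._networks.append(ipaddress.ip_network(cidr, strict=False))

    def is_blocked(self, ip):
        addr = ipaddress.ip_address(ip)
        # Linear scan; fine for a few thousand ranges.
        return any(addr in net for net in self._networks)


blocklist = RangeBlockList()
# Caught one IP scraping: block its range plus the host's other ranges.
blocklist.add_range("192.0.2.77/24")
blocklist.add_range("203.0.113.0/24")

print(blocklist.is_blocked("192.0.2.10"))    # True
print(blocklist.is_blocked("198.51.100.5"))  # False
```

A plain list is enough to show the idea; with a very large block list one would probably sort the ranges or merge adjacent ones with ipaddress.collapse_addresses first.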
Get a few forums going out there, spread the word that comment spamming is all good to go, set up some traps, collect all the data one can, learn, learn, learn, start blocking IP ranges on your main sites.
Visit STOPFORUMSPAM, ProjectHoneyPot, .... get into their APIs, lovely stuff...
Lot of work, but PAYS to discover :)