-- Search Engine Spider and User Agent Identification
---- Quick primer on identifying bot activity.
wilderness - 1:58 am on Apr 1, 2008 (gmt 0)
Now imagine how easy that is to accomplish when you get your hands on a list of 6,000 open proxies operating from random locations around the world. With that, I could probably scrape 100K pages from any web site without getting caught, unless the site blocks the entire proxy list.
OK, now imagine this proxy list isn't public, and that it's run and used by a private consortium of customers who need to operate without being detected...
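The countermeasure hinted at above, blocking a known list of open proxies, can be sketched minimally. This is an illustration only; the addresses below are documentation-range IPs, not a real proxy list:

```python
# Minimal sketch: reject requests whose source IP appears on a
# known open-proxy blocklist. The entries here are hypothetical
# placeholders from the RFC 5737 documentation ranges.
KNOWN_PROXIES = {
    "203.0.113.7",
    "198.51.100.23",
}

def is_blocked(remote_ip: str) -> bool:
    """Return True if the requesting IP is on the open-proxy blocklist."""
    return remote_ip in KNOWN_PROXIES

# A server would consult this before serving the page:
print(is_blocked("203.0.113.7"))   # listed proxy -> True
print(is_blocked("192.0.2.55"))    # unlisted visitor -> False
```

Of course, this only works against the public lists; a private proxy pool like the one described above would sail right past it.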
Imagine how the scheming, collective group above (as well as the many people across "the www" who bandy about themes of "free access" or "public domain") perceive a collective group of webmasters at SSID discussing limiting access over possible infractions? ;)
I saw that some universities were taking part in a public-domain archive of their libraries through Archive.org (I believe), rather than Google, because Google presented too many possibilities for future access restrictions and future paid (subscription) access.