Someone tried using it from an Italian university (unisi.it), from IP 193.205.7.*
Dijkgraaf
12:15 am on Nov 1, 2010 (gmt 0)
Yes, the issues forum for that crawler says, "No, the current version does not support robots.txt." I've added a note saying that this would get it banned by a lot of webmasters.
incrediBILL
2:55 am on Nov 1, 2010 (gmt 0)
They shouldn't put it on the web before it honors robots.txt in the first place.
Just goes to show they're bad neighbors already.
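For anyone wondering what "honoring robots.txt" amounts to on the crawler side, here is a minimal sketch in Python using the standard library's urllib.robotparser. It is not crawler4j code (that project is Java); the sample rules, the example.com URLs, and the helper names build_parser and may_fetch are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a webmaster might write it to
# shut out one bot entirely and keep everyone else out of /private/.
SAMPLE_ROBOTS = """\
User-agent: crawler4j
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()


def build_parser(robots_lines):
    """Build a parser from already-downloaded robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def may_fetch(parser, user_agent, url):
    """Return True only if the rules allow this user agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS)

# A polite crawler runs this check before every request and skips
# any URL for which it returns False.
for agent, url in [
    ("crawler4j", "http://example.com/index.html"),
    ("SomeOtherBot", "http://example.com/index.html"),
    ("SomeOtherBot", "http://example.com/private/page.html"),
]:
    print(agent, url, may_fetch(parser, agent, url))
```

In a real crawler the lines would come from fetching /robots.txt on each host (RobotFileParser can do that itself with set_url and read), and the result would normally be cached per host rather than re-fetched for every page.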
Pfui
5:46 am on Nov 9, 2010 (gmt 0)
Earlier today, from the respected Rensselaer Polytechnic Institute (coincidentally where, in 1865, an ancestor by marriage, John Flack Winslow, served as fifth president), came the disrespectfully coded and run "crawler4j" bot:
leo.tw.rpi.edu crawler4j (http://code.google.com/p/crawler4j/) robots.txt? NO
2 hits in 1 second.
(Where else can you get bot bits and family tree trivia in one byte? :)