abot
Almaden.ibm.com/cs/crawler (IBM looking at my site?)
Enterprise Search (?)
MimeLive Client (?)
[MSIE 6.0] (Why are so many spiders claiming to be Microsoft?)
MSNIA
MSIECrawler
NutchOrg (seems to be an open-source search engine; friendly?)
Openfind data gatherer (Asian bot, but friendly too?)
Pompos (pompos@iliad.fr) (from France, but is it nice, or should I ban it?)
libwww-perl (?)
I'm thinking of putting something like this in .htaccess:
RewriteCond %{HTTP_USER_AGENT} ^abot [NC,OR]
and so on...
Comments please?
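For anyone following along, here is a minimal sketch of how the single RewriteCond line above might sit in a complete .htaccess block. It assumes mod_rewrite is enabled and that overrides are allowed for the directory. The two patterns are only taken from the list above as examples, not as a recommendation of what to ban.

```apache
# Sketch only: assumes mod_rewrite is loaded and .htaccess overrides are allowed.
RewriteEngine On

# Every condition except the last carries OR; NC makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} ^abot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [NC]

# Answer any request whose user agent matched above with 403 Forbidden.
RewriteRule .* - [F]
```

Further user agents would go in as extra RewriteCond lines above the last one, each with the OR flag.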
Here is some reading:
abot: 241 [webmasterworld.com], 1297 [webmasterworld.com], 1450 [webmasterworld.com], 1616 [webmasterworld.com], 1762 [webmasterworld.com]
Almaden.ibm.com/cs/crawler: 128 [webmasterworld.com], 665 [webmasterworld.com], 759 [webmasterworld.com], 777 [webmasterworld.com], 800 [webmasterworld.com], 814 [webmasterworld.com], 904 [webmasterworld.com], 1442 [webmasterworld.com], 1536 [webmasterworld.com], 1705 [webmasterworld.com], 1992 [webmasterworld.com], 2197 [webmasterworld.com], 2276 [webmasterworld.com] (and those are just the Forum 11 posts)
MimeLive Client: 2137 [webmasterworld.com], 2147 [webmasterworld.com]
MSIE 6.0: 2354 [webmasterworld.com]
MSIECrawler: 749 [webmasterworld.com], 1010 [webmasterworld.com], 1398 [webmasterworld.com], 2270 [webmasterworld.com]
NutchOrg: 1667 [webmasterworld.com], 2200 [webmasterworld.com], 4987 [webmasterworld.com]
Openfind data gatherer: 30 [webmasterworld.com], 877 [webmasterworld.com], 1054 [webmasterworld.com]
Pompos: 1448 [webmasterworld.com], 1604 [webmasterworld.com]
libwww-perl: 1885 [webmasterworld.com], 2160 [webmasterworld.com]
And my personal conclusion:
abot: I banned it, like Thomson&Thomson and Cyveillance. Services like that can't do their job seriously if they obey robots.txt, and I don't allow spiders that ignore it.
Almaden.ibm.com/cs/crawler: I don't see a reason to ban a well-behaved bot from Big Blue's research center.
MimeLive Client: Belongs to Exalead (?); I wouldn't know.
MSIE 6.0: Banned it. It has nothing to do with Internet Explorer but is some Romanian directory that doesn't obey robots.txt.
MSIECrawler: Internet Explorer's offline-browsing crawler. It obeys robots.txt and is therefore not banned.
NutchOrg: Caught it not obeying robots.txt more than once and banned it subsequently.
Openfind data gatherer: I haven't banned the UA, but I have banned pretty much all of the 211. and 205. IP ranges that are registered with APNIC. There's just too much e-crap coming from that part of the world.
Pompos: Wouldn't know
libwww-perl: Banned it, with a few exceptions, as I did "Microsoft URL Control" and "Indy Library". All of them are part of one dev tool or another with which pretty much anyone can "create" a bot (that of course ignores robots.txt).
MSNIA: MSN Internet Access perhaps? This one may be human.
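Since one of the conclusions above bans by IP range rather than by user agent, here is a rough sketch of what that could look like. It assumes Apache 2.4 authorization syntax (older servers use Order/Allow/Deny instead). The two ranges are documentation placeholders, not the ranges mentioned above, so substitute whatever you have actually looked up.

```apache
# Sketch only: Apache 2.4 syntax (mod_authz_core and mod_authz_host).
# 203.0.113.0/24 and 198.51.100.0/24 are placeholder documentation ranges,
# NOT the ranges discussed in this thread.
<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except clients connecting from the listed ranges.
    Require not ip 203.0.113.0/24
    Require not ip 198.51.100.0/24
</RequireAll>
```

Unlike a user-agent rule, this also catches bots that fake their UA string, but it will shut out any human visitors in those ranges as well.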