Hopefully this list will help people find information about known spiders quickly and gain some insight into each spider's purpose, as well as serving as a handy quick-reference guide in the future.
Let me kick off this list with a few entries of my own:
Google
Yahoo
Bing
Ask
Feel free to contribute IP ranges for spiders that don't support full round-trip DNS validation (a reverse lookup on the IP, confirmed by a forward lookup on the resulting hostname).
However, IPs should not be submitted when discussing distributed crawlers that are run from volunteer computers.
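For anyone who hasn't set up round-trip DNS checking before, here is a rough sketch of the idea in Python. The function name, the injectable resolver arguments, and the suffix tuple are my own choices for illustration, not anything a search engine publishes; check each engine's own documentation for the hostnames it actually uses. Note that `socket.gethostbyname_ex` only covers IPv4.

```python
import socket


def verify_crawler_ip(ip, suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if `ip` passes a round-trip DNS check.

    Step 1: reverse lookup, IP -> hostname.
    Step 2: the hostname must end with one of the trusted suffixes.
    Step 3: forward lookup, hostname -> IPs, must include the original IP.

    `reverse` and `forward` are parameters only so the check can be
    exercised without network access (an assumption of this sketch).
    """
    try:
        host = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    if not host.endswith(tuple(suffixes)):
        return False

    try:
        addresses = forward(host)[2]
    except OSError:
        return False

    return ip in addresses
```

You would call it as `verify_crawler_ip(remote_ip, (".googlebot.com", ".google.com"))` for a request claiming to be Googlebot. A spider that fails step 1 or step 3 can't be validated this way, which is when a contributed IP range becomes useful.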
Perhaps this list needs updating?
I was hoping to create a resource like we did with the default programming library thread.
If there's no interest, I can just kill the thread and forget about it.
I'm sure others wouldn't mind helping to find all the threads initially if you can dump out a list of bots.
This could easily be the best bot index on the web.
Ideally we would want to know the bot name, site, crawler page and thread, and hopefully the thread will have the last 3 if possible.
AdsBot-Google+(+http://www.google.com/adsbot.html)
Gigabot/3.0+(G75)
Gigabot/3.0+(http://www.gigablast.com/spider.html)
Googlebot-Image/1.0
Java/1.6.0_10
Mozilla/4.0+(compatible;+BOTW+Spider;++http://botw.org)
Mozilla/5.0+(Twiceler-0.9+http://www.cuil.com/twiceler/robot.html)
Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+fr;+rv:1.8.1)+VoilaBot+BETA+1.2+(support.voilabot@orange-ftgroup.com)
Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/5.0+(compatible;+Charlotte/1.1;+http://www.searchme.com/support/)
Mozilla/5.0+(compatible;+DBLBot/1.0;++http://www.dontbuylists.com/)
Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)
Mozilla/5.0+(compatible;+ScoutJet;++http://www.scoutjet.com/)
Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp)
Sosospider+(+http://help.soso.com/webspider.htm)
SurveyBot/2.3+(Whois+Source)
Yandex/1.01.001+(compatible;+Win16;+I)
ia_archiver+(+http://www.alexa.com/site/help/webmasters;+crawler@alexa.com)
msnbot-media/1.0+(+http://search.msn.com/msnbot.htm)
msnbot/1.1+(+http://search.msn.com/msnbot.htm)
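If it helps with building the index, a list like the one above can be boiled down to a bot name and crawler page with a few lines of Python. This is only a sketch: it assumes the log format writes `+` wherever the user agent had a space (so a literal `+` in front of a URL is lost too), and `parse_logged_agent` is just a name I made up. The name heuristic is crude and will get some entries wrong, for example Twiceler comes out as `Mozilla/5.0`.

```python
import re

URL_RE = re.compile(r"https?://[^\s;)]+")
COMPAT_RE = re.compile(r"\(compatible;\s*([^;)]+)")


def parse_logged_agent(raw):
    """Pull a bot name and crawler-page URL out of one logged user agent."""
    # Assumption: '+' stands for a space in the log; collapse runs of spaces.
    text = " ".join(raw.replace("+", " ").split())

    url_match = URL_RE.search(text)
    compat_match = COMPAT_RE.search(text)

    if text.startswith("Mozilla/") and compat_match:
        # "Mozilla/5.0 (compatible; Googlebot/2.1; ...)" -> "Googlebot/2.1"
        name = compat_match.group(1).strip()
    else:
        # Otherwise take everything before the first parenthesis.
        name = text.split(" (")[0]

    return {"name": name, "url": url_match.group(0) if url_match else None}


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)",
        "Gigabot/3.0+(http://www.gigablast.com/spider.html)",
        "Googlebot-Image/1.0",
    ]
    for line in sample:
        print(parse_logged_agent(line))
```

Entries with no URL, like `Googlebot-Image/1.0`, come back with `url` set to `None`, which makes it easy to see which bots still need a crawler page tracked down.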
I could do it again from a newer sample, but I'm guessing GaryK is way ahead of me on something like this...