Sample (allow the named robots everything, shut everyone else out -- note the blank line between each record):

User-agent: robot1
Disallow:

User-agent: robot2
Disallow:

...

User-agent: robotn
Disallow:

User-agent: *
Disallow: /
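One way to sanity-check how a record like the one above is interpreted is Python's standard urllib.robotparser. This is just a sketch with a shortened two-record version of the sample; the bot names are the placeholders from the sample, not real crawlers.

```python
# Sanity-check the sample record with Python's standard urllib.robotparser.
# "robot1" has an empty Disallow (allowed everywhere); the catch-all "*"
# record blocks everyone else from the whole site.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: robot1
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("robot1", "/any/page.html"))        # named robot: allowed
print(rp.can_fetch("SomeOtherBot", "/any/page.html"))  # everyone else: blocked
```

Swap in your own robots.txt text (or use rp.set_url() plus rp.read() against a live site) to test real records the same way.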
2.) That said, you can also use robots.txt to control aspects of the major SE bots' behavior. You'll find the specifics on the majors' own sites, by using the search engines themselves, and by reading The Web Robots Pages [robotstxt.org].
3.) Also, many of the major SEs run more than one bot. Here are just a few to give you an idea. Note that the list is not robots.txt-ready; rather, it's to help you ID some of the majors you may see:
GOOGLE-related...
User-agent: Googlebot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
MICROSOFT-related...
User-agent: msnbot
User-agent: SandCrawler - Compatibility Testing
YAHOO!-related...
User-agent: Slurp
User-agent: Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
(The preceding are ALL case-sensitive.)
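To show how names like these get used in a real record, here's a sketch that keeps Googlebot-Image out of a hypothetical /images/ directory while leaving the main Google crawler unrestricted. The /images/ path is an assumption for illustration; substitute your own directories.

```
User-agent: Googlebot-Image
Disallow: /images/

User-agent: Googlebot
Disallow:
```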
4.) Last but not least, regularly eyeball your own server logs and check IPs for more info about what/who is visiting you.
There are literally hundreds, if not thousands, of bots nowadays, not to mention bots spoofing browsers, so if you're really into a list, you'll probably have to roll your own. Good luck!
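If you do decide to roll your own list, one starting point is tallying User-Agent strings straight out of your access logs. A minimal sketch, assuming Apache/Nginx "combined" log format where the UA is the last quoted field; the sample lines and bot version strings are made up for illustration.

```python
# Tally User-Agent strings from combined-format access log lines.
# The regex grabs the last double-quoted field on each line, which is
# the User-Agent in the standard combined format -- adjust it if your
# server logs a different layout.
import re
from collections import Counter

UA_RE = re.compile(r'"([^"]*)"\s*$')

def tally_user_agents(lines):
    counts = Counter()
    for line in lines:
        m = UA_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Fabricated sample lines standing in for a real log file.
sample = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [10/Oct/2023:13:55:40 +0000] "GET /a HTTP/1.1" 200 128 "-" "msnbot/1.0"',
    '1.2.3.4 - - [10/Oct/2023:13:56:02 +0000] "GET /b HTTP/1.1" 404 64 "-" "Googlebot/2.1"',
]

for ua, n in tally_user_agents(sample).most_common():
    print(n, ua)
```

Pair the UA counts with reverse-DNS lookups on the IPs (as in point 4) to spot bots spoofing browser User-Agents.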
Use Search Engine World's Robots.txt Validator [searchengineworld.com] to make sure your robots.txt is A-OK. And see also WW sister site's excellent Robots.txt Tutorial [searchengineworld.com].