robots.txt is an exclusion protocol: you have to explicitly list who and what you don't want to have access — not the other way around. I'm not an .htaccess wizard, but I'm fairly certain you can do what you're asking with .htaccess instead of robots.txt.
The 'cooperation' of the robot with robots.txt is voluntary. For those that do obey, yes, you can construct your robots.txt to list those that you wish to allow, and deny the rest. As oilman says, the rest have to be handled with mod_rewrite on Apache or ISAPI filters on Windows servers.
An allow list construct in robots.txt would look like this:
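(A minimal sketch; the exact paths and trailing slashes are assumptions on my part.)

```
User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /devel/

User-agent: *
Disallow: /
```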
This allows Googlebot and Slurp while keeping them out of /cgi-bin and /devel, but disallows all other robots completely - *if* they obey it.
I should also note that not all robots can handle multiple User-agent lines in a single record as shown above, even though it's in the standard. Those, too, can be handled by mod_rewrite or ISAPI filters redirecting them to a simpler version of robots.txt.
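For the mod_rewrite side, a sketch might look like this — "ExampleBot" and the /robots-simple.txt filename are hypothetical stand-ins for whatever robot and file you actually use:

```apache
# Sketch: serve a simplified robots file to a robot (here "ExampleBot",
# an assumed name) that can't parse grouped User-agent records.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule ^robots\.txt$ /robots-simple.txt [L]
```

The idea is that most robots get the full allow-list robots.txt, while the problem robot is transparently handed a flattened version it can understand.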
>>I also should note that not all robots can handle the multiple user-agent records as shown above, even though it is in the standard.
I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect, and it took quite a few emails before they saw my point of view :D
I've thought about blocking Baidu because they're only in Chinese and my site is only in English. I don't see the point in letting them use my bandwidth when their users will probably never visit my site.
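If you do go that route, the robots.txt record would be as simple as this — assuming Baidu's crawler honors it (it identifies itself as Baiduspider):

```
User-agent: Baiduspider
Disallow: /
```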