joined:June 27, 2000
"If your robot's name is not listed in the list below, then you cannot crawl my site?"
Like an invite-only party where the bouncer at the door kicks your *ss out if you don't have an invitation.
An allow list construct in robots.txt would look like this:
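A minimal sketch, reconstructed from the description in this post; the exact layout, and whether the paths carry a trailing slash, are assumptions:

```
# Sketch only: layout reconstructed from the post's description (assumption)
# One record shared by the two invited robots
User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

# Everyone else is kept out entirely
User-agent: *
Disallow: /
```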
This allows Googlebot and Slurp while keeping them out of /cgi-bin and /devel, but disallows all other robots completely - *if* they obey it.
I should also note that not all robots can handle the multiple user-agent records as shown above, even though it is in the standard. Those too can be handled by mod_rewrite or ISAPI filters redirecting them to a simpler version of robots.txt.
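As a rough sketch of the mod_rewrite approach, assuming Apache with mod_rewrite enabled and the rules placed in the site root's .htaccess. The robot name `ExampleBot` and the file name `robots-simple.txt` are hypothetical placeholders, not anything from a real crawler or from my own setup:

```
# Sketch only: ExampleBot and robots-simple.txt are hypothetical names
RewriteEngine On
# Match a robot that cannot parse several User-agent lines in one record
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
# Serve the simpler file whenever that robot asks for robots.txt
RewriteRule ^robots\.txt$ /robots-simple.txt [L]
```

This version rewrites internally rather than sending an HTTP redirect, so the robot still sees a normal response at /robots.txt. The simpler file would then hold one record with a single User-agent line for that robot.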
I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect and it took quite a few emails before they saw my point of view :D