Forum Moderators: goodroi
"If your robot's name is not on the list below, then you cannot crawl my site."
Like an invite-only party where the bouncer at the door kicks your *ss out if you don't have an invitation.
An allow list construct in robots.txt would look like this:
User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

User-agent: *
Disallow: /
This lets Googlebot and Slurp crawl everything except /cgi-bin and /devel, while disallowing all other robots completely - *if* they obey it.
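You can sanity-check an allow-list record like this with Python's standard-library robots.txt parser (the bot name "SomeRandomBot" below is just a stand-in for any crawler not on the list):

```python
import urllib.robotparser

# the allow-list robots.txt from the post above
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
Disallow: /cgi-bin
Disallow: /devel

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# the listed bots may fetch normal pages, but not the disallowed dirs
print(rp.can_fetch("Googlebot", "/index.html"))    # True
print(rp.can_fetch("Googlebot", "/cgi-bin/test"))  # False
# any bot not on the list falls through to "User-agent: *" and is shut out
print(rp.can_fetch("SomeRandomBot", "/index.html"))  # False
```

Note the blank line between records - the standard requires it, and Python's parser (like many crawlers) uses it to tell the two records apart.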
I should also note that not all robots can handle multiple User-agent lines in a single record as shown above, even though it is in the standard. Those, too, can be handled by mod_rewrite or ISAPI filters redirecting them to a simpler version of robots.txt.
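For the Apache side, a minimal mod_rewrite sketch of that redirect might look like this - "ClumsyBot" is a made-up stand-in for whatever UA mishandles stacked User-agent lines, and /robots-simple.txt is an assumed name for the dumbed-down file:

```apache
# Sketch only - assumes Apache with mod_rewrite enabled.
RewriteEngine On
# match the problem bot by its User-Agent string, case-insensitively
RewriteCond %{HTTP_USER_AGENT} ClumsyBot [NC]
# serve it a simpler robots.txt (one User-agent line per record)
RewriteRule ^/?robots\.txt$ /robots-simple.txt [L]
```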
Jim
I've been using this method for quite a while and have only had two UAs fail to obey it. I emailed the first one and they acknowledged their mistake and fixed it immediately. The other argued that the syntax was incorrect, and it took quite a few emails before they saw my point of view :D
I'm kinda new at this robots.txt stuff, but my deadline doesn't have to know that...(!)
Which web robots should I disallow, and why....(?)
Thanks