User-agent: abbabot
Disallow:

User-agent: *
Disallow: /
and then control what `abbabot` can and cannot index via the robots meta tag, primarily these two values:
<META name="ROBOTS" content="all">
<META name="ROBOTS" content="none">
There are reasons I don't want to rely on robots.txt alone. The site in question adds new sections/channels on a daily basis, many of which we don't want spidered and some we do, whilst the URL structure is such that, even using say the Google extensions to robots.txt, we'd end up with a very, very large robots.txt file that would be unmanageable on a daily basis. With the meta tag approach we can use the CMS (Content Management System) to set the state of the tags when the pages are created. Well, it sounds like a plan anyway...
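To make that concrete, here's a minimal sketch of the kind of helper a CMS template might call to pick the tag per channel. The function name and the `spiderable` flag are my own illustrations, not from any particular CMS:

```python
# Hypothetical CMS helper: choose the robots meta tag for a page
# based on a per-channel "spiderable" flag set when the channel
# is created. Names here are illustrative only.

def robots_meta(spiderable: bool) -> str:
    """Return the robots meta tag matching the channel's flag."""
    content = "all" if spiderable else "none"
    return f'<META name="ROBOTS" content="{content}">'

print(robots_meta(True))   # channel we do want indexed
print(robots_meta(False))  # channel we want kept out
```

The point is just that the decision lives in one place in the CMS rather than in an ever-growing robots.txt.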
I see that at some point the W3C discussed putting user agents in the meta tag standard but didn't...
The only way I have found to tell Gbot and AJ, "Please don't mention this URL at all," is not to disallow the page in robots.txt, but rather to block it only with the on-page robots meta tag. It's the only way I've found to make "semi-private" pages stay that way.
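The pairing matters: if robots.txt disallows the URL, the bot never fetches the page and so never sees the meta tag, and the bare URL can still show up in results. A sketch of the combination, with a hypothetical path for illustration:

```html
<!-- robots.txt: no Disallow line covering /semi-private/page.html,
     so the crawler is allowed to fetch the page... -->

<!-- ...and the page itself carries the exclusion: -->
<head>
  <META name="ROBOTS" content="none">
</head>
```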
Jim