Forum Moderators: goodroi
User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /js/
Disallow: /_includes/
User-agent: *
Disallow: /
Our website search tool is Verity; its spider is called "vspider".
I realized I was blocking it. OK, so I thought I could add:
User-agent: vspider
and be all good...
Nope, it's still being blocked.
Has anyone run into this before?
If I can't allow this spider and have to remove the lines:
User-agent: *
Disallow: /
am I going to have to manually add every other possible spider or bot?
Try adding a duplicate record of the one you have, above the one you have, but listing only the vspider user-agent in it.
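Assuming vspider matches its user-agent string exactly (worth confirming against Verity's documentation), the file would then look something like this:

```
User-agent: vspider
Disallow: /js/
Disallow: /_includes/

User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /js/
Disallow: /_includes/

User-agent: *
Disallow: /
```

A robot is supposed to obey the first record whose User-agent line matches it, so with its own record at the top, vspider never falls through to the catch-all "Disallow: /" record.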
When creating a multiple-user-agent policy record, you should carefully test that each robot recognizes it and behaves accordingly. If you cannot test, then go to each robot's Webmaster Help page and see whether it says the robot can handle such records. If not, then defining a separate policy record for each robot is indicated.
Another approach is to use mod_rewrite or ISAPI Rewrite to rewrite (not redirect!) robot requests for robots.txt to one of two robots.txt files: one that allows access to all spiders, the other that denies access to all spiders. You could also rewrite robots.txt requests to a script that dynamically generates the proper robots.txt directives for each spider.
When using either dynamic robots.txt delivery approach, be careful what you do with unrecognized spiders -- whether you allow or deny them. Allowing them means you have to maintain the script as new unwelcome spiders appear, while denying them risks some major 'bot changing its user-agent string and being unrecognized and denied.
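As a sketch of the mod_rewrite approach -- the file names and the bot pattern here are assumptions, not a drop-in config:

```apache
RewriteEngine On

# Known, welcome spiders get the permissive file...
RewriteCond %{HTTP_USER_AGENT} (googlebot|slurp|msnbot|teoma|vspider) [NC]
RewriteRule ^robots\.txt$ /robots-allow.txt [L]

# ...everything else asking for robots.txt gets the restrictive one.
RewriteRule ^robots\.txt$ /robots-deny.txt [L]
```

Because this is a rewrite rather than a redirect, each spider still believes it fetched /robots.txt and simply sees different contents.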
Jim
I have tried various separate and multiple-agent records, but apparently my web admin says that there is no way to use:
User-agent: *
Disallow: /
in the robots.txt at all -- even if you allow vspider first, or add a separate record...
I'll just find a lengthy list of bots and spiders and add them manually.
Or you could even use a PHP script to generate your robots.txt "file."
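A minimal sketch of that PHP idea -- the bot list and response bodies below are assumptions to adapt, and the script assumes you point a rewrite of /robots.txt at it:

```php
<?php
// robots.php -- hypothetical dynamic robots.txt generator.
// Serve this via a rewrite of /robots.txt to this script.

function robots_txt_for(string $userAgent): string {
    // Spiders we recognize and welcome; adjust the list to taste.
    $allowed = ['googlebot', 'slurp', 'msnbot', 'teoma', 'vspider'];
    foreach ($allowed as $bot) {
        if (stripos($userAgent, $bot) !== false) {
            // Welcome spider: keep it out of the script directories only.
            return "User-agent: *\nDisallow: /js/\nDisallow: /_includes/\n";
        }
    }
    // Unrecognized spider: deny everything (maintenance caveat above applies).
    return "User-agent: *\nDisallow: /\n";
}

// Only emit output when actually serving a web request.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain');
    echo robots_txt_for($_SERVER['HTTP_USER_AGENT'] ?? '');
}
```

Each spider requesting /robots.txt then sees only the directives meant for it, without any multi-agent record for it to misparse.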
In the meantime, file a complaint with Verity... Obviously, their robot does not conform to the Standard.
Jim