


vspider not adhering to robots.txt

tonynoriega

3:39 pm on Feb 25, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



After I stole... uhmm... borrowed a robots.txt template from pageoneresults, I came across a new problem.

User-agent: googlebot
User-agent: slurp
User-agent: msnbot
User-agent: teoma
User-agent: W3C-checklink
User-agent: WDG_SiteValidator
Disallow: /js/
Disallow: /_includes/

User-agent: *
Disallow: /

Our website search tool is Verity; its spider is called "vspider".

I realized I was blocking it. OK, so I thought I could add:

User-agent: vspider

and be all good...

Nope, it's still being blocked.

Has anyone run into this before?

If I can't allow this spider and have to remove the lines:

User-agent: *
Disallow: /

am I going to have to add every other possible spider or bot manually?

jdMorgan

3:59 pm on Feb 25, 2009 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The problem, which I warned about in Pageoneresults' thread, is likely that vspider does not handle the multiple-user-agent policy-record format, as defined by the original Standard for Robot Exclusion. Although it was part of the original document, many spiders don't handle it correctly -- Cuil's twiceler robot being a recent example.

Try adding a duplicate of the record you have, placed above it, but listing only the vspider user-agent.

When creating a multiple-user-agent policy record, you should carefully test that each robot recognizes it and behaves accordingly. If you cannot test, then go to each robot's Webmaster Help page and see if it indicates that the robot can handle this format. If not, then defining separate policy records is indicated.

Another approach is to use mod_rewrite or ISAPI Rewrite to rewrite (not redirect!) robot requests for robots.txt to one of two robots.txt files: one that allows access to all spiders, and the other that denies access to all spiders. You could also rewrite robots.txt requests to a script that dynamically generates the proper robots.txt directives for each spider.
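For illustration, here is a minimal .htaccess sketch of the two-file approach, assuming Apache with mod_rewrite enabled; the file names robots-allow.txt and robots-deny.txt, and the robot list, are hypothetical:

RewriteEngine On

# Welcome robots get the permissive file...
RewriteCond %{HTTP_USER_AGENT} (googlebot|slurp|msnbot|teoma|vspider) [NC]
RewriteRule ^robots\.txt$ /robots-allow.txt [L]

# ...all other requests for robots.txt get the restrictive file.
RewriteRule ^robots\.txt$ /robots-deny.txt [L]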

When using either dynamic robots.txt delivery approach, be careful what you do with unrecognized spiders -- whether you allow or deny them. Allowing them means you have to maintain the script for new unwelcome spiders, while denying them risks the chance that some major 'bot might change its user-agent string and be unrecognized and denied.

Jim

tonynoriega

4:06 pm on Feb 25, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I will try adding the duplicate record. Like this, right?

User-agent: vspider
Disallow:

User-agent: googlebot
...etc...etc...

jdMorgan

4:28 pm on Feb 25, 2009 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



User-agent: vspider
Disallow: /js/
Disallow: /_includes/

User-agent: googlebot
...etc...etc...

Jim

tonynoriega

6:57 pm on Feb 25, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



That still kills the Verity search engine...

I have tried various multiple-record arrangements, but my web admin says there is no way to use:

User-agent: *
Disallow: /

in the robots.txt at all, even if you allow the spider or add a separate record...

I'll just find a lengthy list of bots and spiders and add them manually...

jdMorgan

11:07 pm on Feb 26, 2009 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



As mentioned above, you can use mod_rewrite or ISAPI Rewrite to rewrite (not redirect!) robot requests for robots.txt to one of two static files -- one allowing and one denying access to all spiders -- or to a script that dynamically generates the proper directives for each spider.

Or you could even use a PHP script to generate your robots.txt "file."
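As a hypothetical sketch of that idea -- assuming requests for robots.txt are rewritten to the script, and reusing the disallows from this thread:

<?php
// Sketch only: emit robots.txt directives based on the requesting user-agent.
header('Content-Type: text/plain');

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
$welcome = array('googlebot', 'slurp', 'msnbot', 'teoma', 'vspider');

foreach ($welcome as $bot) {
    if (strpos($ua, $bot) !== false) {
        // Recognized robot: block only scripts and includes.
        echo "User-agent: *\nDisallow: /js/\nDisallow: /_includes/\n";
        exit;
    }
}

// Unrecognized robot: deny everything. As noted above, this means the
// welcome list must be maintained, or a major 'bot could be shut out.
echo "User-agent: *\nDisallow: /\n";
?>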

In the meantime, file a complaint with Verity... Obviously, their robot does not conform to the Standard.

Jim

 
