Forum Moderators: open
My company operates a small robot, and recently we received a complaint from a site owner stating that we are ignoring his robots.txt file.
I can state very clearly that we do not ignore robots.txt; I'm the one who wrote the Java code that processes it. However, we are not doing what the person expects, because we believe their robots.txt file is invalid.
The question is: is this a valid robots.txt file to exclude everyone from everything?
User-agent: *
Disallow: *
In strict terms I say no, because it should be:
User-agent: *
Disallow: /
Some robots may obey this anyway; I am considering changing ours to obey the * as well.
Any opinions?
Thanks,
Paul
The second is correct. There are a few spiders, like SlySearch, that only recognize "Disallow: *" even though it should be "Disallow: /".
Here is a good resource for robots.txt:
[searchengineworld.com...]
I would program the bot for both.
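To illustrate "programming the bot for both", here is a minimal sketch of a lenient robots.txt check that treats the non-standard "Disallow: *" the same as the standard "Disallow: /". This is not the poster's actual code; the class and method names (RobotsTxt, isAllowed) are illustrative, and for brevity it only honors the wildcard "User-agent: *" group.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: lenient robots.txt parsing for the "User-agent: *" group.
public class RobotsTxt {
    private final List<String> disallowed = new ArrayList<>();

    public RobotsTxt(String content) {
        boolean applies = false; // true while inside a "User-agent: *" group
        for (String line : content.split("\\r?\\n")) {
            // Strip comments and surrounding whitespace.
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash);
            line = line.trim();
            if (line.isEmpty()) continue;

            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                applies = value.equals("*");
            } else if (field.equals("disallow") && applies) {
                // Lenient reading: a bare "*" means "everything", i.e. "/".
                disallowed.add(value.equals("*") ? "/" : value);
            }
        }
    }

    public boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            // An empty Disallow value disallows nothing, so skip it.
            if (!prefix.isEmpty() && path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

With this, both `"User-agent: *\nDisallow: /"` and `"User-agent: *\nDisallow: *"` block every path, while an empty `Disallow:` blocks nothing.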
To exclude all robots from the entire server:
User-agent: *
Disallow: /
In addition to the link to Brett's robots.txt checker, you might take a look at the Web Server Administrator's Guide to the Robots Exclusion Protocol at www.robotstxt.org/wc/exclusion-admin.html