I'm sure questions like this have been covered before, but I could not find an answer to this dispute I'm having with peerbot.
Here's the situation:
Let's say I have a file at
www.somedomain.tld/bots/file.htm, and in the root folder of that domain there's a robots.txt with these lines:
User-agent: *
Disallow: /bots/
So, on Sat, 04 Sep 2004 10:36:32 GMT+0100, peerbot visited the file /bots/file.htm. I sent an email message to peerbot saying that, in my opinion, their bot wasn't supposed to visit /bots/file.htm. Their reply read, in part:
[...] first off, the protocol is working correctly for all of our services, the problem is that you misunderstood the protocol.
> User-agent: *
> Disallow: /bots/
means that all bots are disallowed to index the directory /bots/ which peerbot does NOT do. To learn more about the robots exclusion protocol check the page www.robotstxt.org. [...]
Everything I know about the robots exclusion protocol is from robotstxt.org. From their documentation, and from other posts here, I take it that the bot is not allowed to retrieve any document from the
/bots/ folder. Who is correct?
Peerbot's response is incorrect. The Standard's definition of Disallow at robotstxt.org says "any URL that starts with this value will not be retrieved", so you have disallowed every resource whose local URL-path begins with /bots/, including /bots/file.htm.
Their alternative view would require you to Disallow each file individually, which is ridiculous.
As a further example, take this record:
User-agent: *
Disallow: /
By peerbot's reading, this would block nothing but the single path "/". In fact, because a Disallow value matches every URL-path that starts with it, this record excludes the entire site, and that is exactly how crawlers that honour robots.txt treat it.
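For what it's worth, Python's standard-library robots.txt parser implements exactly this prefix matching. A minimal sketch, reusing the hostname from the example above:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt record from the example above.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /bots/"])

# "Disallow: /bots/" is a path prefix: anything under /bots/ is off-limits,
# while paths that don't start with /bots/ remain fetchable.
print(rp.can_fetch("peerbot", "http://www.somedomain.tld/bots/file.htm"))  # False
print(rp.can_fetch("peerbot", "http://www.somedomain.tld/index.htm"))      # True

# "Disallow: /" is a prefix of every path, so it excludes the whole site.
rp_all = RobotFileParser()
rp_all.parse(["User-agent: *", "Disallow: /"])
print(rp_all.can_fetch("peerbot", "http://www.somedomain.tld/index.htm"))  # False
```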
I think you were corresponding with someone who didn't know what they were talking about, or whose job is simply to deflect any reports of problems with their 'bot. Evidently they didn't read the Standard they directed you to, or they would have found the definition of Disallow that contradicts their claim.
An alternative approach they won't be able to ignore: code a 403 Forbidden response for their user-agent into your .htaccess file...
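If the server is Apache with mod_rewrite enabled, that could look roughly like the sketch below. The pattern "peerbot" is an assumption; match whatever User-Agent string their bot actually sends (check your access logs).

```apache
# Assumes Apache with mod_rewrite enabled, placed in .htaccess.
# Refuse (403 Forbidden) any request whose User-Agent contains
# "peerbot", case-insensitively ([NC]); [F] sends the 403.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} peerbot [NC]
RewriteRule .* - [F]
```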
Jim