Forum Moderators: goodroi

Logic Error

Wildcard userAgent last entry in robots.txt file?

     

pageoneresults

1:32 am on May 12, 2004 (gmt 0)


You know, I've been writing robots.txt files in Notepad for years. Today I received an email from Italy stating that there is an error in my robots.txt file. I was also referred to another online validator to verify the error.

This one stumped me as I cannot find any reference to this in the official documentation for the robots.txt file.

I was informed that the wildcard entry for the User-agent should be the last entry in my robots.txt file. Is there truth to this? If so, would it be an actual error or just a warning?

The statement was made that this could cause confusion for some robots. What say ye?

jdMorgan

2:34 pm on May 20, 2004 (gmt 0)


pageone,

The typical behaviour of a robot is to read your robots.txt file from the top, looking for either a match on its own user-agent name or the wildcard "*", whichever comes first. So no, it won't confuse the robots. They will simply accept the wildcard record if they find it first, and ignore any robot-specific record that follows.
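To make that first-match behaviour concrete, here is a minimal Python sketch. The record list, the bot name ExampleBot, and the function name pick_record_first_match are all invented for illustration; this is not code from any real robot, just one plausible reading of "whichever comes first".

```python
# Hypothetical records in file order: (user-agent token, disallowed paths).
RECORDS = [
    ("*", ["/"]),                # wildcard record placed first
    ("examplebot", ["/tmp/"]),   # robot-specific record placed after it
]


def pick_record_first_match(records, bot_name):
    """Return the first record whose token is '*' or occurs in the bot's name."""
    name = bot_name.lower()
    for token, disallowed in records:
        # Case-insensitive substring match on the name, or the wildcard.
        if token == "*" or token in name:
            return token, disallowed
    return None  # no record applies to this robot


# With the wildcard first, the specific record is never reached.
print(pick_record_first_match(RECORDS, "ExampleBot"))
# With the order reversed, the specific record is found first.
print(pick_record_first_match(list(reversed(RECORDS)), "ExampleBot"))
```

Under this assumed logic the order of the records decides the outcome, which is why the position of the wildcard matters.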

As I said, this is "typical" behaviour. Some robots may try to be more user-friendly: they might scan all of the User-agent lines in the file, accepting a specific match if they find one and using the wildcard record only if they don't find a specific record. But there is no guarantee that any 'bot will do this, or continue to do this.
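The more user-friendly approach could look something like the Python sketch below. Again, the records, the bot name ExampleBot, and the function name pick_record_prefer_specific are assumptions made up for this example, not a description of any particular robot.

```python
# Hypothetical records in file order: (user-agent token, disallowed paths).
RECORDS = [
    ("*", ["/"]),                # wildcard record placed first
    ("examplebot", ["/tmp/"]),   # robot-specific record placed after it
]


def pick_record_prefer_specific(records, bot_name):
    """Scan every record; use the wildcard only if no specific record matches."""
    name = bot_name.lower()
    fallback = None
    for token, disallowed in records:
        if token == "*":
            # Remember the first wildcard record, but keep scanning.
            if fallback is None:
                fallback = (token, disallowed)
        elif token in name:
            return token, disallowed  # a specific match always wins
    return fallback


print(pick_record_prefer_specific(RECORDS, "ExampleBot"))
print(pick_record_prefer_specific(RECORDS, "SomeOtherBot"))
```

A robot written this way gets the same answer whatever the order of the records, so record order only matters for robots that do not scan the whole file.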

So, I would recommend putting your wildcard record last, as the default catch-all for robots for which you have no specific instructions. (And tell that guy, "Grazie" too, for letting you know.) :)
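If you want to check a file for this yourself, a rough Python sketch follows. The function name wildcard_record_is_last and both sample files are hypothetical; it only looks at the order of User-agent lines and is not a full validator.

```python
def wildcard_record_is_last(robots_txt):
    """True if no named User-agent line appears after a 'User-agent: *' line."""
    seen_wildcard = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "user-agent":
            continue                              # only User-agent lines matter here
        if value.strip() == "*":
            seen_wildcard = True
        elif seen_wildcard:
            return False                          # named record after the wildcard
    return True


# Hypothetical sample files for illustration.
GOOD = "User-agent: ExampleBot\nDisallow: /tmp/\n\nUser-agent: *\nDisallow: /cgi-bin/\n"
BAD = "User-agent: *\nDisallow: /cgi-bin/\n\nUser-agent: ExampleBot\nDisallow: /tmp/\n"

print(wildcard_record_is_last(GOOD))
print(wildcard_record_is_last(BAD))
```

Treating a misplaced wildcard as a warning rather than an error seems reasonable, since the file is still readable by every robot.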

A sure-fire way to tell which robots will and won't scan all records is to make the wildcard record deny all resources, put it first, and then follow that with robot-specific records. This would be an interesting survey if you have the time and an idle domain to try it on.
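As one small data point, the same test can be run against the robots.txt parser in Python's standard library, urllib.robotparser. The test file and the bot names ExampleBot and SomeOtherBot are made up for this sketch. To the best of my knowledge the library checks named records before the wildcard record, but treat that as an assumption and run it to confirm on your own version.

```python
from urllib.robotparser import RobotFileParser

# Survey file: wildcard record denies everything and comes first,
# followed by a record that allows everything for one named robot.
TEST_FILE = """\
User-agent: *
Disallow: /

User-agent: ExampleBot
Disallow:
"""


def allowed(bot_name, path="/page.html"):
    """Ask the standard-library parser whether bot_name may fetch path."""
    parser = RobotFileParser()
    parser.parse(TEST_FILE.splitlines())
    return parser.can_fetch(bot_name, path)


# A parser that scans all records lets ExampleBot in;
# a first-match parser would shut it out along with everyone else.
print("ExampleBot allowed:", allowed("ExampleBot"))
print("SomeOtherBot allowed:", allowed("SomeOtherBot"))
```

For real robots you would watch the server logs instead: any named robot that keeps fetching pages under this file must have read past the wildcard record.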

Jim

 
