
Sitemaps, Meta Data, and robots.txt Forum

    
Logic Error: Wildcard userAgent last entry in robots.txt file?
pageoneresults
msg:1525539
1:32 am on May 12, 2004 (gmt 0)

You know, I've been writing robots.txt files in Notepad for years. Today I received an email from Italy stating that there is an error in my robots.txt file. I was also referred to an online validator to verify the error.

This one stumped me, as I cannot find any reference to it in the official documentation for the robots.txt file.

I was informed that the wildcard entry for the userAgent should be the last entry in my robots.txt file. Is there any truth to this? If so, would it be an actual error or just a warning?

The statement was made that this could cause confusion for some robots. What say ye?
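For reference, the sort of layout the validator apparently objects to puts the wildcard record ahead of a robot-specific one, something like this (robot name and paths made up for illustration, not my actual file):

User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /private/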

 

jdMorgan
msg:1525540
2:34 pm on May 20, 2004 (gmt 0)

pageone,

The typical behaviour of a robot is to examine your robots.txt file looking for either a match on its own user-agent name or the wildcard "*", whichever comes first. So no, it won't cause the robots to get confused; they will simply accept the wildcard record if they find that first, and ignore your subsequent robot-specific record.

As I said, this is "typical" behaviour. Some robots may try to be more user-friendly: they might scan all of the User-agent lines in the file, accepting a specific match if they find one, and using the wildcard record only if they don't find a specific record. But there is no guarantee that any 'bot will do this -- or continue to do this.
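In rough pseudocode, the difference between the two behaviours comes down to this (a Python sketch of my own; the parsing and name-matching details are simplified, and no real robot is guaranteed to work exactly this way):

# Sketch of the two User-agent matching strategies described above.
# records is a list of (user_agent_token, rules) pairs in file order,
# e.g. as parsed from a robots.txt file. Not real crawler code.

def first_match(records, bot_name):
    """Typical behaviour: stop at the first record whose User-agent
    token matches this robot's name, or at the wildcard, whichever
    comes first in the file."""
    for ua, rules in records:
        if ua == "*" or ua.lower() in bot_name.lower():
            return rules
    return None  # no record applies

def specific_first(records, bot_name):
    """Friendlier behaviour: scan every record, prefer a name match,
    and fall back to the wildcard only if no specific record exists."""
    wildcard = None
    for ua, rules in records:
        if ua == "*":
            wildcard = wildcard or rules
        elif ua.lower() in bot_name.lower():
            return rules
    return wildcard

records = [("*", ["/cgi-bin/"]), ("googlebot", ["/private/"])]
print(first_match(records, "Googlebot"))     # ['/cgi-bin/'] -- wildcard wins
print(specific_first(records, "Googlebot"))  # ['/private/'] -- specific wins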

So, I would recommend putting your wildcard record last, as the default catch-all for robots for which you have no specific instructions. (And tell that guy, "Grazie" too, for letting you know.) :)
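In other words, a file laid out the recommended way looks something like this (names and paths illustrative only):

User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/

With this ordering, a first-match robot and an all-records robot both end up using the record you intended for them.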

A sure-fire way to tell which robots will and won't scan all records is to make the wildcard record deny all resources, put it first, and then follow that with robot-specific records. This would be an interesting survey if you have the time and an idle domain to try it on.
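The test file would look something like this ("SomeBot" is a stand-in for whatever robot you want to test):

User-agent: *
Disallow: /

User-agent: SomeBot
Disallow:

A robot that stops at the first match will stay out of the site entirely; one that scans all records will crawl under its own record, since an empty Disallow permits everything. Which behaviour shows up in your logs tells you which kind of robot you're dealing with.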

Jim
