
Forum Moderators: goodroi



Wildcard userAgent last entry in robots.txt file?

1:32 am on May 12, 2004 (gmt 0)

Senior Member from US: pageoneresults

joined: Apr 27, 2001
votes: 61

You know, I've been writing robots.txt files in Notepad for years. Today I received an email from Italy stating that there is an error in my robots.txt file. I was also referred to an online validator to verify the error.

This one stumped me, as I cannot find any reference to it in the official robots.txt documentation.

I was informed that the wildcard User-agent entry should be the last entry in my robots.txt file. Is there any truth to this? If so, would it be an actual error or just a warning?

The statement was made that this could cause confusion for some robots. What say ye?

2:34 pm on May 20, 2004 (gmt 0)

Senior Member: jdmorgan

joined: Mar 31, 2002
votes: 0


The typical behaviour of a robot is to examine your robots.txt file looking for either a match on its own user-agent name or the wildcard "*", whichever comes first. So no, it won't cause the robots to get confused; they will simply accept the wildcard record if they find it first, and ignore your subsequent robot-specific record.
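As a sketch of how one widely-used parser handles this, Python's standard-library urllib.robotparser stores the "*" record separately as a default, so a named robot still finds its own record even when the wildcard record comes first (the domain, paths, and bot names here are just examples):

```python
from urllib.robotparser import RobotFileParser

# Wildcard record deliberately placed FIRST, specific record second.
lines = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
""".splitlines()

rp = RobotFileParser()
rp.modified()   # mark rules as loaded; can_fetch() refuses everything otherwise
rp.parse(lines)

# Googlebot matches its own record, not the wildcard:
print(rp.can_fetch("Googlebot", "http://example.com/no-google/page"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/private/page"))    # True

# A robot with no specific record falls back to the "*" record:
print(rp.can_fetch("SomeOtherBot", "http://example.com/private/page"))  # False
```

This shows the "scan all records" style jdmorgan describes below; a first-match robot reading the same file would apply the wildcard record to everyone.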

As I said, this is "typical" behaviour. Some robots may try to be more user-friendly: they might scan all of the User-agent lines in the file, accepting a specific match if they find one and using the wildcard record only if they don't find a specific record. But there is no guarantee that any 'bot will do this -- or continue to do this.

So, I would recommend putting your wildcard record last, as the default catch-all for robots for which you have no specific instructions. (And tell that guy, "Grazie" too, for letting you know.) :)
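In robots.txt terms, the recommended layout looks like this (the robot names and paths are only examples):

```
# Robot-specific records first...
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /images/

# ...wildcard catch-all last, for everything else
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

With this ordering, both first-match and scan-everything robots end up applying the record you intended for them.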

A sure-fire way to tell which robots will and won't scan all records is to make the wildcard record deny all resources, put it first, and then follow that with robot-specific records. This would be an interesting survey if you have the time and an idle domain to try it on.
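That survey setup would look something like this (hypothetical; an empty Disallow line means "allow everything"):

```
# Catch-all first, denying all resources:
User-agent: *
Disallow: /

# Specific records after it; only robots that scan the
# whole file will ever see and obey these:
User-agent: Googlebot
Disallow:
```

Any named robot that still stays away is a first-match robot that stopped at the wildcard record.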


