This one stumped me as I cannot find any reference to this in the official documentation for the robots.txt file.
I was informed that the wildcard entry for the User-agent should be the last entry in my robots.txt file. Is there truth to this? If so, would it be an actual error or just a warning?
The statement was made that this could cause confusion for some robots. What say ye?
The typical behaviour of a robot is to examine your robots.txt file, looking for either a match on its own user-agent name or the wildcard "*", whichever comes first. So no, it won't cause the robots to get confused. They will simply accept the wildcard record if they find that first, and ignore your subsequent robot-specific record.
As I said, this is "typical" behaviour. Some robots may try to be more user-friendly: they might scan all of the User-agent lines in the file, accepting a specific match if they find one, and using the wildcard record only if they don't find a specific record. But there is no guarantee that any 'bot will do this, or continue to do this.
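To make the difference between those two behaviours concrete, here is a rough Python sketch. It is only an illustration, not how any particular robot is implemented: the function names, the "examplebot" robot, and the sample file are all made up, records are assumed to be separated by blank lines, and names are matched exactly (real robots usually match more loosely).

```python
def parse_records(text):
    """Split robots.txt text into (agents, rules) records, in file order.

    Simplified: records are assumed to be separated by blank lines.
    """
    records, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            if agents:
                records.append((agents, rules))
            agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents:
        records.append((agents, rules))
    return records


def first_match(records, name):
    """'Typical' robot: stop at the first record naming it or '*'."""
    for agents, rules in records:
        if name.lower() in agents or "*" in agents:
            return rules
    return []


def specific_first(records, name):
    """'Friendlier' robot: scan everything, fall back to '*' only if needed."""
    for agents, rules in records:
        if name.lower() in agents:
            return rules
    for agents, rules in records:
        if "*" in agents:
            return rules
    return []


# Hypothetical file with the wildcard record placed FIRST.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: examplebot
Disallow: /private/
"""

records = parse_records(SAMPLE)
print(first_match(records, "examplebot"))     # picks up the wildcard record
print(specific_first(records, "examplebot"))  # picks up its own record
```

With the wildcard first, the two strategies end up with different rules for the same robot. Move the wildcard record to the end and both return the robot-specific record.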
So, I would recommend putting your wildcard record last, as the default catch-all for robots for which you have no specific instructions. (And tell that guy, "Grazie" too, for letting you know.) :)
A sure-fire way to tell which robots will and won't scan all records is to make the wildcard record deny all resources, put it first, and then follow that with robot-specific records. This would be an interesting survey if you have the time and an idle domain to try it on.
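If you want a feel for that test without waiting on live robots, you can feed the same kind of file to a parser you have on hand. The sketch below uses Python's standard-library urllib.robotparser purely as one example of a parser; the "examplebot" and "otherbot" names and the file contents are made up, and the result says nothing about how any real search engine robot behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical survey file: the wildcard record comes first and denies
# everything; a robot-specific record follows and allows everything.
SURVEY_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: examplebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(SURVEY_ROBOTS_TXT.splitlines())

# A parser that scans all records lets "examplebot" in despite the
# wildcard record coming first; a robot with no record of its own
# falls back to the wildcard and is denied.
examplebot_allowed = parser.can_fetch("examplebot", "/index.html")
otherbot_allowed = parser.can_fetch("otherbot", "/index.html")
print(examplebot_allowed, otherbot_allowed)
```

On a live site the equivalent check is to watch your access logs: any robot with its own record that never requests anything beyond robots.txt has presumably stopped at the wildcard.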
Jim