About a week later, we noticed one of our pages (named widgets-blue.php) wasn't cached by Google. It was still showing up in the results, but no cache was available.
The next day, Google Webmaster Tools reported an error in our XML sitemap. The error stated that widgets-blue.php couldn't be crawled because it was restricted by robots.txt.
After a little panicking, we discovered that all of our pages beginning with a "w" had been dropped from the cache. The only suspect, robots.txt, had become the main suspect. We added a closing "/" to the entry in the file, and a day later the errors were gone.
I mention this story because, in my decade+ of doing this, I've always thought that pages listed in the robots.txt had to be exact/absolute; wildcard entries were always marked with an asterisk (*). This is wrong. The asterisk is only allowed for User-agent.
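For anyone who wants to see the prefix behaviour for themselves, here is a rough sketch using Python's standard-library robots.txt parser. The two rule sets are my assumption of what the before and after might have looked like (the actual file isn't quoted in this thread), and the `allowed` helper is just a name made up for the illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str) -> bool:
    """Check a URL path against robots.txt text with the stdlib parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


# Hypothetical rules: assumed for illustration, not taken from the real file.
without_slash = "User-agent: *\nDisallow: /w\n"
with_slash = "User-agent: *\nDisallow: /w/\n"

for label, rules in (("Disallow: /w", without_slash), ("Disallow: /w/", with_slash)):
    for path in ("/widgets-blue.php", "/w/page.html"):
        state = "allowed" if allowed(rules, path) else "blocked"
        print(f"{label:14} {path:20} {state}")
```

Without the trailing slash the rule is treated as a prefix, so it catches every path that starts with /w. With the slash, only paths under the /w/ directory are caught.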
The moral is: be watchful and careful. This is a simple technology I thought I had mastered years ago. I don't know if I confused it with another language or learned it wrong to begin with.
It doesn't help that robots.txt is so rarely altered. Sometime next year I'll have forgotten this episode and will have to re-research whether I should add a closing slash.
A wildcard, as in /*pattern or /pattern1*pattern2, stands in for characters to its left, so the rule matches any beginning (or a restricted beginning) as long as it is followed by pattern.
The * is never needed on the right.
You can use $ to signify "must end with", though.
.
/123 matches /123 and /123/ and /1234 and /123/456
/123/ matches /123/ and /123/456
/*abc matches /123abc and /123/abc and /123abc456 and /123/abc/456
/123*xyz matches /123qwertyxyz and /123/qwerty/xyz/789
/123$ matches ONLY /123
/*abc$ matches /123abc and /123/abc but NOT /123/abc/x etc.
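The examples above can be checked with a small script. This is only a sketch of the matching rules as described in this post (prefix match, * for any run of characters, $ for "must end here"), not Google's actual implementation, and `rule_matches` is a made-up name.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        # A trailing $ means the path must end where the pattern ends.
        return re.fullmatch(regex, path) is not None
    # re.match anchors only at the start, which gives the prefix behaviour.
    return re.match(regex, path) is not None


checks = [
    ("/123", "/1234"),
    ("/123/", "/1234"),
    ("/*abc", "/123/abc/456"),
    ("/123*xyz", "/123/qwerty/xyz/789"),
    ("/123$", "/1234"),
    ("/*abc$", "/123/abc/x"),
]
for pattern, path in checks:
    print(pattern, path, rule_matches(pattern, path))
```

Since every rule is already a prefix match, adding a * at the right-hand end changes nothing, which is why it is never needed there.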
.
I have been caught out before.
This thread [webmasterworld.com...] contains some good advice direct from several Google staffers.
[edited by: g1smd at 8:30 pm (utc) on July 2, 2008]
I think the GWT robots.txt tool is an excellent offering. Not only does it validate the syntax, it helps you understand if your rules are actually doing what you intended them to do.