
Tip: Watch your robots.txt - a small change can have a big impact


doritoz

4:31 pm on Jul 2, 2008 (gmt 0)

10+ Year Member



We recently cleaned up our site by moving our scripts into a directory simply labeled "w". To prevent Google (or the other engines) from indexing this directory, we added it to the robots.txt:
User-agent: *
Disallow: /w

About a week later, we noticed one of our pages (named widgets-blue.php) wasn't cached by Google. It was still showing up in the results, but no cache was available.

The next day, in Google Webmaster Tools there was an error in our XML sitemap. The error stated that widgets-blue.php couldn't be crawled because it was restricted by robots.txt.

After a little panicking, we discovered that all of our pages beginning with "w" had been dropped from the cache. Robots.txt became the prime suspect. We added a closing "/" to the entry, and a day later the errors were gone.
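The surprise here is that a Disallow path is a prefix match. A quick way to see the difference (a sketch using Python's urllib.robotparser, with a hypothetical domain and file names standing in for the real site):

```python
from urllib.robotparser import RobotFileParser

def blocked(rules, path):
    """Return True if the given path is disallowed for all user agents."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return not rp.can_fetch("*", "https://example.com" + path)

# Disallow: /w is a prefix match, so it blocks more than the directory.
loose = "User-agent: *\nDisallow: /w"
print(blocked(loose, "/w/script.js"))       # True
print(blocked(loose, "/widgets-blue.php"))  # True -- the surprise

# The trailing slash restricts the rule to the directory itself.
strict = "User-agent: *\nDisallow: /w/"
print(blocked(strict, "/w/script.js"))       # True
print(blocked(strict, "/widgets-blue.php"))  # False
```

Running this against both versions of the file shows exactly why widgets-blue.php disappeared from the cache and why the closing slash fixed it.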

I mention this story because, in my decade-plus of doing this, I've always thought that paths listed in robots.txt had to be exact matches, and that wildcard entries were marked with an asterisk (*). This is wrong: Disallow paths are prefix matches. (Under the original standard, the asterisk is only allowed in the User-agent line; Google supports it in paths as an extension.)

The moral is be watchful and careful. This is a simple technology I thought I had mastered years ago. I don't know if I confused it with another language or I learned it wrong to begin with.

It doesn't help that robots.txt is so rarely altered. Sometime next year I'll have forgotten this episode and will have to re-research whether I should add a closing slash.

g1smd

8:16 pm on Jul 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The matching is from the left, so it matches anything that begins with /pattern.

A wildcard, as in /*pattern or /pattern1*pattern2, stands in for any run of characters, so the match allows any beginning (or a restricted beginning) that must then be followed by pattern.

The * is never needed on the right.

You can use $ to signify "must end with", though.

.

/123 matches /123 and /123/ and /1234 and /123/456

/123/ matches /123/ and /123/456

/*abc matches /123abc and /123/abc and /123abc456 and /123/abc/456

/123*xyz matches /123qwertyxyz and /123/qwerty/xyz/789

/123$ matches ONLY /123

/*abc$ matches /123abc and /123/abc but NOT /123/abc/x etc.

.
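Those rules can be sketched as a small Python matcher (an illustration of the semantics described above, not Google's actual implementation): * becomes "any run of characters", a trailing $ anchors the end, and everything else is a literal prefix match.

```python
import re

def robots_match(pattern, path):
    """Google-style robots.txt path matching: * matches any run of
    characters, a trailing $ means 'must end here', and a pattern
    without $ is a prefix match (anchored at the start only)."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# The examples above, as checks:
assert robots_match("/123", "/1234")
assert robots_match("/123/", "/123/456")
assert not robots_match("/123/", "/1234")
assert robots_match("/*abc", "/123/abc/456")
assert robots_match("/123*xyz", "/123/qwerty/xyz/789")
assert robots_match("/123$", "/123")
assert not robots_match("/123$", "/1234")
assert robots_match("/*abc$", "/123/abc")
assert not robots_match("/*abc$", "/123/abc/x")
```

Every assertion corresponds to one line in the list above, which makes it easy to test a rule before putting it in a live robots.txt.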

I have been caught out before.

This thread [webmasterworld.com...] contains some good advice direct from several Google staffers.

[edited by: g1smd at 8:30 pm (utc) on July 2, 2008]

tedster

8:20 pm on Jul 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for that story, doritoz. I've been called by site owners in a panic, only to find that they'd just done something odd with their robots.txt. Some hackers like to mess with it, too. The robots.txt file is now part of my standard health checkup for any website I work with.

I think the GWT robots.txt tool is an excellent offering. Not only does it validate the syntax, it helps you understand if your rules are actually doing what you intended them to do.

doritoz

8:57 pm on Jul 2, 2008 (gmt 0)

10+ Year Member



Excellent summary and thread, g1smd. Thanks!

I agree, tedster. Before this week I considered the robots.txt tool remedial, but it's proved invaluable.