I have had top positions in Google for years for many keywords on my site... until the last week of June, when my site disappeared from the SERPs. Everything on the site is whitehat; I don't know what could have caused such a drastic change, as I've made few changes other than adding/modifying products and content for new products. I created a Google Webmaster Tools account a few days ago. In the unreachable URLs section I see most of my URLs listed, including my home page, with the detail stating "robots.txt file unreachable".
I used the analyze robots.txt tool and the robots.txt file is error free, although it was last downloaded on June 22. I don't understand why Google would be having a problem downloading it now; it hasn't changed in years. Google's definition of "unreachable" is somewhat vague, and I'm not sure where to go from here.
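For what it's worth, here's the kind of reachability check I can run myself from outside (a minimal Python sketch; www.example.com stands in for my actual domain):

    import urllib.request

    # Placeholder domain; substitute the real site
    URL = "http://www.example.com/robots.txt"

    req = urllib.request.Request(URL, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.reason)          # expect: 200 OK
        print(resp.headers.get("Content-Type"))  # expect: text/plain
        print(resp.read().decode("utf-8", errors="replace")[:300])

Of course, that only proves the file is reachable from my connection, not from Google's crawlers.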
I've looked at the logs and everything seems fine, with a 200 response for each Googlebot request. But from June 23 up to the present, Googlebot requests the robots.txt file and then leaves without downloading any other files.
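To make scanning the logs less tedious, something like this pulls out every Googlebot request with its status code (assuming an Apache-style combined log; the filename is a placeholder):

    import re

    LOGFILE = "access.log"  # placeholder path

    with open(LOGFILE, errors="replace") as f:
        for line in f:
            if "Googlebot" not in line:
                continue
            # Combined log format: "GET /path HTTP/1.1" 200 ...
            m = re.search(r'"(GET|HEAD) (\S+) [^"]*" (\d{3})', line)
            if m:
                method, path, status = m.groups()
                print(status, method, path)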
I've contacted my host to check whether they're doing any IP blocking; they say they're not, but they'll look into it further.
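While I wait on the host, I can at least confirm that the robots.txt hits in my log come from real Googlebot rather than an impostor. Google's documented test is a reverse DNS lookup followed by a forward confirmation; a rough Python version (the IP is just an example from Googlebot's usual range):

    import socket

    def is_real_googlebot(ip):
        """Reverse DNS, then a forward lookup to confirm, per Google's advice."""
        try:
            host = socket.gethostbyaddr(ip)[0]
        except OSError:
            return False
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        try:
            # The forward lookup must resolve back to the same address
            return ip in socket.gethostbyname_ex(host)[2]
        except OSError:
            return False

    print(is_real_googlebot("66.249.66.1"))  # example address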
WMT seems to lose things. I don't often go there and was surprised to see some of my sites were unverified. I thought it was because G had renamed the verification file to uppercase while I still had the old lowercase filename. I renamed the file and WMT is happy.
My site is still verified. Googlebot must not be reading the robots.txt file, because there are URLs listed that have the path through the cgi-bin folder. My ecommerce software uses JS to start a session and injects a pathway through the cgi-bin. For example, if a user goes to a product page /123.html and the ecommerce software isn't already in the path, it refreshes the page and adds /cgi-local/softcart.exe/123.html?E+scstore.
What I find in the unreachable URLs section are two pathways for the same page: /cgi-local/softcart.exe/123.html?E+scstore and /123.html. I thought Googlebot doesn't execute JS. In years past, I added Disallow: /cgi-local/ to my robots.txt file, which solved that issue.
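To double-check that the Disallow line actually covers the cart URLs, the rules can be fed straight into Python's robotparser (a small sketch using the paths from my example above):

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    # Feed the rules in directly, so the rule syntax itself is what's tested
    rp.parse([
        "User-agent: *",
        "Disallow: /cgi-local/",
    ])

    # The cart path should be blocked; the plain product page should not
    print(rp.can_fetch("Googlebot", "/cgi-local/softcart.exe/123.html?E+scstore"))  # False
    print(rp.can_fetch("Googlebot", "/123.html"))                                   # True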
I have other ecommerce sites running the same ecommerce software without a hitch. Is Googlebot ignoring robots.txt and then treating this as duplicate content? Should I remove the /cgi-local/softcart.exe URLs that are listed in WMT? How can I further test whether Googlebot is really having a problem reading my robots.txt file, or whether it's some other problem? Thanks.
That's a good idea. I'll try that. But that still doesn't explain why Googlebot is ignoring the robots.txt file. If I'm violating some Google guideline I'm unaware of, would WMT still give me the same nebulous response, "robots.txt file unreachable"?
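In the meantime, here's roughly how I plan to test whether something on the server treats Googlebot's user-agent differently (a Python sketch; the domain is a placeholder, and a clean result here still wouldn't rule out blocking of Google's IP ranges):

    import urllib.request

    URL = "http://www.example.com/robots.txt"  # placeholder domain
    USER_AGENTS = {
        "browser": "Mozilla/5.0",
        "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                     "+http://www.google.com/bot.html)",
    }

    for name, ua in USER_AGENTS.items():
        req = urllib.request.Request(URL, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                print(f"{name}: {resp.status} {resp.reason}")
        except Exception as exc:
            # Failing here only for the Googlebot UA would point to UA blocking
            print(f"{name}: failed ({exc})")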