Why does Google WMT say that robots.txt is unreachable? - Google Search and SEO forum at WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

Why does Google WMT say that robots.txt is unreachable?

rshandy

11:11 pm on Jul 14, 2008 (gmt 0)

10+ Year Member

I have had top positions in Google for years for many keywords on my site... until the last week in June, when my site disappeared from the SERPs. It's all whitehat stuff on the site; I don't know what would have caused such a drastic change, as I've made few changes to the site other than adding/modifying products and content for new products. I created a Google Webmaster Tools account a few days ago. In the unreachable URLs section I see most of my urls listed, including my home page, with the detail stating "robots.txt file unreachable".

I used the analyze robots.txt tool, and the robots.txt file is error free, although it was last downloaded on June 22. I don't understand why Google would be having a problem downloading it now; it hasn't changed in years. Google's definition of unreachable is somewhat vague and I'm not sure where to go from here.

I've looked at the logs and everything seems to look ok, with a 200 response for each googlebot request. But from the 23rd of June up to the present, Googlebot requests the robots.txt file and then leaves without downloading any other files.
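For anyone wanting to run the same check on their own logs, here is a rough sketch that pulls the Googlebot requests out of an access log and tests whether the bot asked for anything besides robots.txt. It assumes the server writes Apache-style Combined Log Format; the sample lines, IP addresses, and function names are made up for illustration and are not taken from the poster's logs.

```python
import re

# Assumed format: Apache "combined" log lines, i.e.
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Return (time, path, status) for every request whose user-agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group("agent").lower():
            hits.append(
                (match.group("time"), match.group("path"), int(match.group("status")))
            )
    return hits

def requested_only_robots(hits):
    """True when there is at least one hit and every hit is for /robots.txt."""
    return bool(hits) and all(path == "/robots.txt" for _, path, _ in hits)

# Made-up sample lines, for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LOG = [
    f'192.0.2.10 - - [23/Jun/2008:04:12:10 +0000] "GET /robots.txt HTTP/1.1" 200 312 "-" "{GOOGLEBOT}"',
    f'192.0.2.10 - - [23/Jun/2008:04:12:15 +0000] "GET /123.html HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    '198.51.100.7 - - [23/Jun/2008:04:13:02 +0000] "GET /123.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 7.0)"',
]
```

Feeding a day's worth of lines to `googlebot_hits` and then `requested_only_robots` shows whether the "fetches robots.txt, then leaves" pattern holds for that day. Matching on the user-agent string alone does not prove the request really came from Google, so treat the result as a hint.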

I've contacted my host to check to see if they're doing any IP Blocking and they say they're not, but will look into it further.

Is this a glitch? A penalty?

Thanks.

anallawalla

3:09 am on Jul 15, 2008 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

WMT seems to lose things. I don't often go there and was surprised to see some of my sites were unverified. I thought it was because G has renamed the verification file to uppercase while I had the old lowercase filename. Renamed the file and WMT is happy.

rshandy

12:17 pm on Jul 15, 2008 (gmt 0)

10+ Year Member

My site is still verified. Googlebot must not be reading the robots.txt file, because there are urls listed that have the path through the cgi-bin folder. My ecommerce software uses JS to start it and injects an added pathway through the cgi-bin. For example, if a user goes to a product page /123.html and the ecommerce software isn't already in the path, it refreshes the page and rewrites the path to /cgi-local/softcart.exe/123.html?E+scstore.

What I find in the unreachable urls section are 2 pathways for the same page: /cgi-local/softcart.exe/123.html?E+scstore and /123.html
I thought Googlebot doesn't execute JS. Years ago I added Disallow: /cgi-local/ to my robots.txt file, which solved that issue.
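As a quick sanity check on the rule itself, the following sketch uses Python's standard `urllib.robotparser` to confirm that a `Disallow: /cgi-local/` line covers the softcart URL but not the plain product page. The `User-agent: *` line and the `www.example.com` host are assumptions, since the full robots.txt is not shown in the thread, and Python's parser uses simple prefix matching that may differ from Googlebot's own matching in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents; only the Disallow line comes from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-local/
"""

def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Placeholder host; the two paths are the ones quoted in the post.
CART_URL = "http://www.example.com/cgi-local/softcart.exe/123.html?E+scstore"
PLAIN_URL = "http://www.example.com/123.html"

cart_allowed = is_allowed(ROBOTS_TXT, "Googlebot", CART_URL)
plain_allowed = is_allowed(ROBOTS_TXT, "Googlebot", PLAIN_URL)
```

If the rule is doing its job, `cart_allowed` comes out false and `plain_allowed` true. That only shows the file is written correctly; it says nothing about whether Googlebot managed to download it.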

I have other ecommerce sites running with the same ecommerce software without a hitch.
Is Googlebot ignoring robots.txt and then considering this duplicate content?
Should I remove the /cgi-local/softcart.exe urls that are listed in WMT?
How can I further test whether googlebot is really having a problem reading my robots.txt file or whether it's some other problem?
Thanks.
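One way to test this from outside is to request robots.txt with a Googlebot-style user-agent and look at the raw HTTP status. The sketch below does that with Python's standard library. The status grouping follows Google's published description of how it treats robots.txt responses (2xx is read, most 4xx is treated as "no robots.txt", 429 and 5xx or a failed connection count as unreachable); whether Googlebot behaved exactly this way in 2008 is an assumption, and the function names are illustrative.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def classify(status):
    """Map an HTTP status (or None for no response) to a rough crawler outcome."""
    if status is None:
        return "unreachable"      # DNS failure, refused connection, timeout
    if 200 <= status < 300:
        return "ok"               # file is fetched and parsed
    if status == 429 or status >= 500:
        return "unreachable"      # server-side trouble, crawling is held back
    if 400 <= status < 500:
        return "missing"          # treated as if no robots.txt exists
    return "other"                # e.g. a redirect that was not followed

def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return the HTTP status for url, or None if no HTTP response arrived."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code           # the server answered, but with an error status
    except OSError:
        return None               # URLError and socket timeouts are OSError subclasses
```

Running `classify(fetch_status("http://www.example.com/robots.txt"))` against your own domain, once with the default user-agent and once with a browser string, shows whether the server answers differently by user-agent. It cannot reveal blocking by IP address, since the request does not come from Google's network, so the question to the host about IP blocking still stands.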

tedster

8:46 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

If you're seeing urls in WMT that are blocked by robots.txt, you might try a url removal request based on robots.txt and see what results you get from that.

rshandy

9:37 pm on Jul 15, 2008 (gmt 0)

10+ Year Member

That's a good idea. I'll try that. But that still doesn't explain why googlebot's ignoring the robots.txt file. If I'm violating some Google guideline I'm unaware of, would WMT still give me the same nebulous response, "robots.txt file unreachable"?

tedster

9:54 pm on Jul 15, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Have you tried the robots.txt tool that Google offers within your WMT account? That may give you some clues.

"My ecommerce software uses JS to start it and injects an added pathway through the cgi-bin"

There may well be some kind of technical tangle in the javascript area. Although they are working on it, googlebot does not usually work with javascript.

rshandy

11:52 am on Jul 16, 2008 (gmt 0)

10+ Year Member

Yes, I have used the robots.txt tool. It shows my current file is valid with no errors. The javascript on my pages should be clean as they have not been a problem nor edited in a long time.

I guess the next step is to figure out whether there are possible problems unrelated to the robots.txt file that would trigger this error.