|Google Cannot See My Site But Everyone Else Can|
| 1:40 pm on Sep 19, 2012 (gmt 0)|
Hi, I have a weird dilemma. Webmaster Tools (both FETCH and SITEMAP retrieval tools) cannot see 2 of my websites.
I consistently get an "Unreachable robots.txt" error from both. Moreover, I received a Google alert that, for 2 of my 7 GoDaddy websites, Google has been unable to retrieve this file for over 24 hours. I get this error from the WMT FETCH tool regardless of which page on these websites I try to retrieve. All 7 websites are hosted on a GoDaddy shared plan and are on the same server. 5 of them are not exhibiting this problem at all.
I can see the website, robots.txt, and all associated pages fine from my browser and FTP, and GoDaddy can as well. Moreover, with the shared hosting plan that I have, all my other websites on the same server can be found by Google without a hitch. The affected domains have been in existence for many years, and this problem only appeared on Sunday.
To be safe, I uploaded robots.txt again, in case the file had been corrupted somehow. This did not remedy the problem at all. Removing robots.txt, and augmenting it to contain a Googlebot User-Agent entry, didn't remedy it either. My robots.txt file is a simple 2 line file that contains a "*" for User-Agent, and the 2nd line lists the URL of my sitemap. Again, these had been working for years without a hitch and hadn't been changed.
I have talked with GoDaddy and asked them about permissions and firewalls and such, and they say that everything is fine on their end.
Does anyone have an idea as to what this problem might be?
Thank you in advance.
| 7:49 pm on Sep 19, 2012 (gmt 0)|
What happens if you search your logs for "googlebot"? You should be able to see the Googlebot fetch attempts and what response the server gives.
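As a rough sketch of that log search, the Python below pulls the requested path and status code out of every line whose user agent mentions Googlebot. It assumes the common Apache/Nginx "combined" log format, which may not match what a given shared host writes. The function name `googlebot_hits` and the sample lines are made up for illustration.

```python
import re

# Assumption: "combined" access log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Return (path, status) for each request whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "googlebot" in match.group("agent").lower():
            hits.append((match.group("path"), int(match.group("status"))))
    return hits


# Made-up sample lines (documentation IP addresses), not real log data.
sample = [
    '192.0.2.10 - - [16/Sep/2012:03:12:45 +0000] "GET /robots.txt HTTP/1.1" '
    '503 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [16/Sep/2012:03:13:02 +0000] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for path, status in googlebot_hits(sample):
    print(path, status)
```

To run it against a real log, pass an open file object instead of `sample`. Any 5xx status on `/robots.txt` would be the thing to look for.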
| 8:10 pm on Sep 19, 2012 (gmt 0)|
|My robots.txt file is a simple 2 line file that contains a "*" for User-Agent, and the 2nd line lists the URL of my sitemap.|
I hope you mean a 3-line file, where the second line is "Disallow:" followed by nothing. In any case, since you seem to be letting everyone in, why not see what happens if you take the other tack and delete the robots.txt file? Google doesn't mind if you don't have one at all. They only get upset if they can see it's there but have a problem reaching it. Sitemaps are similarly unessential, and if they're in the default top-level location, search engines will find them even without, haha, a map.
The other possibility is that although it's got the name "robots.txt" you have inadvertently saved it in some other format, so Google can retrieve it but can't read it. What do you see on the tab of GWT that is supposed to show the content of your current robots.txt?
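For reference, here is a small sketch of what such a 3-line "allow everything" file looks like and how a parser reads it. It uses Python's standard `urllib.robotparser`, which is not Google's parser, so it only shows that the file is well formed. The domain `www.example.com` and the sitemap URL are placeholders, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical three-line robots.txt; example.com stands in for the real domain.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow value blocks nothing, so any path is fetchable.
print(parser.can_fetch("Googlebot", "http://www.example.com/"))

# Sitemap lines are collected separately from the allow/disallow rules.
print(parser.site_maps())
```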
| 8:30 pm on Sep 19, 2012 (gmt 0)|
Thank you for your response. I don't block anything, and hence, left out the blank "Disallow" line. However, I added it to try this permutation, and it had no effect.
I had already tried removing robots.txt altogether, and I still get an "Unreachable robots.txt" GWT error msg, when I try to fetch my home page.
I saved my robots.txt file in ASCII format, and the GWT tab shows the content as it should. This robots.txt file has been around forever, as I have not changed it at all recently.
I will go through the GoDaddy interface to see if I can find any log clues. According to Google's documentation, the "Unavailable" error message implies a 5xx-level web server return code.
| 8:30 pm on Sep 19, 2012 (gmt 0)|
You get that error message when your robots.txt returns a 5xx error. You have to wait for the next time Googlebot requests the robots.txt file. Go to WMT > Health > Blocked URLs, and check the robots.txt status.
| 8:58 pm on Sep 19, 2012 (gmt 0)|
I have 7 websites hosted on a shared GoDaddy server, and only 2 of them are returning this bad status. All 7 websites reside on the same server with the same IP. And these 2 problematic ones were working for years before the problem manifested on 9/16/2012. In regard to WMT -> Health -> Blocked URLs, the status of my robots.txt is 200 (OK), but the last download date is 9/15/2012 (before the problem manifested).
I tried forcing my sitemap to be read, hoping that robots.txt would be accessed too, and I'm still in the WMT "pending" state, as I have been for the last 2 days. So, it looks like I'm stuck.
| 9:17 pm on Sep 19, 2012 (gmt 0)|
The server probably had a short configuration problem and served 5xx for 2 of your sites. I would make sure the robots.txt is being served without an error (using a 3rd party header checker tool) and wait for the next robots.txt refresh.
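If you'd rather not rely on a third-party checker, a minimal sketch of the same check in Python looks like this. The helper names `classify` and `robots_status` are made up, the 4xx label is a simplification based on what was said earlier in this thread, and `www.example.com` is a placeholder. Sending a Googlebot user-agent string from your own machine also won't reproduce anything that is blocked by IP address, so a clean result here is not proof that Google sees the same thing.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def classify(status):
    """Roughly label an HTTP status code returned for robots.txt."""
    if 200 <= status < 300:
        return "ok"
    if 400 <= status < 500:
        return "missing"      # simplification: treated like having no robots.txt
    if 500 <= status < 600:
        return "unreachable"  # the 5xx case discussed in this thread
    return "other"


def robots_status(site, user_agent=GOOGLEBOT_UA, timeout=10):
    """Fetch <site>/robots.txt and return the HTTP status code."""
    request = urllib.request.Request(
        site.rstrip("/") + "/robots.txt",
        headers={"User-Agent": user_agent},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the code is still what we want to report.
        return err.code


if __name__ == "__main__":
    status = robots_status("http://www.example.com")  # placeholder domain
    print(status, classify(status))
```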
| 9:46 pm on Sep 19, 2012 (gmt 0)|
Good idea. Even more, you can check your robots.txt file with the "Fetch as Googlebot" tool in WMT, and you will also know almost exactly when and where to look in your server logs for that request and its response.
[edited by: tedster at 10:26 pm (utc) on Sep 19, 2012]
| 10:02 pm on Sep 19, 2012 (gmt 0)|
I have checked with 3rd party tools, and my robots.txt is being served correctly. I hope your suggestion of waiting out the next robots.txt refresh is the answer, as I am afraid of being delisted in Google on the 2 affected sites.
| 10:49 pm on Sep 19, 2012 (gmt 0)|
OK, I resolved the problem by making GoDaddy a verified owner of my 2 affected domains in WMT. After a few hours, I am able to Fetch from WMT again, and get a 200 Success message. Yay! The only problem is that if I change hosting providers down the line, I will need to remember to change this accordingly.
I was more inclined to do this now, rather than wait for the next robots.txt "refresh" as Levo pointed out, since I didn't want to risk getting delisted from the Google index.
Thanks for everyone's input on this.
| 10:59 pm on Sep 19, 2012 (gmt 0)|
Am I the only one confused by this? What's the owner in WMT got to do with robots.txt?
I suspect the recent outage on GoDaddy was most likely the cause of the robots.txt error, as it may have made the file unreachable when Google tried to crawl.
Changing ownership in WMT will not get your site back any quicker. You still need Google to recrawl robots.txt, which it will do when it next visits the site.
| 11:34 pm on Sep 19, 2012 (gmt 0)|
OK, looks like I lied. I went into the Verification Methods screen of WMT, and selected the "Domain Name Provider" method of verification, in addition to the "HTML tag" method that I had used previously, when I established the domain.
WMT knows that my domain host provider is GoDaddy, so they provide directions on how to add a TXT record to GoDaddy, and WMT provides a unique string for this. I added this to my GoDaddy Domain Manager record, and then added it to WMT as well, but I think I forgot to hit the "Verify" button on WMT to complete the loop and consummate the verification. I just checked, and my Domain Name record in WMT is not there.
In any event, since WMT wouldn't recognize my site for whatever reason - maybe a prolonged server timeout noted above - I thought that I could trick it into being able to recognize my domain by doing the above.
Perhaps the "glitch" was just fixed, or perhaps my robots.txt successful fetch just happened to occur since my earlier posts. I dunno.
Sorry for the confusion, but all is well again in WMT land for me.