Forum Moderators: Robert Charlton & goodroi


Robots.txt unreachable in WMT


denisl

9:18 am on Mar 12, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



One of my sites (which happens to be a small non-profit at present) is showing a list of Unreachable Pages in WMT with the detail "robots.txt unreachable".
I don't have a robots.txt file on this site - haven't seen the need yet - so I can't see where the problem is.

Checking crawl stats in WMT appears to show crawling coming to a halt at the beginning of this month.

The site is up and running, Google Analytics shows normal visitor numbers, and I have not made any changes to the site lately. I don't have access to raw logs, so I cannot check the responses being sent to the bot.
Any ideas what is happening?

tedster

2:57 am on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you tried requesting http://example.com/robots.txt through a browser, just to see what happens? Best would be to use a Googlebot user agent with a tool such as LiveHTTPHeaders, so you can see the server's response.

Possible troubles could come from changes you didn't make on the server, so it would be good to check in real time just to see what's happening. Sometimes a web host takes a step to try to protect you from bad bots, for instance, and ends up creating a problem for good bots. If you actually have no robots.txt file, you should see a 404 status code returned - and possibly your web host is now doing something other than returning a 404 status.
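If it helps, the same check can be scripted outside the browser. A rough sketch in Python - the Googlebot UA string is the standard one, but the URL is just an example, and how Google interprets other status codes is simplified here:

```python
import urllib.request
import urllib.error

# Googlebot's user-agent string - servers may treat it differently
# from a browser UA, which is exactly what we want to find out.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_status(url, user_agent=GOOGLEBOT_UA):
    """Return the HTTP status code the server sends for this URL and UA."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses still carry the status code we care about
        return e.code

def robots_verdict(status):
    """Interpret the robots.txt status roughly the way Googlebot does."""
    if status == 200:
        return "robots.txt served"
    if status == 404:
        return "no robots.txt - Google treats this as allow-all"
    return "status %d - Googlebot may pause crawling" % status

# Usage (uncomment to run against your own site):
# status = fetch_status("http://example.com/robots.txt")
# print(status, robots_verdict(status))
```

The point is that a missing robots.txt is harmless as long as the server answers with a real 404; it's the other statuses (or no answer at all) that stop the crawl.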

denisl

8:11 am on Mar 13, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks Tedster

Did the checks: robots.txt was coming back as 404 and all other pages as 200.
I also used the robots.txt analysis tool in WMT, which didn't report the file as missing - it showed the robots.txt content as blank and was allowing pages. It also showed that robots.txt was last downloaded a few minutes ago.

Decided the easiest thing to do was create a robots file and hope that solves the problem.

tedster

8:30 am on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



robots was coming back as 404

Just to double-check - this was a 404 HTTP status, and not just a page with the text "404 not found" on it.

And yes, creating a real robots.txt file is a good step, even one that just allows everything.
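For reference, an allow-everything robots.txt needs only two lines - an empty Disallow means nothing is disallowed:

```
User-agent: *
Disallow:
```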

denisl

12:50 pm on Mar 13, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



It was the actual HTTP status, not just text on a page.

However, having uploaded a robots.txt file which gives a 200 response, the WMT robots.txt analysis tool gives the status as "Please check back later".

denisl

10:33 pm on Mar 17, 2009 (gmt 0)

10+ Year Member Top Contributors Of The Month



Update
I contacted my hosting company on the 13th.
This morning I saw for the first time that WMT showed robots.txt 200 OK, and crawl stats showed that G was crawling again.
I finally got a message from the hosting company later which said:

[paraphrased]
The IP address 66.249.72.162 (googlebot) was blocked at the server firewall because of anomalous behaviour that triggered Apache mod_security. We are now allowing that IP address to crawl the server's websites again.

I assume all G bots were blocked, as there had been no crawling since 1st March.
I also assume everything was being blocked, but because G requests robots.txt first, any problem fetching it (even when robots.txt doesn't exist) shows up in WMT as a robots.txt error.

[edited by: tedster at 1:25 am (utc) on Mar. 18, 2009]
[edit reason] paraphrase email quote [/edit]

tedster

1:26 am on Mar 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting - can you get any further information on what the anomalous behavior is? This could be affecting a number of servers.
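As a side note for anyone whose firewall flags these IPs: Google's recommended way to confirm that an address really belongs to Googlebot (before deciding to block or allow it) is a reverse DNS lookup followed by a forward confirmation. A rough sketch in Python:

```python
import socket

def is_real_googlebot(ip):
    """Reverse-resolve the IP; the hostname must be under googlebot.com
    or google.com, and a forward lookup of that hostname must return
    the same IP (Google's documented verification method)."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False  # no PTR record at all - not Googlebot
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

# e.g. is_real_googlebot("66.249.72.162") - the IP from the host's email
```

This matters because the UA string alone can be faked; the reverse-then-forward check can't.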