
Is Google WMT wrongly reporting crawl errors?

     
10:51 am on Apr 2, 2013 (gmt 0)

10+ Year Member



G has been sending me emails to say they cannot access my robots.txt.

I can access it via a browser, but when I use the WMT "Fetch as Google" tool it says the page is unreachable. However, when I click on "Page Unreachable" it gives me this report:


Fetch as Google

This is how Googlebot fetched the page.

URL: http://www.example.com/robots.txt

Date: Tuesday, April 2, 2013 at 3:41:27 AM PDT

Googlebot Type: Web

Download Time (in milliseconds): 14985

To me, this appears to show that it did download it.

Assuming I had a problem, I have replaced robots.txt. I have replaced .htaccess (and tried it without .htaccess). Have also been in contact with my hosting company, who cannot see a problem.
I am still however getting the messages from Google so am concerned.
12:39 pm on Apr 2, 2013 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I get these all the time, because between my own sites and client sites I have a hundred sites or so in my GWT, hosted all over the place. Once I ascertain there's no real problem, I ignore them.
1:10 pm on Apr 2, 2013 (gmt 0)

10+ Year Member



May be heading to an answer. I just noticed all my website files are dated 13th March 2013, the same date G started reporting crawl errors, so am assuming something changed on the server on that date.
Am waiting for a response from the hosting company.
3:31 pm on Apr 2, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Download Time (in milliseconds): 14985

Big problem. ^
4:33 pm on Apr 2, 2013 (gmt 0)

10+ Year Member



Yep, noticed that, g1smd. All the reports are just under 15 seconds - I guess that is when they treat it as timed out. It doesn't say timed out in the report, though.
6:00 pm on Apr 2, 2013 (gmt 0)



Download Time (in milliseconds): 14985
Just checked a few pages to make sure that wasn't GWT doing that.

@denisl if all your site files on the server have the same date then you can be sure the host did something. Have you also tried to test your sitemap with GWT to see if it gets errors too?
6:42 pm on Apr 2, 2013 (gmt 0)

10+ Year Member



Have just tried the googlebot test on sitemap.xml and it gives the error "unreachable robots.txt". I thought it was a good suggestion of yours, Str82u, as I didn't think they would access robots.txt before downloading the sitemap.
For every other file it gives the same robots error.

I'm still waiting for a further response from the hosting company but something happened on the 13th.

I notice that WMT crawl stats show that the number of files downloaded since the 13th have dropped from a few thousand to a few hundred, but download times are around the same.
12:05 pm on Apr 3, 2013 (gmt 0)

10+ Year Member



As some sort of closure on this: the hosting company wouldn't admit that anything changed on the server on the 13th March, but today, every time I test with "Fetch as Google" on WMT, I get a positive result.
Having told the hosting company this, I have had no further response.
10:18 am on May 19, 2013 (gmt 0)

10+ Year Member



Have got the problem again, and this time it is hitting visitors. I can access robots.txt but it appears that Google cannot.

My hosting company came back with this response:

"I've had a check in the error_log and it looks as though everytime Google has tried to spider your site, it's been coming up against this:

[Fri May 17 19:33:43 2013] [error] PHP Notice: Undefined variable: province2 in /var/www/vhosts/mysite.com/
httpdocs/templates/local_menu_1dir.php on line 247, referer: http://www.google.com/url?sa=t&rct=j
&q=keyword%20keyword%20keyword%20keyword&source=web&cd=1&ved=0CCsQFjAA
&url=http%3A%2F%2Fwww.mysite.com%2Fregions%2F&ei=AniWUZuYF8rl0QGFkYCoDg
&usg=AFQjCNESwr4e5lDTZYaNQCDTBEfnfRkkIA


That in turn is causing a temporary IP block which would prevent it from being able to access the server.

The server itself has 100% uptime for the day. "

But this is only a notice, not a warning, and I wouldn't have expected Google to be fazed by it.
My robots.txt and .htaccess have not changed in the last 2 years. I am working through some of the errors that are causing these notices but don't believe they are the problem.

[edited by: phranque at 10:43 am (utc) on May 19, 2013]

[edited by: Robert_Charlton at 6:35 pm (utc) on May 19, 2013]
[edit reason] unlinked referer url & fixed side-scroll [/edit]

10:42 am on May 19, 2013 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



googlebot does not see the error in the server error log.
you need to find out what response googlebot is getting from the server, starting with the status code.
you can find this in the server access log.
for example, sometimes an "[error]" in the server error log causes a "500" status code, which would not be an interesting response for googlebot.

you might try accessing your robots.txt from a browser with a user agent switcher so that you look like a googlebot request and see if that gives you any clues.
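phranque's user-agent test can also be done with a small script rather than a browser extension. This is a minimal sketch, assuming Python is available; example.com stands in for the real domain, and the UA string is Googlebot's published desktop string:

```python
import urllib.request
import urllib.error

# Googlebot's desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_as_googlebot(url, timeout=15):
    """Fetch a URL with Googlebot's UA; return (status, body) or (None, reason)."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as e:
        # 4xx/5xx but reachable: the server answered, so look at the status code.
        return e.code, b""
    except urllib.error.URLError as e:
        # Timeout or connection refusal: points at an IP/firewall block instead.
        return None, str(e.reason)

# fetch_as_googlebot("http://www.example.com/robots.txt")
```

Note this only tests UA-based blocking, since the request still comes from your own IP; if a browser gets a 200 but this gets a 403 or 500, the server is treating the Googlebot UA differently.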
4:37 pm on May 19, 2013 (gmt 0)



Is there a chance that your host is blocking certain regions or countries because of potential hacking threats?
6:00 pm on May 19, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should fix the error in the PHP script anyway. Not only will it keep your error log a bit cleaner, it will remove the possibility that it is actually somehow causing the problem so if you have to keep going back to the hosting service they can't point to that issue any longer.
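While working through those notices, it can help to rank them by frequency so the noisiest get fixed first. A rough Python sketch that tallies undefined-variable notices from an error_log in the format quoted above (reading the file itself is left out):

```python
import re
from collections import Counter

# Matches lines like:
#   [Fri May 17 ...] [error] PHP Notice: Undefined variable: province2 in /path/file.php on line 247
NOTICE_RE = re.compile(r"PHP Notice:\s+Undefined variable: (\w+) in (\S+) on line (\d+)")

def tally_notices(log_lines):
    """Count each distinct (variable, file, line) notice."""
    counts = Counter()
    for line in log_lines:
        m = NOTICE_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts
```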
8:44 pm on May 19, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



That in turn is causing a temporary IP block which would prevent it from being able to access the server.

<snip>

But this is only a notice, not a warning and I wouldn't have expected Google to be fazed by it.

phranque, did you accidentally edit out the wrong thing? As written, this no longer makes sense.

The key phrase appears to be quoted from the host:
a temporary IP block which would prevent it from being able to access the server

That implies there's a php script which reacts to the warning-- even if the warning appears to be gibberish-- and slams down a lock. But whose script is it? The host, being overenthusiastic with mod_security, or the client, using cut-and-paste code that needs to be edited to fit the site?
8:54 pm on May 19, 2013 (gmt 0)

10+ Year Member



Thank you for your replies.
I haven't been able to get a user agent switcher to work yet.

But I also notice that I cannot find Googlebot in the access logs or error logs, only google.com search strings.

So it looks to me as though Google is being blocked before it gets to any of my scripts.
Have tried removing .htaccess in case there was a problem there that I couldn't see.

Am now waiting for another reply from the host.
10:00 am on May 30, 2013 (gmt 0)

10+ Year Member



This is still ongoing for me. The hosting company now say they need the IP used by googlebot when it fails to access my site (this does not show in my access or error logs).

Am I right in saying that you can't get this info from Google - you can only do a reverse DNS lookup on an IP you already have in your logs?
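That is indeed Google's documented procedure: forward-confirmed reverse DNS on an IP you already have in your logs. A minimal Python sketch of the check:

```python
import socket

def is_googlebot_ip(ip):
    """Verify a crawler IP the way Google recommends: reverse-resolve it,
    check the hostname is under googlebot.com or google.com, then resolve
    that hostname forward and confirm it maps back to the same IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False
```

The forward confirmation matters because anyone can set the reverse DNS of their own IP to a google.com name; only Google controls what those names resolve to.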
6:27 pm on May 30, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



66.249.something

And that's just off the top of my head.

:: shuffling papers ::

66.249.64.0/19 but it tends to stick with a pretty exact IP for quite a while.

Editorial comment: If your hosts don't similarly know the googlebot's IP off the top of their heads, they have possibly not been working hard enough. And the UA string always contains the word "Googlebot", possibly in conjunction with something else, like "-Mobile" or "-Image".

Edit: Now, wait a minute. If the word "Googlebot" doesn't show up in the logs at all, then it can only be a DNS issue. But Google must be reaching your site, or it wouldn't be indexed. And if it weren't indexed, it would not come up in searches.
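Since the UA string always contains "Googlebot", the access log can be scanned directly rather than eyeballed. A rough Python sketch for the Apache combined log format (the field layout is an assumption; adjust the pattern to the host's actual LogFormat):

```python
import re

# Apache combined log: IP, ident, user, [date], "request", status, bytes, "referer", "UA"
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"')

def googlebot_hits(lines):
    """Return (ip, status) for every request whose user-agent mentions Googlebot."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Googlebot" in m.group(3):
            hits.append((m.group(1), int(m.group(2))))
    return hits
```

If this finds nothing during the periods when Fetch as Google was failing, the requests never reached the web server at all, which points at a block upstream of Apache.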
7:06 pm on May 30, 2013 (gmt 0)

10+ Year Member



Googlebot does occasionally show up in the logs, but not when it fails to get through.

Google has been finding my site for the last 11 years but has had trouble in the last few weeks.

I'm not aware of a DNS issue; I assumed the host is blocking it somehow.
Strangely, I have a subdomain on the domain which does not appear to have problems.
7:19 pm on May 30, 2013 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You can easily test whether they are blocking by UA by changing the UA string in your browser and then trying to access the site.

A very large well-known host was found to be blocking Googlebot for months some years ago.
If I could remember which of 1&1, GoDaddy or some other it was, I'd mention it here.
9:07 am on May 31, 2013 (gmt 0)

10+ Year Member



This particularly happens when Google does a deep crawl of the website. Perhaps the server cannot hold up when Google crawls thousands of pages in a matter of minutes. It usually happens with large websites.
2:25 pm on May 31, 2013 (gmt 0)

10+ Year Member



Well, all of a sudden googlebot can access robots.txt and is crawling my site. The hosting company say they have not made any changes.
Google visitors have dropped from 800 a day to 40 - I'm not expecting to get them back very quickly.

@morpheus83 - it is a complicated site (though visitor navigation is simple) - I may need to look at that.
 
