Google SEO News and Discussion Forum

    
Is Google WMT Wrongly reporting crawl errors
denisl




msg:4560538
 10:51 am on Apr 2, 2013 (gmt 0)

G has been sending me emails to say they cannot access my robots.txt.

I can access it via a browser, but when I use the WMT "Fetch as Googlebot" tool it says "Page Unreachable". However, when I click on "Page Unreachable" it gives me this report:


Fetch as Google

This is how Googlebot fetched the page.

URL: http://www.example.com/robots.txt

Date: Tuesday, April 2, 2013 at 3:41:27 AM PDT

Googlebot Type: Web

Download Time (in milliseconds): 14985

To me, this appears to show that it did download it.

Assuming I had a problem, I have replaced robots.txt. I have replaced .htaccess (and tried it without .htaccess). I have also been in contact with my hosting company, who cannot see a problem.
However, I am still getting the messages from Google, so I am concerned.

 

netmeg




msg:4560575
 12:39 pm on Apr 2, 2013 (gmt 0)

I get these all the time, because between my own sites and client sites I have a hundred sites or so in my GWT, hosted all over the place. Once I ascertain there's no real problem, I ignore them.

denisl




msg:4560588
 1:10 pm on Apr 2, 2013 (gmt 0)

I may be heading toward an answer. I just noticed that all my website files are dated 13th March 2013, the same date G started reporting crawl errors, so I am assuming something changed on the server on that date.
I am waiting for a response from the hosting company.
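If you have shell access, you can confirm a bulk change like that by grouping every file under the web root by its modification date. A minimal Python sketch; the root directory you pass in (for example a hypothetical `httpdocs`) is an assumption about your host's layout:

```python
import os
from collections import Counter
from datetime import datetime


def files_by_mtime_date(root):
    """Count files under `root` by the local calendar date of their last modification."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            mtime = os.path.getmtime(os.path.join(dirpath, name))
            counts[datetime.fromtimestamp(mtime).date()] += 1
    return counts
```

Calling `sorted(files_by_mtime_date("httpdocs").items())` lists dates with file counts; one date holding nearly every file points to a server-side copy or restore rather than your own edits.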

g1smd




msg:4560635
 3:31 pm on Apr 2, 2013 (gmt 0)

Download Time (in milliseconds): 14985

Big problem. ^

denisl




msg:4560664
 4:33 pm on Apr 2, 2013 (gmt 0)

Yep, noticed that, g1smd. All the reports are just under 15 seconds; I guess that is the point at which they treat it as timed out. The report doesn't say "timed out", though.
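To watch for the same symptom outside Webmaster Tools, you can time a robots.txt fetch yourself. This is a sketch only: the 15-second cutoff is an assumption inferred from the ~14985 ms reports in this thread, not a documented Google limit, and a fetch from your own machine will not reproduce a block aimed at Google's IP addresses. The `fetch` parameter exists only so the timing logic can be exercised without a network.

```python
import time
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Assumption: cutoff inferred from the ~14985 ms reports in this thread, not a documented limit.
ASSUMED_CUTOFF_MS = 15000


def urllib_fetch(url, timeout):
    """Download `url` with a Googlebot-style UA and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    # Note: urllib's timeout applies to each blocking socket operation, not the whole download.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # include the body transfer in the timing
        return resp.status


def time_fetch(url, fetch=urllib_fetch, cutoff_ms=ASSUMED_CUTOFF_MS):
    """Return (status, elapsed_ms, near_cutoff); HTTP errors and timeouts propagate to the caller."""
    start = time.monotonic()
    status = fetch(url, cutoff_ms / 1000)
    elapsed_ms = (time.monotonic() - start) * 1000
    return status, elapsed_ms, elapsed_ms >= 0.9 * cutoff_ms
```

Running `time_fetch("http://www.example.com/robots.txt")` every few minutes and logging the result shows whether slow responses are intermittent.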

Str82u




msg:4560679
 6:00 pm on Apr 2, 2013 (gmt 0)

Download Time (in milliseconds): 14985
Just checked a few pages to make sure that wasn't GWT doing that.

@denisl if all your site files on the server have the same date then you can be sure the host did something. Have you also tried to test your sitemap with GWT to see if it gets errors too?

denisl




msg:4560711
 6:42 pm on Apr 2, 2013 (gmt 0)

I have just tried the Googlebot test on sitemap.xml and it gives the error "unreachable robots.txt". I thought it was a good suggestion of yours, Str82u, as I didn't think they would access robots.txt before downloading the sitemap.
For every other file it gives the same robots error.

I'm still waiting for a further response from the hosting company but something happened on the 13th.

I notice that WMT crawl stats show that the number of files downloaded since the 13th have dropped from a few thousand to a few hundred, but download times are around the same.
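The "unreachable robots.txt" error on every URL fits Google's published robots.txt guidance: Googlebot fetches robots.txt first, treats a 4xx response as "no restrictions", and holds off crawling the site while robots.txt returns a 5xx or cannot be fetched at all. The simplified model below illustrates that rule, using Python's standard `urllib.robotparser` for the normal 2xx case. It is an illustration under those assumptions, not Google's implementation, and it assumes redirects have already been followed.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_status, robots_body, url, user_agent="Googlebot"):
    """Simplified model of documented robots.txt handling (illustrative, not Google's code).

    robots_status: HTTP status of the robots.txt fetch, or None if it timed out or was unreachable.
    """
    if robots_status is None or robots_status >= 500:
        return False  # unreachable or server error: crawling of the whole site is postponed
    if 400 <= robots_status < 500:
        return True  # robots.txt missing: no restrictions apply
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under this model a perfectly valid robots.txt still blocks every URL if the server cannot deliver it in time.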

denisl




msg:4560928
 12:05 pm on Apr 3, 2013 (gmt 0)

As some sort of closure on this: the hosting company wouldn't admit that anything changed on the server on 13th March, but today, every time I test with "Fetch as Google" in WMT, I get a positive result.
Having told the hosting company this, I have had no further response.

denisl




msg:4575428
 10:18 am on May 19, 2013 (gmt 0)

I have got the problem again, and this time it is hitting visitors. I can access robots.txt, but it appears that G cannot.

My hosting company came back with this response:

"I've had a check in the error_log and it looks as though everytime Google has tried to spider your site, it's been coming up against this:

[Fri May 17 19:33:43 2013] [error] PHP Notice: Undefined variable: province2 in /var/www/vhosts/mysite.com/
httpdocs/templates/local_menu_1dir.php on line 247, referer: http://www.google.com/url?sa=t&rct=j
&q=keyword%20keyword%20keyword%20keyword&source=web&cd=1&ved=0CCsQFjAA
&url=http%3A%2F%2Fwww.mysite.com%2Fregions%2F&ei=AniWUZuYF8rl0QGFkYCoDg
&usg=AFQjCNESwr4e5lDTZYaNQCDTBEfnfRkkIA


That in turn is causing a temporary IP block which would prevent it from being able to access the server.

The server itself has 100% uptime for the day. "

But this is only a notice, not a warning, and I wouldn't have expected Google to be fazed by it.
My robots.txt and .htaccess have not changed in the last 2 years. I am working through some of the errors that are causing these notices, but I don't believe they are the problem.

[edited by: phranque at 10:43 am (utc) on May 19, 2013]

[edited by: Robert_Charlton at 6:35 pm (utc) on May 19, 2013]
[edit reason] unlinked referer url & fixed side-scroll [/edit]
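Whether or not the notices are the cause, a tally of which file and line produce them shows where to start fixing. A minimal sketch, assuming the Apache error_log format in the host's excerpt above, with one message per line:

```python
import re
from collections import Counter

# Matches e.g. "PHP Notice:  Undefined variable: x in /path/file.php on line 247"
PHP_MSG_RE = re.compile(
    r"PHP (?P<level>Notice|Warning|Fatal error): +(?P<msg>.+?) in (?P<file>\S+) on line (?P<line>\d+)"
)


def tally_php_messages(lines):
    """Count PHP messages by (level, file, line number, message text)."""
    counts = Counter()
    for entry in lines:
        m = PHP_MSG_RE.search(entry)
        if m:
            key = (m.group("level"), m.group("file"), int(m.group("line")), m.group("msg"))
            counts[key] += 1
    return counts
```

`tally_php_messages(open("error_log")).most_common(10)` (the log path is an assumption) lists the noisiest messages first.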

phranque




msg:4575430
 10:42 am on May 19, 2013 (gmt 0)

googlebot does not see the error in the server error log.
you need to find out what response googlebot is getting from the server, starting with the status code.
you can find this in the server access log.
for example, sometimes an "[error]" in the server error log causes a "500" status code, which would not be an interesting response for googlebot.
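For example, a short script can pull out the status codes returned to requests whose user-agent claims to be Googlebot. A sketch assuming Apache's "combined" log format; keep in mind that a user-agent string alone can be spoofed.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def googlebot_statuses(lines, path="/robots.txt"):
    """Count status codes served to requests for `path` whose user-agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue  # unparseable line or not a Googlebot user-agent
        parts = m.group("request").split()  # e.g. ["GET", "/robots.txt", "HTTP/1.1"]
        if len(parts) >= 2 and parts[1] == path:
            counts[m.group("status")] += 1
    return counts
```

Pass it an open log file (the filename depends on your host). An empty result while WMT is reporting errors means the requests never reached this server's logs at all.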

you might try accessing your robots.txt from a browser with a user agent switcher so that you look like a googlebot request and see if that gives you any clues.
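If a browser extension is awkward, the same comparison can be scripted: request the file once with a browser user-agent and once with Googlebot's, and compare the results. Both UA strings below are example values chosen for illustration. This only detects blocking by user-agent; a block keyed on Google's IP addresses, which is what the host described, will not show up from your own connection.

```python
import urllib.error
import urllib.request

# Example UA strings (assumptions for illustration).
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, user_agent, timeout=15.0):
    """Return the HTTP status as a string, or an error description if no response arrived."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as e:
        return str(e.code)  # 4xx/5xx responses are raised as HTTPError
    except OSError as e:
        return f"error: {e}"  # DNS failure, refused connection or timeout


def compare_user_agents(url, fetch=fetch_status):
    """Fetch `url` as a browser and as Googlebot; report whether the results differ."""
    as_browser = fetch(url, BROWSER_UA)
    as_googlebot = fetch(url, GOOGLEBOT_UA)
    return as_browser, as_googlebot, as_browser != as_googlebot
```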

Awarn




msg:4575483
 4:37 pm on May 19, 2013 (gmt 0)

Is there a chance that your host is blocking certain regions or countries because of potential hacking threats?

rainborick




msg:4575500
 6:00 pm on May 19, 2013 (gmt 0)

You should fix the error in the PHP script anyway. Not only will it keep your error log a bit cleaner, it will also rule out the possibility that it is somehow causing the problem, so if you have to keep going back to the hosting service, they can't point to that issue any longer.

lucy24




msg:4575521
 8:44 pm on May 19, 2013 (gmt 0)

That in turn is causing a temporary IP block which would prevent it from being able to access the server.

<snip>

But this is only a notice, not a warning, and I wouldn't have expected Google to be fazed by it.

phranque, did you accidentally edit out the wrong thing? As written, this no longer makes sense.

The key phrase appears to be quoted from the host:
a temporary IP block which would prevent it from being able to access the server

That implies there's a php script which reacts to the warning-- even if the warning appears to be gibberish-- and slams down a lock. But whose script is it? The host, being overenthusiastic with mod_security, or the client, using cut-and-paste code that needs to be edited to fit the site?

denisl




msg:4575523
 8:54 pm on May 19, 2013 (gmt 0)

Thank you for your replies.
I haven't been able to get a user agent switcher to work yet.

But I also notice that I cannot find Googlebot in the access logs or error logs, only google.com search strings.

So it looks to me that G is being blocked before it gets to any of my scripts.
I have tried removing .htaccess in case there was a problem there that I couldn't see.

I am now waiting for another reply from the host.

denisl




msg:4579377
 10:00 am on May 30, 2013 (gmt 0)

This is still ongoing for me. The hosting company now says it needs the IP used by Googlebot when it fails to access my site (this does not show in my access or error logs).

Am I right in saying that you can't get this info from Google, and can only do a reverse DNS lookup on an IP you already have in your logs?
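The usual way to check an IP taken from your own logs is a forward-confirmed reverse DNS lookup, the verification method Google documents for Googlebot: the reverse lookup should return a host name under googlebot.com or google.com, and a forward lookup of that name should return the original IP. A sketch with Python's `socket` module; the resolver functions are parameters only so the logic can be tested offline, and `gethostbyname_ex` handles IPv4 only.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(host):
    """True if `host` is a subdomain of googlebot.com or google.com."""
    return host.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IPv4 address taken from your logs."""
    try:
        host = reverse(ip)[0]  # e.g. crawl-66-249-66-1.googlebot.com
    except OSError:
        return False  # no PTR record
    if not is_google_hostname(host):
        return False
    try:
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses  # the name must resolve back to the same IP
```

With the default resolvers, `verify_googlebot("66.249.66.1")` needs working DNS; run it on IPs pulled from your access log.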

lucy24




msg:4579572
 6:27 pm on May 30, 2013 (gmt 0)

66.249.something

And that's just off the top of my head.

:: shuffling papers ::

66.249.64.0/19 but it tends to stick with a pretty exact IP for quite a while.

Editorial comment: If your hosts don't similarly know the googlebot's IP off the top of their heads, they have possibly not been working hard enough. And the UA string always contains the word "Googlebot", possibly in conjunction with something else, like "-Mobile" or "-Image".

Edit: Now, wait a minute. If the word "Googlebot" doesn't show up in logs at all, then it can only be a DNS issue. But google must be reaching your site, or it wouldn't be indexed. And if it weren't indexed, it would not come up in searches.
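Checking log IPs against the 66.249.64.0/19 block quoted above is simple with Python's `ipaddress` module; the /19 covers 66.249.64.0 through 66.249.95.255. Treat the range as a 2013 snapshot rather than an authoritative list, since Google's crawler addresses change and the reverse DNS check is the more dependable test.

```python
import ipaddress

# Range quoted in this thread (2013); a snapshot, not an authoritative list.
GOOGLEBOT_NET = ipaddress.ip_network("66.249.64.0/19")


def in_googlebot_range(ip):
    """True if `ip` parses as an address inside GOOGLEBOT_NET."""
    try:
        return ipaddress.ip_address(ip) in GOOGLEBOT_NET
    except ValueError:
        return False  # not a valid IP string


def ips_in_range(ips):
    """Unique addresses from `ips` that fall inside GOOGLEBOT_NET."""
    return sorted({ip for ip in ips if in_googlebot_range(ip)})
```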

denisl




msg:4579591
 7:06 pm on May 30, 2013 (gmt 0)

Googlebot does occasionally show up in the logs, but not when it fails to get through.

Google have been finding my site for the last 11 years but have been having trouble in the last few weeks.

I'm not aware of a DNS issue; I assumed the host is blocking it somehow.
Strangely, I have a subdomain on the domain which does not appear to have problems.

g1smd




msg:4579595
 7:19 pm on May 30, 2013 (gmt 0)

You can easily test whether they are blocking by UA by changing the UA string in your browser and then trying to access the site.

A very large well-known host was found to be blocking Googlebot for months some years ago.
If I could remember which of 1&1, GoDaddy or some other it was, I'd mention it here.

morpheus83




msg:4579771
 9:07 am on May 31, 2013 (gmt 0)

This happens particularly when Google does a deep crawl of the website: perhaps the server cannot hold up when Google crawls thousands of pages in a matter of minutes. It usually happens with large websites.
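One way to test that theory is to count Googlebot requests per minute in the access log and see whether the failures line up with peaks. A sketch assuming Apache's combined log format, with the timestamp in the first [...] field and the user-agent as the last quoted field:

```python
import re
from collections import Counter
from datetime import datetime

# First [...] field is the timestamp; the last quoted field is the user-agent (combined format).
LINE_RE = re.compile(r'\[(?P<time>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def googlebot_hits_per_minute(lines):
    """Count requests per minute whose user-agent mentions Googlebot."""
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        try:
            ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # unexpected timestamp format
        per_minute[ts.replace(second=0)] += 1
    return per_minute
```

`googlebot_hits_per_minute(open("access_log")).most_common(5)` (the log path is an assumption) shows the busiest minutes to compare against the times WMT reported errors.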

denisl




msg:4579885
 2:25 pm on May 31, 2013 (gmt 0)

Well, all of a sudden Googlebot can access robots.txt and is crawling my site. The hosting company says it has not made any changes.
Google visitors dropped from 800 a day to 40 - not expecting to get them back very quickly.

@morpheus83 - it is a complicated site (though visitor navigation is simple) - I may need to look at that.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved