Googlebot "double-spidering" tanks rankings

         

doughayman

5:29 pm on Jul 4, 2007 (gmt 0)

10+ Year Member



OK, guys, here's a new one. My site gets hit by a -950 penalty of sorts every so often. It recurs, but not on any regular schedule.

I have now traced this to times when the main page of my site (and domain) apparently gets spidered by Googlebot twice in succession, within 1 second. I spotted this in my webserver logs, and going back through older logs I see the same pattern.

The first spidering gets a 304 (Not Modified, so Googlebot keeps its cached version) from my webserver, whereas the second spidering (1 second later) results in a full fetch (webserver return code of 200).

Does anybody have any thoughts on:

1) Why the double-spidering by Google in the span of 1 second?

2) Why this may have weird effects on my rankings almost immediately upon the double-spidering?

Thanks in advance, and Happy Fourth for those celebrating.

doughayman

6:08 pm on Jul 4, 2007 (gmt 0)

10+ Year Member



I forgot to mention - when I make changes to this page, and it gets re-spidered normally, usually the ranking problem (-950 penalty) clears up.

I'm looking for suggestions to rid myself of this double-spidering phenomenon.

tedster

6:29 pm on Jul 4, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why would your server return a 200 status one second after a 304 unless the page content really changed? Or did you accidentally reverse the order of the responses here, meaning a 200 first and then a 304 one second later?

doughayman

3:11 am on Jul 5, 2007 (gmt 0)

10+ Year Member



No, that is what is strange, Tedster. The problem manifests whenever I get a 304, and then the next second, get a 200. It's definitely in that order, and the timestamps are one second apart. Whenever I see this strange phenomenon from Googlebot in my weblogs, the -950 penalty is almost instantly (at least within a few hours) imposed on me.

To remedy it, I wind up changing content on this page and re-submitting my sitemap to Google. Once this page is re-indexed (with a single fetch), after a day or 2, my rankings are back to where they once were.

I am extremely perplexed by this sequence of events, and for the life of me cannot explain this double-spidering, which leads to this problem.

tedster

3:31 am on Jul 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I hope it's clear that the http status code is something returned by YOUR server, not by googlebot. If the content has not changed since the time given in the If-Modified-Since request header (normally the time of the previous spidering), then your server should be saying "304, 304, 304" over and over.

This makes me think that the second request you see from googlebot may have a weird "If-Modified-Since" header with an ancient or otherwise buggy time stamp (not the time of the request, but the time that the if-modified-since refers to). If your server is operating properly, that should be the only time it returns a 200 status and gives the full html document in response.

Can you capture the "If-Modified-Since" header that googlebot is sending on the second request? That may give you a big clue.
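One way to do that, as a rough sketch only: this assumes an Apache-style combined log with one extra quoted field on the end holding the If-Modified-Since request header (for example by appending "%{If-Modified-Since}i" to the LogFormat). The function name, the regex and the field layout are my own assumptions, so adjust them to whatever your server really writes.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: combined log format plus a trailing quoted
# If-Modified-Since field ("-" when the header was not sent).
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)" "(?P<ims>[^"]*)"'
)

def find_double_fetches(lines, path="/", window=timedelta(seconds=1)):
    """Return (first, second) pairs where Googlebot got a 304 and then a
    200 for the same path within `window`. Each item is a tuple of
    (timestamp, status, if_modified_since_header)."""
    pairs = []
    previous = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or "Googlebot" not in m["agent"] or m["path"] != path:
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        current = (when, int(m["status"]), m["ims"])
        if (previous is not None
                and previous[1] == 304 and current[1] == 200
                and current[0] - previous[0] <= window):
            pairs.append((previous, current))
        previous = current
    return pairs
```

Feed it the lines of your access log and look at the third item of the second entry in each pair. A "-" there would mean the second request carried no If-Modified-Since header at all; an odd-looking date would point to a buggy header.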

You can report googlebot crawling issues from this page [google.com] -- follow the instructions on validating that it really is your site, attaching a logfile, etc.

PS - I don't know if the googlebot team would care about the apparent tie in to ranking changes or not. Their main area of focus would be the doubled request and any technical error in the header it includes.

jdMorgan

4:24 am on Jul 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first request was likely made with an If-Modified-Since request header -- sometimes called a "Conditional GET," whereas the second request was likely made without that header. This would explain why you see the 304-200 response code sequence.

Now as to why they'd do that, and whether and why this ties in with a -950 penalty, I've no idea. But it is entirely possible to see a 304-Not Modified response followed by a 200-OK, since it is the client that determines whether to include If-Modified-Since in its request.
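To make the 304-then-200 sequence concrete, here is a minimal sketch of the decision a server makes for a GET. It is not how any particular webserver implements it; response_status and its parameters are hypothetical names, and last_modified is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def response_status(last_modified, if_modified_since=None):
    """Return 304 or 200 for a GET of a page last changed at
    `last_modified`, given the raw If-Modified-Since header value
    (None when the client did not send the header)."""
    if if_modified_since is None:
        return 200  # plain GET: always send the full document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: treat the header as absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # unchanged since the client's copy
    return 200

page_changed = datetime(2007, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
print(response_status(page_changed, "Wed, 04 Jul 2007 17:29:00 GMT"))  # conditional GET
print(response_status(page_changed))                                   # no header
```

With an unchanged page, the first call is the conditional GET and yields 304, while the second call, made without the header, yields 200. An ancient date in the header would give 200 as well, which is the case tedster raised.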

Jim

tedster

5:11 am on Jul 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for bringing that clarity, jd. I hadn't thought of a situation where the second request was made without the header, but it makes a lot of sense.

I think I would write to the googlebot spidering team on this. Things may not be as they intended.

doughayman

12:06 pm on Jul 5, 2007 (gmt 0)

10+ Year Member



Tedster - yes I am aware that the codes are being generated by my server - I was just regurgitating my weblog sequence.

In regard to the googlebot link that you provided, after reading it carefully, I'm a little loath to submit my site through this page. It appears that it should be used ONLY if you are uncomfortable with googlebot's crawl rate to your server. The last thing I would want is for Google to process my request, and "downgrade" or eliminate altogether the crawl rate to my site.