We have noticed a very large fall-off in Google's spidering of our sites since Sunday. This is happening across two sites, each of which runs on its own server farm. We are hosted at Rackspace.
We normally see a huge amount of spidering, in the range of 500K to 800K pages per day. It was running at ~700K and then dropped to under 20K over the last couple of days. Our page rendering times are well within our normal range, and we have made no changes to our load balancer.
Is anyone else seeing a similar pattern?
Thanks!
Greg
There is no practical way that I know of to compare cache dates.
The number of pages is not really significant - you can still get a sample worth monitoring, even a statistically valid one, from a relatively small number of cache dates.
You can check a pattern of pages like home >> category >> subcategory >> page and use that as the basis for figuring out your site's caching cycle.
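If you want to automate that, here is a minimal sketch that estimates the gap between Googlebot visits to a small fixed sample of pages, straight from your own access logs. It assumes Apache/nginx combined-format logs and a simple user-agent substring match, and the sample paths are placeholders for your own home >> category >> subcategory >> page pattern:

```python
#!/usr/bin/env python3
"""Estimate how often Googlebot revisits a small sample of pages.

Minimal sketch: assumes combined-format access logs and that a
user-agent substring match is good enough for a rough cycle estimate.
The sample paths are placeholders, not from the thread.
"""
import re
from datetime import datetime

SAMPLE_PATHS = ["/", "/category/", "/category/subcategory/",
                "/category/subcategory/page.html"]  # placeholders

# Combined format: IP - - [timestamp] "GET /path HTTP/1.1" status size "ref" "ua"
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

def crawl_intervals(log_path):
    hits = {p: [] for p in SAMPLE_PATHS}
    with open(log_path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if not m or "Googlebot" not in m.group("ua"):
                continue
            path = m.group("path")
            if path in hits:  # exact match only; query strings would need stripping
                ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
                hits[path].append(ts)
    for path, times in hits.items():
        times.sort()
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
        if gaps:
            print(f"{path}: {len(times)} crawls, mean gap {sum(gaps)/len(gaps):.1f}h")
        else:
            print(f"{path}: not enough crawls to estimate a cycle")

if __name__ == "__main__":
    crawl_intervals("access.log")  # log path is an assumption
```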
To my mind, there's a process involved in figuring this out:
- Check that the spidering data is valid (how are you measuring - can we rely on this data?)
- If the data is valid, determine the affected pages (this should be part of the measuring process, or at least the data should be collected; see the sketch after this list)
- See if there has been any other impact on the affected pages
As you work through such a process, it often becomes clear whether there is any issue worth addressing, and where it lies.
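As a concrete example of the "determine the affected pages" step, here is a rough sketch that breaks daily Googlebot hits down by top-level site section, so you can see whether a drop is across the board or concentrated in one area. Same assumptions as any quick log hack: combined-format logs and a naive "Googlebot" substring match:

```python
#!/usr/bin/env python3
"""Break Googlebot hits down by day and top-level site section.

Sketch only: combined-format logs assumed, "section" is simply the
first path segment, and the substring match is deliberately naive.
"""
import re
import sys
from collections import Counter
from datetime import datetime

# Capture the date part of the timestamp and the requested path.
LINE_RE = re.compile(r'\[(?P<day>[^:]+):[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)')

def hits_by_day_and_section(log_path):
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            if "Googlebot" not in line:
                continue
            m = LINE_RE.search(line)
            if not m:
                continue
            section = "/" + m.group("path").lstrip("/").split("/")[0]
            counts[(m.group("day"), section)] += 1
    # Sort chronologically, then by section, for easy eyeballing.
    for (day, section), n in sorted(
            counts.items(),
            key=lambda kv: (datetime.strptime(kv[0][0], "%d/%b/%Y"), kv[0][1])):
        print(f"{day}  {section:<30} {n}")

if __name__ == "__main__":
    hits_by_day_and_section(sys.argv[1] if len(sys.argv) > 1 else "access.log")
```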
1) Our spidering data is captured from our server logs and then post-processed to separate out the various bots. We then cross-correlate our figures with what Google reports in the WMT console to ensure that our data is within the same range as what Google reports (it is). We have been doing this for about five years now.
2) Since we do not have a process that goes out and looks at a sample set of pages to check their cache timestamps and measure how often they are updated, I don't know how this would be useful for the current situation. It does make sense to set this up to monitor results going forward.
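One refinement worth adding to the bot-separation step in 1): a plain user-agent match can be fooled by fake Googlebots, and Google's documented verification is a reverse DNS lookup (the host should end in googlebot.com or google.com) followed by a forward lookup confirming the same IP. A minimal sketch - the example IP is the one from Google's own documentation:

```python
#!/usr/bin/env python3
"""Verify that an IP claiming to be Googlebot really is Google.

Sketch of Google's documented check: reverse DNS, domain check,
then a forward lookup that must return the same IP. Useful when
cross-checking log-derived crawl counts against what WMT reports.
"""
import socket
from functools import lru_cache

@lru_cache(maxsize=None)
def is_real_googlebot(ip):
    try:
        host, _, _ = socket.gethostbyaddr(ip)   # reverse DNS
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to this IP.
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

if __name__ == "__main__":
    print(is_real_googlebot("66.249.66.1"))  # known Googlebot address
```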
Thanks
Greg
The way I interpret your question is that Google is spidering less than it has over the past few months, and you are asking whether this is something to be concerned about.
With that much data, you should be able to see whether this change fits within the normal standard deviation, and so is not something to be unduly concerned about - the sort of pattern g1smd mentions (a sketch of such a check follows below).
If you can connect it with some kind of quality statistic (ranging from cache date all the way up to conversions) you can judge if it has had any impact worth responding to.
Remember that on a busy forum such as this one, and on an index of billions of pages, lots of people will be experiencing less frequent spidering at the same time as you, but that doesn't necessarily mean that the causes are the same.
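For anyone who wants to put a number on "fits within standard deviation", here is a minimal sketch that flags days whose crawl count falls several sigmas outside a trailing baseline. The 30-day window and 3-sigma threshold are arbitrary choices, and the demo numbers are made up to resemble the figures in this thread:

```python
#!/usr/bin/env python3
"""Flag days whose Googlebot crawl count is far outside the norm.

Sketch only: daily_counts is whatever daily series you already
collect; window and threshold are tunable, not thread facts.
"""
from statistics import mean, stdev

def flag_anomalies(daily_counts, window=30, threshold=3.0):
    """Yield (day_index, count, z) for days beyond `threshold` sigmas."""
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            continue
        z = (daily_counts[i] - mu) / sigma
        if abs(z) > threshold:
            yield i, daily_counts[i], z

if __name__ == "__main__":
    # Toy series: ~700K/day with normal noise, then the kind of
    # collapse described in this thread (numbers are illustrative).
    history = [700_000 + d for d in (-50_000, 30_000, 10_000, -20_000) * 8]
    history += [20_000, 18_000]
    for day, count, z in flag_anomalies(history):
        print(f"day {day}: {count} pages crawled, z = {z:+.1f}")
```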
Our spidering started picking up steam again late yesterday, so it looks like things are on the mend.
Andy-
This was WAY outside the standard deviation. Our normal range is 600K to 900K pages spidered per day. On some very rare occasions (maybe twice a year) that might drop to 200K, but what we saw on Sunday and Monday was 20K. So we are talking several standard deviations off.
However, as mentioned above and consistent with eeek's experience, things are turning back up again.
Thanks!
Greg
Regards,
KMS11