Forum Moderators: Robert Charlton & goodroi
Hoping someone can shed some light...
Looking over my logs for the past week, I see that Googlebot is requesting many pages 4-5 times in a 24-hour period.
Went looking and found this [google.com...]
"In general, Googlebot should only download one copy of each file from your site during a given crawl."
This is certainly not what we are seeing.
During a 24-hour period:
- 16 pages were requested 5 times
- 22 pages were requested 4 times
- 1 page was requested 3 times
- 4 pages were requested 2 times
Anyone care to comment? This seems like a *very* inefficient use of Google's and our resources.
Anyone else seeing this behaviour?
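For anyone who wants to pull the same counts from their own logs, here's a minimal sketch in Python, assuming a combined-format access log named access.log - the filename and the user-agent check are assumptions to adjust for your setup:

```python
import re
from collections import Counter

# Tally Googlebot requests per URL from a combined-format access log.
# Captures the request path and the response status code.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" (\d{3})')

hits = Counter()
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = LOG_LINE.search(line)
        if m:
            hits[m.groups()] += 1  # key: (url, status)

# Report anything Googlebot fetched more than once.
for (url, status), count in hits.most_common():
    if count > 1:
        print(f"{count}x {status} {url}")
```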
Thanks
Cheers
[validator.w3.org...]
It sounds like Google is trying to crawl your pages but is erroring out. More than likely it's seeing good content and is retrying to index each page fully.
Thanks for your suggestions.
We are using Mambo, which does not validate, LOL.
I'm also seeing a Header Response of 200 OK for each of these Googlebot pageviews. I think this means the request was fulfilled (i.e. the page was delivered OK).
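If anyone wants to spot-check a header response the same way, here's a quick sketch that fetches a page with a Googlebot-style User-Agent and prints the status line (the URL is a placeholder):

```python
import urllib.request

# Fetch a page the way Googlebot identifies itself and show the
# status code plus the Last-Modified header, if the server sends one.
req = urllib.request.Request(
    "https://www.example.com/",
    headers={"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.reason)
    print("Last-Modified:", resp.headers.get("Last-Modified"))
```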
I've also done some speed tests and they seem OK. We have several thousand viewers each week and they have not reported any difficulty accessing the site.
Strangely, the site: command is giving us 10X the number of pages our site actually has.
Thanks
Cheers
{edit: should also add that the Google index (cache date) is also being updated regularly}
If Mambo does not validate, then you're either using a relatively old version or your TEMPLATE does not validate. I believe Mambo 4.5 Stable was the first version that validated as XHTML 1.0 Transitional.
If you're running something earlier than that, you've got some serious security worries.
The sitemap on one site doesn't seem to be doing any damage, and the site is performing well enough in Google on its chosen keywords, but what's indexed reflects its state from about three weeks ago.
I've built an XML sitemap that reflects the status of each page on the site as accurately as possible - including lastmods that reflect the genuine date of last change (confirmed by the file date on the server), proper priorities, etc.
After each significant change, I upload the sitemap. If I resubmit it, Google downloads it within minutes. If I don't Google will pick it up on its regular 16-hour-or-so cycle.
But it never uses it - preferring to pound away time after time (sometimes 5 or 6 times a day) on the home page. Which would be fine, except that's pretty much a static page - the changes occur elsewhere.
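For anyone curious, a sitemap like that can be generated along these lines - a simplified sketch, where the document root, domain, and priority scheme are placeholders, not my actual setup:

```python
import os
import time

# Walk the document root and emit a sitemap whose <lastmod> values
# come straight from each file's mtime on the server.
DOCROOT = "/var/www/html"             # placeholder path
BASE_URL = "https://www.example.com"  # placeholder domain

entries = []
for root, _dirs, files in os.walk(DOCROOT):
    for name in sorted(files):
        if not name.endswith(".html"):
            continue
        path = os.path.join(root, name)
        rel = os.path.relpath(path, DOCROOT).replace(os.sep, "/")
        lastmod = time.strftime("%Y-%m-%d", time.gmtime(os.path.getmtime(path)))
        priority = "1.0" if rel == "index.html" else "0.5"
        entries.append(
            "  <url>\n"
            f"    <loc>{BASE_URL}/{rel}</loc>\n"
            f"    <lastmod>{lastmod}</lastmod>\n"
            f"    <priority>{priority}</priority>\n"
            "  </url>"
        )

with open("sitemap.xml", "w") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    out.write("\n".join(entries) + "\n</urlset>\n")
```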
Yahoo and MSN are up to date - both have crawled all changed pages and continue to sniff about in a sensible manner. It's just Google.
Anyway, I've written a piece of code that analyses crawler log entries and page age. It shows conclusively that Google is, on average, eight days behind. And you can't build an up-to-date index with old data.
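The gist of it - not the exact code, just a sketch of the calculation, with the log path and document root as assumptions - is to take each page's most recent Googlebot fetch from the log and compare it to the file's mtime on disk:

```python
import os
import re
import time

DOCROOT = "/var/www/html"           # assumption: your document root
APACHE_TIME = "%d/%b/%Y:%H:%M:%S"   # e.g. 10/Oct/2005:13:55:36
LOG_LINE = re.compile(r'\[([^ \]]+)[^\]]*\] "GET (\S+)')

# Most recent Googlebot fetch per URL, as an epoch timestamp.
last_crawl = {}
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        m = LOG_LINE.search(line)
        if m:
            ts = time.mktime(time.strptime(m.group(1), APACHE_TIME))
            url = m.group(2)
            last_crawl[url] = max(ts, last_crawl.get(url, 0.0))

# For pages that changed after their last crawl, measure the lag.
lags = []
for url, crawled in last_crawl.items():
    path = os.path.join(DOCROOT, url.lstrip("/"))
    if os.path.isfile(path):
        changed = os.path.getmtime(path)
        if changed > crawled:
            lags.append((changed - crawled) / 86400)  # seconds -> days

if lags:
    print(f"Googlebot is on average {sum(lags) / len(lags):.1f} days behind")
```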
Doubtless they would say this reflects general interest in my site - it ain't that important. But the reverse of the coin is that both Yahoo and MSN have current data and Google doesn't. I suspect if this sort of situation becomes known, many people will switch their search engine of choice.
As it is, this week over last, Google referrals as a percentage of the total (which is up slightly) are down 6 points.