Forum Moderators: Robert Charlton & goodroi


Google Sitemaps data vs Server log data

         

bw100

9:57 pm on Jun 8, 2007 (gmt 0)

10+ Year Member



I have a niche market news aggregation site that went live on 1 April.
After two months of fine-tuning, we submitted a manually built sitemap to Google through our Google Webmaster Tools account on June 4. It identifies 68 pages: 61 news pages and 7 site pages, which include the homepage (current date news display) and the archive access page.
The available Google sitemap data states that Googlebot has visited the site once, on June 2 (before the sitemap was submitted). 7 of the 68 pages are indexed.
Analysis of the daily logs shows site access from IP addresses attributed to Google in Mountain View, California and Google in Michigan, as well as two international Google addresses.
Google Webmaster Tools sitemap data is:
Cache date: June 5
Googlebot last home page access (per Webmaster Tools): June 2
Pages indexed: 7/68: homepage, feedlist, April news: 2 April dates; May news: 3 May dates
Log data for June to date shows two IP addresses for Googlebot; combined, they account for 26 visits / 42 files / 57 hits.
I am trying to make sense of the apparent discrepancy between what Google's sitemap reporting states and what the server logs show, and between pages crawled and pages indexed.
Suggestions?

tedster

9:29 pm on Jun 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Webmaster Tools is sometimes buggy in the data it reports. If your server logs show an authentic request from Googlebot with a 200 response, then that URL was crawled. Of course, whether that URL gets into the index is a later decision that Google makes. Spidering is no guarantee of inclusion.
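For anyone who wants to check their own logs for authentic Googlebot requests that got a 200, here is a rough Python sketch. It makes several assumptions: the log is in Apache/NCSA combined format (the thread doesn't say which server or format is in use), "authentic" is checked with a reverse DNS lookup followed by a forward lookup on the returned hostname, and the function names (`is_google_host`, `verify_googlebot`, `crawled_urls`) are made up for illustration. Treat it as a starting point, not a finished tool.

```python
import re
import socket
from collections import defaultdict

# Assumed format: Apache/NCSA combined log
# ip ident user [time] "METHOD path proto" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def is_google_host(hostname):
    """True if the hostname sits under googlebot.com or google.com."""
    hostname = hostname.rstrip(".").lower()
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")


def verify_googlebot(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve
    the hostname and confirm it maps back to the same IP.
    The resolvers are parameters so they can be stubbed out offline."""
    try:
        hostname = reverse(ip)[0]
        if not is_google_host(hostname):
            return False
        return ip in forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False


def crawled_urls(lines, verifier=verify_googlebot):
    """Return {path: hit count} for verified Googlebot requests
    that were answered with a 200."""
    verified = {}  # cache one DNS verdict per IP
    crawled = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in (m.group("agent") or ""):
            continue
        if m.group("status") != "200":
            continue
        ip = m.group("ip")
        if ip not in verified:
            verified[ip] = verifier(ip)
        if verified[ip]:
            crawled[m.group("path")] += 1
    return dict(crawled)
```

To use it, pass an open log file, e.g. `crawled_urls(open("access.log"))` (the filename is hypothetical). The keys of the result are the URLs you can treat as crawled; comparing that list against what a `site:` query shows gives a rough crawled-versus-indexed picture. A user agent that merely says "Googlebot" proves nothing by itself, which is why the DNS round trip is there.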

Webmaster Tools is a free reporting service that Google offers, but they certainly do not guarantee its accuracy. I've seen examples where the actual Google SERPs were out of line with what GWT reports. With the way that Google's data is sharded and spread around their monster server farm (hundreds of thousands of physical machines), it's very difficult for their reporting features to stay fully in line with the live search reality. Live search is still their top priority, not reports for webmasters, and that only makes business sense.