The index growing on 64.233.179.104 does seem to be largely generated by Mozilla Googlebot, and this new index is being built for the future. So can we say Mozilla Googlebot is now taking over from normal Googlebot?
OK, ignore supplementals etc. for a moment, as all DCs have this problem, and have a look at the cache dates for pages that are indexed. For some of these pages the cached copy is the one fetched by Mozilla Googlebot, even when normal Googlebot visited on the same day.
E.g. on the test DC I have a homepage cached on 30th November at 5:40, fetched by Mozilla Googlebot, while on the other DCs it is cached on 30th November at 3:40, fetched by normal Googlebot.
So in many ways this does look like a whole new index being built in parallel to the existing one, largely from Mozilla Googlebot crawl data.
Some pages appear very old. E.g. another page is cached on the test DC on 6th November, but on the other DCs its cache is dated December. Checking the logs, 6th November was the last time Mozilla Googlebot visited this page.
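For anyone wanting to run the same check on their own logs, here is a rough Python sketch that finds the last visit per URL for each of the two Googlebots. It assumes the Apache "combined" log format, and it assumes the usual user-agent strings: Mozilla Googlebot identifies itself with a string starting "Mozilla/5.0 (compatible; Googlebot/2.1; ...", normal Googlebot with "Googlebot/2.1 (...". The names classify_agent and last_visits are just made up for this example.

```python
import re
from datetime import datetime

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify_agent(agent):
    """Return "mozilla", "normal", or None for a user-agent string.

    Assumption: Mozilla Googlebot's user agent starts with "Mozilla/",
    normal Googlebot's starts with "Googlebot/".
    """
    if "Googlebot" not in agent:
        return None
    return "mozilla" if agent.startswith("Mozilla/") else "normal"


def last_visits(lines):
    """Map (path, bot) to the datetime of that bot's most recent fetch."""
    latest = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        bot = classify_agent(m.group("agent"))
        if bot is None:
            continue  # not a Googlebot request
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        key = (m.group("path"), bot)
        if key not in latest or when > latest[key]:
            latest[key] = when
    return latest
```

Feed it the open log file, e.g. last_visits(open("access.log")), then compare the "mozilla" date for a page with the cache date shown on the test DC, and the "normal" date with the cache date on the other DCs. Note that the user agent can be faked, so this only tells you what the visitor claimed to be.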
OK, there are pages in the test DC that have only been visited by normal Googlebot. However, the versions of pages crawled by Mozilla Googlebot do not appear on the other DCs.
The newest pages on the test DC crawled by Mozilla Googlebot seem to be from November, i.e. no pages crawled by Mozilla Googlebot in December have made it into the index yet.
Some pages crawled by Mozilla Googlebot in November have not made it into the index either, so I don't know if Google is working with only a sample of the data.
For confirmation that this is a whole new build of the index, MC said on his blog:
"the test data center certainly has some different crawling and indexing characteristics."
OK folks, remember also that MC said this index will roll out over months and is in a test state, so I guess there is no need for early panic stations or slagging off Google in this thread.
Now 301s, 302s and canonicals. For me, Google has crawled and indexed a lot more 301s correctly. 302s: still lots in the index (mainly supplementals), but I am not seeing any new 302s that show the URL of the linking site with the content of the destination site (the newest I am seeing are from about August 2005). No doubt others may find some.
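If you want to confirm which of your own URLs actually answer with a 301 and which with a 302 before comparing against what the test DC shows, something like the following Python sketch would do. It sends a HEAD request and does not follow the redirect, so you see the raw status and Location header. redirect_info is a made-up helper name, and some servers answer HEAD differently from GET, so treat the result as a first check only.

```python
import http.client
from urllib.parse import urlsplit


def redirect_info(url, timeout=10):
    """Return (status_code, Location header or None) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the headers only; http.client never follows redirects itself.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

A 301 here is the permanent redirect that should end up indexed under the destination URL. A 302 is the temporary kind, the one behind the old problem of a page being listed under the linking site's URL.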
What other observations have people made about the new crawling and indexing on this test DC?