The index growing on 64.233.179.104 does seem to be largely a Mozilla Googlebot generated index - and this new index is being built for the future - so can we say Mozilla Googlebot is now taking over from normal Googlebot?
OK, ignore supplementals etc. for a moment - as all DCs have this problem - and have a look at the cache dates for pages that are indexed. Some of these pages have only been fetched by Mozilla Googlebot (even on the same day as normal Googlebot visited).
Eg. On the test DC I have a homepage cached 30th November at 5:40 - fetched by Mozilla Googlebot - while on the other DCs it is cached on 30th November at 3:40 - fetched by normal Googlebot.
So in many ways this does look like building a whole new index parallel to the existing index - with largely Mozilla Googlebot crawl data.
Some pages appear very old - eg another page is cached on the test DC on 6th November - but on the other DCs it has a cache date in December. Checking the logs, 6th November was the last time Mozilla Googlebot visited this page.
OK - there are pages in the test DC only visited by normal Googlebot - however, pages crawled by Mozilla Googlebot do not appear on other DCs.
The newest pages on the DC crawled by Mozilla Googlebot seem to be in November - eg no pages crawled by Mozilla Googlebot in December have made it to the index yet.
Some pages crawled by Mozilla Googlebot in November have not made it to the index - so I don't know if G are working with a sample data size.
For confirmation that this is a whole new build of the index MC said on his blog:-
"the test data center certainly has some different crawling and indexing characteristics."
OK - folks remember also that MC said that this index will roll out in months and is in a test state so I guess no need for early panic stations and slagging of Google in this thread.
Now 301s, 302s, Canonicals - for me, Google has crawled and indexed a lot more 301s correctly. 302s - still lots in the index (mainly supplementals) - not seeing any new 302s that show the url of the linking site but the content of the destination site (the newest I am seeing are from about August 2005) - no doubt others may find some.
What other observations have people made about the new crawling and indexing on this test DC?
Yes, lots of url onlys have the title and description back - these really need to be recrawled, but the required recrawling of supplementals is pretty much standard across the DCs.
The caches in December - the ones I have seen tend to be normal Googlebot rather than Mozilla Googlebot. Normal Googlebot still adds pages to the test dc - but Mozilla Gbot does not add pages to the other dcs or so it seems.
fatpeter
What sort of cache date are you showing for those pages picked up by Mozilla Googlebot - and are pretty much all the pages crawled by the bot indexed? EG 3000 crawled and it appears 3000 indexed?
Can't go past 1000 but it looks like they were all crawled and added. Cache dates around the middle of November. Only odd thing... a site: search always gave an accurate number of about 400. I added 3000 and now the site: search gives 11000 results.
What sort of cache dates are you showing?
Are you seeing a pretty much full crawl to index return - or just a sample?
Seems my whole site is indexed. In current Google 99% of my pages are URL-only. In this version they are actually indexed and ranking like normal (no sandbox or any weirdness, 3 1/2 year old site though so shouldn't be sandboxed anyway).
I had been getting deep crawls by Mozilla bot for months but very very few pages would actually end up indexed. This is more like it!
On a site with the 301 redirect, results are identical with and without the www in the site command.
This looks like an improvement for the canonical issues.
I've also noticed several of my pages returning from supplemental and now ranking again.
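For anyone who wants to check the 301 case on their own site, here is a rough Python sketch. The host names are placeholders (example.com) and both helper names are made up for this sketch. It only checks that the non-www host answers with a permanent redirect pointing at the www host; it says nothing about how Google treats the redirect.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(host, path="/"):
    """Return (status, Location header) for a HEAD request.

    http.client does not follow redirects, so the first response is what we see.
    """
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_canonical_301(status, location, canonical_host):
    """True only for a 301 whose Location points at the canonical host."""
    if status != 301 or not location:
        return False
    # urlsplit().hostname is lowercased; it is None for relative locations
    return urlsplit(location).hostname == canonical_host.lower()


# Example use (placeholder hosts, needs network access):
#   status, location = fetch_redirect("example.com")
#   print(is_canonical_301(status, location, "www.example.com"))
```

A 302 fails this check on purpose, since the thread treats 302s as a separate problem from 301s.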
My experience:
What I'm seeing here is that Google basically reincluded my whole site. The site was completely wiped out from google.com in the early days of Bourbon.
Since May we completely rewrote our site, fixed canonical issues, deleted thousands of pages, etc...
Our site and all of its pages (200) are now back in the index and showing on the 4th or 5th page for competitive keywords.
SO:
-This index is quite fresh AND for us it's either:
1) penalties or more aggressive filters are not yet applied
2) complete reinclusion and weight off of the multiple sites penalties that were on this domain.
Ok I hope it's option 2.. but...
Could they still add filters and penalties to this index, or are we dealing with soon to be stable results with all site penalties applied?
For the benefit of further discussion, I'm recalling what Matt wrote regarding the test DC.
"Broker Boy, I do expect that data center to eventually go live, but it will take a few months, in all likelihood. That data center (64.233.179.104) recently moved into regular rotation recently, and I wouldn't be surprised if one more data center joined it in the next week or so. After that, I'd expect those two data centers to stay in the rotation (but not spread) until after the holidays. Not sure about that, but that's my best guess."
"Joe, I believe we've instituted some more intuitive results for site: queries within the last few weeks. The test data center will be where most of the progress on 301s/canonicalization takes place."
Wish you all a great day!
Interesting, I missed the comment to Joe on MC's blog.
(I kind of think that Big Daddy might really be the real Jagger3 - ie GG talked about Canonical, 301 things and the base for a new index as Jagger3 and this did not happen - but hey whatever - I still think MC/GG get excited sometimes about a change and say it is coming before it actually happens or is ready to happen. We do know that they have pride in working for Google so I guess they get excited too when big things happen - let's hope it comes through.)
The test DC is not showing test results for me at the moment.
>>I kind of think that Big Daddy might really be the real Jagger3 - ie GG talked about Canonical, 301 things and the base for a new index as Jagger3 and this did not happen - but hey whatever - I still think MC/GG get excited sometimes about a change and say it is coming before it actually happens or is ready to happen <<
What I like most about GG and Matt is that they are positive and optimistic fellow members. Great.. the two gentlemen get excited sometimes and tell us things before they happen :-)
For example:
GG & Matt! Now I have 50% of my pre-Allegra Google referrals. When do I get the rest of my pre-Allegra traffic back? Thanks a bunch :-)
Can somebody please confirm that the test DC 64.233.179.104 currently uses the same results as www.google.com? I don't see test results anymore.
The results have been the same all along. Same cache dates, same number of pages returned. Bad as ever since Jagger for our site.
Am I the only one seeing this?
The test DC is not showing test data at the moment - when it next goes live I would be interested in your observations on your sites which may have non-www, canonical problems etc.
Cheers
Can we get an idea of how many people have been crawled by this IP?
The difference should be:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
versus
Googlebot/2.1 (+http://www.google.com/bot.html)
In the user agent field.
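To count this from your own logs, a minimal Python sketch along these lines might do. It assumes an Apache-style combined log format. The sample lines (IP address, paths, timestamps) are invented for illustration; only the two user agent strings are the ones quoted above. Function names are made up for this sketch.

```python
import re
from collections import defaultdict

# Assumed layout: ip ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify_googlebot(user_agent):
    """Return 'mozilla', 'normal', or None if the agent is not Googlebot."""
    if "Googlebot" not in user_agent:
        return None
    if user_agent.startswith("Mozilla/5.0"):
        return "mozilla"
    return "normal"


def pages_by_bot(lines):
    """Count distinct paths fetched by each Googlebot variant."""
    pages = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        bot = classify_googlebot(match.group("ua"))
        if bot:
            pages[bot].add(match.group("path"))
    return {bot: len(paths) for bot, paths in pages.items()}


# Invented sample lines, for illustration only
SAMPLE = [
    '192.0.2.1 - - [30/Nov/2005:05:40:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [30/Nov/2005:05:41:00 +0000] "GET /page1.html HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [30/Nov/2005:03:40:00 +0000] "GET / HTTP/1.0" 200 5120 "-" '
    '"Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [30/Nov/2005:04:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5"',
]

print(pages_by_bot(SAMPLE))
```

Keep in mind that anyone can send these user agent strings, so counts from the user agent field alone are only as reliable as the visitors are honest.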
I've got around 1,200 pages crawled by Mozilla/5.0 and another 700 by Googlebot 2.1 so far this month. This is for a website with approximately 1,000 pages.
I am seeing the test results now matching the rest of the dc's
I'm seeing the same.. I told my colleagues a few days ago, I think it was before the weekend, that the datacenter on this IP shows very old results, with cache dates in early November.
Next to that, I wasn't seeing any URL-only results, but I do see them again now. After doing some testing, 'Big Daddy' now seems to be the same as most of the datacenters again.
Too bad, maybe they know that we know and switched IPs ;)