This is just odd.
The 64.* DCs return about 300 pages from my site.
The 216.* DCs return about 46,000 pages from my site.
And the 66.* DCs return 69,000 pages from my site.
Currently I have about 65,000 pages.
If I go to google.co.uk I get 46,000 pages. If I go to google.com from my US-based server I get the same 46,000 results.
It is all very odd and confusing.
[edited by: tedster at 9:56 pm (utc) on Jan. 30, 2006]
Yes, pretty sure on that.
Obviously they can get updated and they may spread to some DCs before others - although that process is normally quick.
As that DC has exactly the same cache (even to the nearest minute/second) as the non-BD DC, it is not showing an updated Mozilla Googlebot cache IMO but a normal Googlebot cache.
Of course as time goes on things change and we may see a merge or something.
>>>We assumed new links and text changes were calculated at this point and used in the new serps?
Well, to a degree in the past this was true - however there have always been underlying indexes etc - which means you can rank on words that no longer appear on the newly cached page etc. G1smd has had a frustrating experience in this area.
BD is obviously a bit different - MC is saying no ranking changes for one - so it would not surprise me if the ranking structure of said pages is based on different data to what the Mozilla Googlebot/Big Daddy crawl has gathered as well. E.g. perhaps ranks are based on the normal Googlebot crawl.
Of course this is speculating a bit now.
[edited by: Dayo_UK at 11:54 am (utc) on Feb. 1, 2006]
>>>BD is obviously a bit different - MC is saying no ranking changes for one - so it would not surprise me if the ranking structure of said pages is based on different data to what the Mozilla Googlebot/Big Daddy crawl has gathered as well. E.g. perhaps ranks are based on the normal Googlebot crawl.
If this is the case, then sites that rank on Big Daddy but not at all on default must now be appearing for "other" structural reasons, if the results are both using the same cache to rank.
Yes, that is what I have been thinking. Esp. with MC coming out and saying:
"the changes on Bigdaddy are relatively subtle (less ranking changes and more infrastructure changes). Most of the changes are under the hood, and this infrastructure prepares the framework for future improvements throughout the year."
IMO the next stage of the BD process is when things might really happen.
Bluewidgets
Mozilla Googlebot has various IP addresses.
It can easily be identified in the logs by user-agent though:-
66.249.72.103 - - [31/Jan/2006:HH:MM:SS +0100] "GET /mypage.html HTTP/1.1" 200 9148 www.mysite.com "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
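If you want to separate the two bots in a whole log file rather than eyeballing it, something like the rough sketch below could work. It assumes the normal Googlebot sends a user-agent starting with "Googlebot/" and that the Mozilla one starts with "Mozilla/5.0 (compatible; Googlebot/" as in the line above. The function name `classify_googlebot` is made up for illustration, and the log layout is assumed to match the sample line (quoted request, referer and user-agent fields).

```python
import re

# Assumption: user-agent is one of the double-quoted fields on the log line,
# as in the sample entry above. classify_googlebot is a hypothetical helper.
QUOTED_FIELD = re.compile(r'"([^"]*)"')

def classify_googlebot(line: str) -> str:
    """Return 'mozilla', 'normal', or 'other' for one access-log line."""
    for field in QUOTED_FIELD.findall(line):
        if "Googlebot" not in field:
            continue
        # Mozilla Googlebot: "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
        if field.startswith("Mozilla/5.0 (compatible; Googlebot/"):
            return "mozilla"
        # Normal Googlebot (assumed): "Googlebot/2.1 (+http://...)"
        if field.startswith("Googlebot/"):
            return "normal"
    return "other"
```

A user-agent can be faked, so this only tells you what the visitor claims to be, not that the hit really came from Google.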
I have a Google sitemap, but it hasn't got all my pages, as I was experimenting. It has my 300-odd landing pages, but nothing much deeper than that (the differing counties/towns for my property stuff).
In the majority of the Big Daddy DCs, Google only has these pages, plus a couple of others.
In the current live database it has 45,000 pages.
In this Big Daddy DC [66.249.93.104...] it has 66,000 pages, which is roughly the right amount.
Possibly the sitemaps are going to be a lot more important?
I've been lurking in the data centers threads for quite some time and I must say that you guys have taught me a lot and I hope you continue to share this fascinating stuff.
I have many sites, one of which is brand new, about 5 months old. This site seems to have 15,000 pages indexed in BD, while on regular Google only 12 pages are indexed. Regular Googlebot only visits my homepage on this newer site, while Mozilla Googlebot crushes the site 24 hours per day. That explains the difference in # of pages indexed between the data centers.
I have also noticed that on my older sites (some of which are quite old and rank very well), .htm pages are now crawled by regular Googlebot only, while .php pages are now crawled only by Mozilla Googlebot. Any thoughts?
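For anyone wanting to check the same .htm/.php split on their own logs, here is a minimal sketch that counts hits per bot and file extension. It assumes a combined-style log with a quoted request line such as "GET /page.php HTTP/1.1", and the same two user-agent patterns discussed earlier in the thread. `bot_kind` and `tally_by_extension` are names invented for this example.

```python
import re
from collections import Counter
from os.path import splitext

# Assumption: the request appears as a quoted "METHOD /path HTTP/x.y" field.
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')

def bot_kind(line):
    """Return 'mozilla' or 'normal' for Googlebot hits, else None."""
    if '"Mozilla/5.0 (compatible; Googlebot/' in line:
        return "mozilla"
    if '"Googlebot/' in line:  # assumed normal Googlebot user-agent prefix
        return "normal"
    return None

def tally_by_extension(lines):
    """Count Googlebot hits keyed by (bot kind, file extension)."""
    counts = Counter()
    for line in lines:
        kind = bot_kind(line)
        match = REQUEST_RE.search(line)
        if kind is None or match is None:
            continue
        path = match.group(1).split("?", 1)[0]  # drop any query string
        ext = splitext(path)[1].lower() or "(none)"
        counts[(kind, ext)] += 1
    return counts
```

Run it over a few days of logs with `tally_by_extension(open("access.log"))` (the filename is a placeholder) and compare the counts for each bot before reading too much into the pattern.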