Google Datacenters Watch: 2006-01-30

Observations, Analysis and Remarks

         

johnwards

3:55 pm on Jan 30, 2006 (gmt 0)

10+ Year Member



< continued from [webmasterworld.com...] >

This is just odd.

The 64.* DCs return about 300 pages from my site.

The 216.* DCs return about 46,000 pages from my site.

And the 66.* DCs return 69,000 pages from my site.

Currently I have about 65,000 pages.

If I go to google.co.uk I get 46,000 pages. If I go to google.com from my US based server I get the same 46,000 results.

It is all very odd and confusing.

[edited by: tedster at 9:56 pm (utc) on Jan. 30, 2006]

Dayo_UK

11:49 am on Feb 1, 2006 (gmt 0)



>>>>Do the individual BD DCs cache pages with a separate crawl or do they use a common Mozilla Googlebot crawl to cache pages for all Big Daddy index DCs?

Yes, pretty sure on that.

Obviously they can get updated, and the updates may spread to some DCs before others, although that process is normally quick.

As that DC has the exact same cache (even to the nearest minute/second) as the non-BD DC, it is not showing an updated Mozilla Googlebot cache IMO but a Normal Googlebot cache.

Of course as time goes on things change and we may see a merge or something.

>>>We assumed new links and text changes were calculated at this point and used in the new serps?

Well, to a degree this was true in the past. However, there have always been underlying indexes etc., which means you can rank on words that no longer appear on the newly cached page. G1smd has had a frustrating experience in this area.

BD is obviously a bit different. MC is saying no ranking changes, for one, so it would not surprise me if the ranking structure of said pages is also based on different data from what the Mozilla Googlebot/Big Daddy crawl has gathered. E.g. perhaps ranks are based on the Normal Googlebot crawl.

Of course this is speculating a bit now.

[edited by: Dayo_UK at 11:54 am (utc) on Feb. 1, 2006]

Ellio

11:51 am on Feb 1, 2006 (gmt 0)

10+ Year Member



I agree with you, it does seem to be the default cache. Very odd.

bluewidgets

12:00 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



Mozilla Googlebot
Can anybody give me the IP of Mozilla Googlebot?

Ellio

12:01 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



>>>BD is obviously a bit different - MC is saying no ranking changes for one - so it would not surprise me if the ranking structure of said pages is based on different data to what the Mozilla Googlebot/Big Daddy crawl has gathered as well. EG Perhaps ranks are based on Normal Googlebot crawl.

If this is the case then sites that rank on Big Daddy but not at all on default must now be appearing for "other" structural reasons if the results are both using the same cache to rank.

Dayo_UK

12:08 pm on Feb 1, 2006 (gmt 0)



>>>>>If this is the case then sites that rank on Big Daddy but not at all on default must now be appearing for "other" structural reasons if the results are both using the same cache to rank.

Yes, that is what I have been thinking. Especially with MC coming out and saying:-

"the changes on Bigdaddy are relatively subtle (less ranking changes and more infrastructure changes). Most of the changes are under the hood, and this infrastructure prepares the framework for future improvements throughout the year."

IMO the next stage of the BD process is when things might really happen.

Bluewidgets,

Mozilla Googlebot has various IP addresses.

It can easily be identified in the logs by its user-agent though:-

66.249.72.103 - - [31/Jan/2006:HH:MM:SS +0100] "GET /mypage.html HTTP/1.1" 200 9148 www.mysite.com "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
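
For anyone who wants to separate the two bots in their own logs, here is a minimal Python sketch of the user-agent check described above. It is only an illustration: the function name `classify_googlebot` is made up, and it assumes the Normal Googlebot identifies itself without the `Mozilla/5.0` prefix and that the user-agent is one of the double-quoted fields, as in the sample line. User-agents can be spoofed, so treat the result as a hint rather than proof.

```python
import re

# Matches every double-quoted field in an access-log line.
QUOTED = re.compile(r'"([^"]*)"')


def classify_googlebot(log_line):
    """Return 'mozilla', 'normal', or None for one access-log line.

    Assumption: the user-agent is a quoted field containing 'Googlebot',
    and only the Mozilla variant starts with 'Mozilla/5.0'.
    """
    fields = QUOTED.findall(log_line)
    agent = next((f for f in fields if "Googlebot" in f), None)
    if agent is None:
        return None  # not a Googlebot request
    return "mozilla" if agent.startswith("Mozilla/5.0") else "normal"


# The sample line from the post above, unchanged.
sample = (
    '66.249.72.103 - - [31/Jan/2006:HH:MM:SS +0100] '
    '"GET /mypage.html HTTP/1.1" 200 9148 www.mysite.com "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)" "-"'
)

print(classify_googlebot(sample))
```

Run over a whole log file, counting the two labels per URL would show which pages each bot is fetching.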

bluewidgets

12:20 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



66-249-72-33
Is that a Mozilla one or a normal one?

Dayo_UK

12:23 pm on Feb 1, 2006 (gmt 0)



From searching on the web it looks like a Mozilla Googlebot.

johnwards

12:43 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



This is starting to make more sense.

I have a Google Sitemap, but it does not include all my pages, as I was experimenting. It has my 300-odd landing pages, but nothing much deeper than that. (The deeper pages cover differing counties/towns for my property stuff.)

In the majority of the Big Daddy DCs, Google only has these pages, plus a couple of others.

In the current live database it has 45,000 pages.

In this Big Daddy DC [66.249.93.104...] it has 66,000 pages, which is roughly the right amount.

Possibly the sitemaps are going to be a lot more important?

DanMoore

1:04 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



Hi guys,

I've been lurking in the data centers threads for quite some time and I must say that you guys have taught me a lot and I hope you continue to share this fascinating stuff.
I have many sites, one of which is brand new, about 5 months old. This site seems to have 15,000 pages indexed in BD, while on regular Google only 12 pages are indexed. Regular Googlebot only visits my homepage on this newer site, while Mozilla Googlebot crushes the site 24 hours per day. That explains the difference in the number of pages indexed between the data centers.
I have also noticed that on my older sites (some of which are quite old and rank very well), .htm pages are now crawled by regular Googlebot only, while PHP pages are now crawled only by Mozilla Googlebot. Any thoughts?

foolsgold

3:03 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



Dayo_UK: >>>IMO the next stage of the BD process is when things might really happen.

So any thoughts on what this will be and when?
