Can someone tell me something about Google data centers? My site is not showing in Google's first 12 data centers.
eastyn
Here's a post Brett made back in 2002 that covers the basics -- only the number of PCs involved today is significantly larger, in all likelihood. The new infrastructure isn't called "Big Daddy" for no reason!
Google has many data centers and runs a distributed load-sharing system across more than 10k PCs running Linux with 80 gig drives, at last report. Somehow, the copy of the index must get transferred to all those hard drives in all those data centers. You ever transfer 80 gig across the net? And then distribute that 80 gig down onto thousands of hard drives? All of that takes a great deal of time. It's a constant process for Google. More than likely, the daily updates only copy out the parts of the index that have actually changed. That's yet another point where new and old data could get mixed.
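A rough sketch of that delta idea in Python (the shard names, the hashing, and the whole scheme are my guesses; Google has never published how its replication works):

import hashlib

def index_delta(old_shards, new_shards):
    """Return only the shards whose bytes changed, so a daily update
    ships a fraction of the index instead of the full 80 gig."""
    changed = {}
    for name, data in new_shards.items():
        old = old_shards.get(name)
        if old is None or hashlib.sha256(old).digest() != hashlib.sha256(data).digest():
            changed[name] = data
    return changed

def apply_delta(local_shards, delta):
    """A data center merges the delta in place; until it finishes,
    it keeps serving old shards next to new ones."""
    local_shards.update(delta)

# Example: only the changed shard crosses the wire.
old = {"shard-a": b"cats", "shard-b": b"dogs"}
new = {"shard-a": b"cats", "shard-b": b"birds"}
print(index_delta(old, new))  # {'shard-b': b'birds'}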
Load sharing works transparently. You do a search on Google and the request is routed, via DNS magic, to either the nearest data center or the nearest data center with the least load (we don't know their load-distribution criteria).
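Nobody outside Google knows the actual routing criteria, but a toy "nearest data center that isn't overloaded" picker might look like this (all names and numbers invented):

DATA_CENTERS = [
    # (name, network distance in ms, current load 0.0-1.0) -- invented
    ("dc-east", 20, 0.90),
    ("dc-west", 45, 0.30),
    ("dc-eu", 110, 0.10),
]

def route_query(max_load=0.8):
    """Prefer the nearest data center under the load ceiling;
    if every one is busy, fall back to the least-loaded overall."""
    ok = [dc for dc in DATA_CENTERS if dc[2] < max_load]
    if ok:
        return min(ok, key=lambda dc: dc[1])[0]
    return min(DATA_CENTERS, key=lambda dc: dc[2])[0]

print(route_query())  # 'dc-west': dc-east is nearer but over the ceiling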
Lastly, they could be working on the index, rolling indexes back, switching parts of the index, backing up parts of the index, rewriting some offending part of the index, deleting parts of an index - or a multitude of other actions or problems that only Google could know about.
Combine not knowing which box you'll connect to with not knowing which index it holds, plus the possibility of daily updating going on at the same time, and results can be unpredictable. There could be dozens of different indexes floating around various data centers; we have no clue.
One minute you'll get one copy of an index during a search, and the next you'll get another. Sometimes that could be yesterday's crawl, last month's crawl, or a crawl from four months ago.
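Put those two unknowns together and the effect looks something like this (data center names and crawl ages invented for illustration):

import random

# Each data center serves whichever index copy last finished
# propagating to it -- versions invented for illustration.
INDEX_AT = {
    "dc-east": "crawl from yesterday",
    "dc-west": "crawl from last month",
    "dc-eu": "crawl from four months ago",
}

def search(query):
    """DNS picks the box for you; you never know which index answers."""
    dc = random.choice(list(INDEX_AT))
    return "%r answered by %s using the %s" % (query, dc, INDEX_AT[dc])

for _ in range(3):
    print(search("widgets"))  # same query, potentially three different indexes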
[webmasterworld.com...]