It's very easy for us, as ordinary or even not-so-ordinary search users, to have a limited view of what happens when we make a query on Google. The results are fast, often faster than searches over our own website data, so we tend not to appreciate all that's going on. This article gives a nice breakdown of the process, ending with a rather startling statement:
Today it's estimated that [a single Google query] travels across 700-1000 machines, a figure that has nearly doubled since 2006 perhaps due in part to the introduction of Google Universal.[blogoscoped.com...]
The article ends with three reference sources - an mp3 file, a video and a technical PDF document. When you take in even a bit of the complexity - and we're not even talking about the actual ranking algorithm yet - it's a wonder that we don't see more technical problems than we occasionally do.
[edited by: tedster at 1:40 pm (utc) on July 9, 2008]
The final result of this first phase of query execution is an ordered list of document identifiers (docids).
...the second phase involves taking this list of docids and computing the actual title and uniform resource locator of these documents, along with a query-specific document summary. Document servers (docservers) handle this job, fetching each document from disk to extract the title and the keyword-in-context snippet.
...As with the index lookup phase, the strategy is to partition the processing of all documents by
• randomly distributing documents into smaller shards,
• having multiple server replicas responsible for handling each shard, and
• routing requests through a load balancer.
[research.google.com...]
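The docserver scheme in the quoted excerpt can be sketched roughly as follows. This is a toy Python illustration under stated assumptions, not Google's implementation: the class and function names (`DocServer`, `LoadBalancer`, `build_cluster`, `second_phase`) are hypothetical, the load balancer is a plain round-robin, and the snippet is a naive fixed-width keyword-in-context window. In place of the truly random placement the paper describes, docids are assigned to shards by hashing, so the owning shard can be recomputed at lookup time without a separate directory.

```python
import itertools
import zlib
from dataclasses import dataclass


@dataclass
class Document:
    docid: int
    url: str
    title: str
    body: str


def shard_for(docid: int, num_shards: int) -> int:
    """Map a docid to a shard. A hash stands in for the paper's random
    placement (an assumption) so the owning shard can be recomputed later."""
    return zlib.crc32(str(docid).encode()) % num_shards


class DocServer:
    """Hypothetical replica serving a single shard of documents."""

    def __init__(self, docs):
        self.docs = {d.docid: d for d in docs}

    def summarize(self, docid: int, keyword: str, width: int = 20):
        """Return (title, url, keyword-in-context snippet) for one docid."""
        doc = self.docs[docid]
        pos = doc.body.lower().find(keyword.lower())
        if pos < 0:
            # Keyword absent: fall back to the start of the document.
            return doc.title, doc.url, doc.body[: 2 * width]
        start = max(0, pos - width)
        end = pos + len(keyword) + width
        return doc.title, doc.url, doc.body[start:end]


class LoadBalancer:
    """Round-robin selection among the replicas of one shard."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def pick(self):
        return next(self._cycle)


def build_cluster(docs, num_shards: int, replicas_per_shard: int):
    """Partition docs into shards and put each shard's replicas behind a balancer."""
    shards = [[] for _ in range(num_shards)]
    for d in docs:
        shards[shard_for(d.docid, num_shards)].append(d)
    return [
        LoadBalancer([DocServer(shard) for _ in range(replicas_per_shard)])
        for shard in shards
    ]


def second_phase(ranked_docids, keyword: str, balancers):
    """Turn phase-one docids into (title, url, snippet), keeping rank order."""
    results = []
    for docid in ranked_docids:
        replica = balancers[shard_for(docid, len(balancers))].pick()
        results.append(replica.summarize(docid, keyword))
    return results
```

For example, `second_phase([7, 2, 5], "widgets", build_cluster(docs, 3, 2))` returns one (title, url, snippet) tuple per docid, in the same order as the ranked list from the first phase. Hashing trades the paper's random placement for a placement that needs no lookup table; a genuinely random assignment would need some docid-to-shard map kept alongside the shards.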
A lot of that hardware has only come online in the last few months, and most of it has NOT been seen in rotation via google.com until the last few weeks. Large chunks were online without any GFE name access (not even assigned a name) for quite some time before that, presumably running in test mode before being properly commissioned.
At the same time, older hardware in the 216.239... and other such blocks has been taken offline and presumably retired.
[edited by: g1smd at 1:01 am (utc) on July 12, 2008]
It's a completely new system, and a different pattern from anything I have seen before. They are methodical guys and girls at the 'plex, so it means something... but what, I have no idea.
When I get more data I'll post something.