tedster - 8:15 pm on Feb 27, 2010 (gmt 0)
My take is that the stored data that creates the final SERP is sharded into many, many bits and pieces. The final SERP is built by a kind of layering process: it sequentially combines lists of various URLs, each segmented by some metric or other (trust, semantics, PR, historical factors, etc).
An example of this process that we discussed here was whitenight's ghost data-set [google.com], but there are others.
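To make the idea concrete, here's a toy sketch of that layering process. It is entirely hypothetical — the metric names, the URLs, and the "earlier layers win" rule are my own invention for illustration, not anything Google has published:

```python
# Hypothetical sketch only - Google's actual SERP assembly is not public.
# Idea: the final SERP is built by sequentially combining URL lists that
# were each segmented by some metric (trust, PR, historical factors...).

def layer_serp(layers):
    """Combine ordered URL lists layer by layer, keeping only the first
    occurrence of each URL (earlier layers take precedence)."""
    seen = set()
    serp = []
    for metric_name, urls in layers:  # layers applied in sequence
        for url in urls:
            if url not in seen:
                seen.add(url)
                serp.append(url)
    return serp

# Invented example layers, each segmented by a made-up metric
layers = [
    ("trust",      ["example.com/a", "example.com/b"]),
    ("pagerank",   ["example.com/b", "example.com/c"]),
    ("historical", ["example.com/d"]),
]

print(layer_serp(layers))
# -> ['example.com/a', 'example.com/b', 'example.com/c', 'example.com/d']
```

The point of the sketch is just the shape of the process: if any one layer's data set is stale or missing (think whitenight's ghost data-set), the final SERP degrades in odd, partial ways rather than failing outright.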
Caffeine involves a rewrite of the way that most basic page information is stored, as well as the way it all gets layered together to make a final SERP. To get more of a handle on this, consider this description written in 2008:
Today it's estimated that [a single Google query] travels across 700-1000 machines, a figure that has nearly doubled since 2006 perhaps due in part to the introduction of Google Universal.
In a brand new infrastructure with a rewritten file system, the opportunities for a big disconnect between what Google intends to happen and what really happens would be extreme.
Here are a few more references:
1. Our short discussion The Google Search Query - a technical look [webmasterworld.com]
2. The new domain Google began using in Q4 of 2009 = 1e100.net [webmasterworld.com]