tedster - 3:47 pm on Apr 10, 2010 (gmt 0)
To add some detail to the idea of different data sets:
We've developed an interesting trick that speeds up the first step: instead of storing the entire index on one very powerful computer, Google uses hundreds of computers to do the job. Because the task is divided among many machines, the answer can be found much faster.
To illustrate, let's suppose an index for a book was 30 pages long. If one person had to search for several pieces of information in the index, it would take at least several seconds for each search. But what if you gave each page of the index to a different person? Thirty people could search their portions of the index much more quickly than one person could search the entire index alone. Similarly, Google splits its data between many machines to find matching documents faster.
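For anyone who prefers code to analogies, here is a minimal sketch of the "one index page per person" idea. It is only an illustration and not Google's actual implementation: the function names (`build_shards`, `search_shard`, `search`), the tiny document set, and the choice to assign documents to shards by `doc_id % num_shards` are all assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def build_shards(docs, num_shards):
    """Split documents across shards; each shard gets its own small inverted index."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]  # hypothetical assignment rule
        for word in set(text.lower().split()):
            shard.setdefault(word, set()).add(doc_id)
    return shards


def search_shard(shard, terms):
    """Return the doc ids in one shard that contain every query term."""
    postings = [shard.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search(shards, query):
    """Ask every shard in parallel, then merge the partial answers."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, terms), shards))
    return sorted(set().union(*partials))


# Made-up documents, standing in for pages in the index.
docs = {
    1: "google splits its index",
    2: "index of a book",
    3: "google index servers",
    4: "book pages",
}
shards = build_shards(docs, num_shards=3)
matches = search(shards, "google index")
```

Each shard only looks through its own slice of the documents, so the work per machine shrinks as you add shards; the merge step at the end is the price you pay for splitting things up.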
One query can use around 500 different servers - and there are thousands of servers that hold various versions of that data. So yes, you can easily be seeing results from different data sets a lot of the time.
And with regard to the time stamp, yes, the data from every URL is stored at various stages of its evolution, and each version has a time stamp. Not all data sets contain every time-stamped version of a given URL.
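A rough way to picture this, sketched in code. Everything here is hypothetical: the two data sets, the URL, the snapshot contents, and the `latest_version` helper are invented for illustration and say nothing about how Google really stores things. The point is only that two data sets can hold different time-stamped versions of the same URL, so the answer depends on which one serves your query.

```python
from datetime import datetime


def latest_version(data_set, url):
    """Return the newest (timestamp, content) snapshot this data set holds for url."""
    versions = data_set.get(url, [])
    return max(versions, key=lambda v: v[0]) if versions else None


url = "http://example.com/page"  # placeholder URL

# Data set A has both snapshots of the page.
data_set_a = {
    url: [
        (datetime(2010, 4, 1), "old title"),
        (datetime(2010, 4, 8), "new title"),
    ]
}

# Data set B has not received the newer snapshot yet.
data_set_b = {
    url: [
        (datetime(2010, 4, 1), "old title"),
    ]
}

seen_from_a = latest_version(data_set_a, url)
seen_from_b = latest_version(data_set_b, url)
```

Hit data set A and you see the April 8 version; hit data set B and you are still looking at April 1, even though nothing on the page changed between your two searches.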
Watching Google move data around is a pretty intensive sport - it can either confuse things or clarify things for you, depending on how accurately you picture what they're doing.