TheMadScientist - 9:46 pm on Apr 10, 2010 (gmt 0)
Here, I'm frustrated with work for a minute, so let me see if I can explain a bit for those who may be interested...
Google is basically a database... The original storage method was referred to as 'Big Table', which implicitly indicates the data collected is stored in a table, and from that inference some reasonable conclusions can be drawn by anyone who understands the workings of a database. (The resource(s) for the references to 'Big Table' and the underlying data storage changes are linked in the first post of the Oct. Updates Thread; you may have to read more articles than the one linked to find the reference(s).)
From the preceding, a reasonable understanding can be drawn of the process, and of what an infrastructure change is compared to an algo change...
Note: The algo is technically a heuristic, but algo will be used for ease of reading and typing.
1.) Google is a big database.
2.) The information they have is stored in a 'Big Table'.
3.) The algo creates an index of the pages to be shown to visitors for speed of access.
So, when GoogleBot spiders, it gets information and stores it in a table. Originally the storage system was referred to as exactly what it is: Big Table (I'll call it the Original GFS); then came Big Daddy (GFS 1), and now the switch to Caffeine (GFS 2).
Once the data is stored in the database table, the information is retrieved and the algo is applied to it.
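To make the two-step flow concrete, here's a toy sketch (this is not Google's actual code; every name and the scoring rule are invented for illustration): spidered data lands in a table, and the 'algo' is applied to the stored rows afterwards.

```python
# Step 1: the spider stores what it fetched, one row per URL.
# (Invented data; stands in for the 'Big Table' described above.)
big_table = {
    "example.com/a": "apples and oranges",
    "example.com/b": "oranges and pears",
}

# Step 2: the 'algo' runs over the stored rows after the fact.
# Here it trivially scores each page by word count.
def apply_algo(table):
    return {url: len(text.split()) for url, text in table.items()}

scores = apply_algo(big_table)
print(scores["example.com/a"])  # 3
```

The point of the sketch is only the ordering: storage happens first, and the algo is a separate pass over whatever is in the table at that moment.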
To access data more efficiently in a database, an index is used. It's basically a 'key' that says 'go to this portion of the hard disk to get the data', so rather than having to 'scan' the hard drive to find the location of the correct information, it can be 'jumped' to quickly.
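A minimal sketch of that idea (names invented for illustration): without an index you scan every record; with one, a lookup table maps a key straight to the record's position.

```python
# Stored records, in arbitrary order on 'disk'.
records = ["row for page A", "row for page B", "row for page C"]

# Without an index: scan every slot until the value is found.
def scan(records, needle):
    for position, row in enumerate(records):
        if needle in row:
            return position
    return -1

# With an index: jump straight to the stored position, no scan.
index = {"page A": 0, "page B": 1, "page C": 2}

assert scan(records, "page B") == index["page B"]  # same answer, found faster
```

Same result either way; the index just replaces a linear scan with a direct jump, which is the whole speed argument above.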
Reasonable conclusion: Google and other SEs refer to the dataset they display the results from as the 'index', because pages the algo has been applied to are stored in the index they search to find the results for a given query. (Hence the use of a noindex robots reference to remove a page from the results... It's not a 'noresults', 'nodata', or 'nostore' reference, because the technical action is that the page is not added to the index which is searched to generate the results. It's a direct note to the algo doing the processing and creating the index not to include a certain reference. This is also why a page with a 'noindex' reference on it can (and does) still pass PageRank: it's available in the table (data) used by the algo for calculations, but it's explicitly (at the direction of the site owner or operator) not included in the index published for searchers, so the page is not returned in the results.)
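That distinction can be sketched in a few lines (a hedged toy model, not Google's implementation; the page names, the flat link-count stand-in for PageRank, and the field names are all invented): a noindex'd page stays in the stored data, so its links still count, but it is left out of the index searchers query.

```python
# Invented stored table: each page's outbound links plus its noindex flag.
pages = {
    "a.html": {"links_to": ["b.html"], "noindex": False},
    "b.html": {"links_to": ["c.html"], "noindex": True},   # noindex'd page
    "c.html": {"links_to": [], "noindex": False},
}

# Link calculations use the full table, noindex or not
# (a crude inbound-link count standing in for PageRank).
inbound = {url: 0 for url in pages}
for row in pages.values():
    for target in row["links_to"]:
        inbound[target] += 1

# The published index simply omits noindex'd pages.
published = [url for url, row in pages.items() if not row["noindex"]]

print(inbound["c.html"])      # 1 -- the noindex'd b.html still passes a link
print("b.html" in published)  # False -- but b.html itself isn't searchable
```

So the page contributes to the calculations (it's in the table) while being absent from the searchable index, which matches the 'still passes PageRank' observation above.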
An infrastructure change can cause the results (shown from the index generated by the application of the algo) to change in a number of ways without the algo that generates the index (results) changing. The results (index) generated can change from differences in the table data, including: the amount of data available, the speed of access to the data stored in the table, the speed with which a new publicly available and searchable index can be generated, and a number of other factors.
A change to the algo will affect the results (the index created for people to search) regardless of the underlying dataset, but the same algo can also generate different indexes because of a change to the underlying data.
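The two cases can be shown side by side with a toy sketch (invented ranking rules and data, purely illustrative): the same algo over changed data yields a different index, and a changed algo yields a different index even over identical data.

```python
# Two stand-in 'algos' that build a ranked index from a table.
def algo_v1(table):
    # rank pages by content length, longest first
    return sorted(table, key=lambda url: len(table[url]), reverse=True)

def algo_v2(table):
    # a changed algo: rank alphabetically instead
    return sorted(table)

old_data = {"a": "short", "b": "much longer text"}
new_data = {"a": "now a is the much much longer one", "b": "short"}

print(algo_v1(old_data))  # ['b', 'a'] -- same algo...
print(algo_v1(new_data))  # ['a', 'b'] -- ...different data flips the index
print(algo_v2(old_data))  # ['a', 'b'] -- different algo, same old data
```

Seen from the outside, both kinds of change look like 'the results moved', which is exactly why the distinction drawn here matters when reading reports of shifting SERPs.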
Hence, IMO, the need to draw a distinction between an algo change and an infrastructure change to have some clue of what is going on. Since it appears (IMO) we are seeing an algo change and an infrastructure change at the same time, it's entirely possible that some of the things being reported are 'adjustments to the algo' and not necessarily a 'roll back' to the Big Daddy dataset.
Helpnow said the results appeared old, but the pages included in the resultset were up-to-date, so what Helpnow may be observing is a reversion to a different algo (ranking and index creation mechanism) even though the underlying data could still be Caffeine... It's been reported by tedster (and others, I think) that the Big Daddy and Caffeine infrastructures are not compatible and cannot even be stored in the same place, so my guess is that once they roll the Caffeine infrastructure out to a data center, they do not revert to the Big Daddy dataset, but rather change, adjust, tune and, if necessary, revert the algo generating the index to a previous, more stable version...