Forum Moderators: open
This is a recap and a prediction topic.
Recall that GG said the -sj index would move to the other datacenters, and then we would see backlinks/spam filters applied across the board.
If this is the case, the D.C. datacenter (-dc) just got the -sj index (I saw it bouncing around last night, actually), and the Cable & Wireless (-cw) datacenter got the index on the 15th, then it seems to take about two days for the index to propagate to each datacenter (this is a worst-case estimate, since the datacenters may be updated in parallel and could all pop up with the -sj index very shortly).
Since we have five datacenters left to go, that puts us 10 days out for all datacenters to be given the -sj index...which brings us to the 27th. At that point we should see the backlinks/spam filters applied at every datacenter as deltas/patches (if you will), and the real "dance" will be underway. That step will, of course, take considerably less time than the full propagation.
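To make the arithmetic explicit, here is a quick sketch. Only the 15th, the two-day gap, and the five remaining datacenters come from the post; the year and exact -dc date are assumptions for illustration:

```python
from datetime import date, timedelta

# Observed propagation so far (day-of-month values are from the post;
# the year 2003 is an assumption for this sketch).
cw_got_index = date(2003, 5, 15)   # -cw datacenter got the -sj index
dc_got_index = date(2003, 5, 17)   # -dc datacenter, two days later

days_per_datacenter = (dc_got_index - cw_got_index).days  # 2
remaining_datacenters = 5

# Worst case: strictly serial propagation, one datacenter every two days.
worst_case_done = dc_got_index + timedelta(
    days=days_per_datacenter * remaining_datacenters
)
print(worst_case_done)  # 2003-05-27
```

If the datacenters update in parallel instead of serially, the finish date collapses toward the -dc date, which is why the 27th is only a ceiling.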
Notice that the prediction of the 27th is, in my opinion, a worst-case scenario; we will most likely see things happen sooner.
Peter
As a side note, will there be a deep crawl anytime soon?
Let's suppose the new algo, with its major changes, needs the index data in an entirely new format, and that they changed the format with the latest crawl data at the time they started testing the new algo in February.
Now suppose the deep crawl produces the equivalent of a diff between the last index and what will become the new index. A diff developed from the March and April crawls against the old-style index will not apply to the new, improved format.
Google decides the new algo is a major improvement. They now face a hard choice: launch the new algo with the old data and get the diffs straightened out afterward, or build and move a whole new index from scratch and still probably end up at least a month behind, given the complexity of the process.
So they decide on the massive migration of the old data and work on getting the diffs to apply to the new format. Depending on how the data is stored, they may have to develop both a March and an April diff, they may be able to do just an April diff, or they may even jump straight to a May diff if they decide to do it that way.
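As a toy illustration of why a diff keyed to one storage format cannot be applied to another, here is a deliberately simplified sketch. The "index" is just a dict, and both key formats are hypothetical; the point is only that the diff must be translated before it applies:

```python
# Hypothetical, trivially simplified "index": a mapping of key -> record.
# The old format keys by full URL string; the new format keys by (host, path).
old_index = {"example.com/a": 1, "example.com/b": 2}
new_index = {("example.com", "/a"): 1, ("example.com", "/b"): 2}

# A "diff" generated from a crawl against the OLD format uses old-style keys.
diff = {"example.com/b": 3}  # update b's record

def apply_diff(index, diff):
    # The patch only applies cleanly if every diff key exists in the index.
    missing = [k for k in diff if k not in index]
    if missing:
        raise KeyError(f"diff does not apply: unknown keys {missing}")
    index.update(diff)

apply_diff(old_index, diff)      # applies cleanly to the old format
# apply_diff(new_index, diff)    # would raise KeyError: old-style keys

# To reuse the crawl data against the new format, the diff itself must
# first be translated into new-style keys.
translated = {}
for key, value in diff.items():
    host, path = key.split("/", 1)
    translated[(host, "/" + path)] = value

apply_diff(new_index, translated)  # now it applies
```

The translation step is the analogue of the "getting the diffs to apply to the new format" work described above, and it is exactly the kind of code that would need testing before the first live run.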
Now here is another fun part: this would be the first time that diff is applied to the new format. It will have to be tested first, which will make the process slower than the old updates. It is unknown how much of the 2+ weeks between the deep crawl and the update was needed to generate the diff, and they may have already started working with the March or April data. We do not know.
So yes, I can easily explain the February data being used for the current update. I may be wrong in the specifics, but this scenario is more likely than Google never running a deep crawl again.