dusky - 7:03 pm on May 12, 2010 (gmt 0)
This seems to be a popular theory, but I wonder what is really NEW here. I mean, Matt C has talked about how the Caffeine infrastructure allows them to do more.
To do more also means to shuffle and assess more, and with the chance to spring-clean while at it, they probably have to adjust and improve the algo. While this was meant as a move of data to a more efficient infrastructure, querying the new structure and storage system has to be different, and therefore judging who should be ranked first or last requires a much improved / changed algo. Because many more variables and attributes about websites can now be assessed thanks to the larger storage capacity, there is definitely an algo tweak, if not a complete overhaul of most of the algo, at play here.
This theory is certainly gaining credibility from the symptoms many sites are experiencing, symptoms they shouldn't be showing in normal, stable SERPs.
There are too many legitimate white hat sites being whacked to believe it's a penalty. Many well-established gov and edu sites, large commercial and corporate sites, and well-known news and media sites have seen a change for the worse over at least the last three months, and since MayDay in particular. G* sites themselves are affected, as noted above and elsewhere. How many more days or weeks this is going to take is probably everyone's question, at least for the people who join my "total re-index" camp, or shall I call it total recall. We used to think they had lost some or most of the data when this happened, and that this explained the crawl rate and the re-indexing, but it turned out to be due to infrastructure and algo updates; Florida and BigDaddy are the best examples.
Why do a total re-index, which may take weeks or months for some sites? It's probably the only way to work out their exact algo, by counting the real quantity and quality of "web Inter-neted" sites and their relation to each other in terms of votes (PR and trustrank). Consequently, as I said earlier, all trustrank and real PR is dropped temporarily, hence the chaos: two-page sites with a photo album are on page 2 alongside pages belonging to a PR8/9 multimillion $ site, etc. They limit the damage in most cases by keeping rank and trustrank intact for the keywords of main homepages, the domain keyword without the TLD, and copyright and corporate name keywords, but most of the other, thinner pages are in the mix.
As I noted a few pages back, the telltale signs include, amongst others:
Sitelinks (where applicable) are still there;
Main corporate keyword / name is there ranking as usual, BUT if the keywords are generic and are in the domain, some sites are sliding to page 2/3/4/5+ and back and forth;
No problems on WMT;
Backlinks decreasing on WMT;
Long tail taking a hammering;
PR still the same;
site: command way low and decreasing by the day;
Gbot at it like crazy;
More product / ecommerce sites selling their own products are affected, maybe because most have a large page count and take longer to spider;
Even more so if the products come from a feed such as Amazon / eBay shopping etc.... well, a larger number of pages I guess (if not a thin aff site);
More of the large sites with 100k+ pages are in this category.
OK, we can see the new layout may have a small impact, certainly for product / ecommerce based sites, but in my view they are re-indexing all those sites from scratch; some will come back stronger, some weaker.
Here are two possible explanations for why some sites are (temporarily) rising while others are falling. One is simply that another site has to take the faller's place. The other, which is the one I believe, is that data is moved and re-indexed in chunks / batches, so every site will have its turn.
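To make the batch idea a bit more concrete, here is a toy sketch in Python. It is purely my own illustration and says nothing about how G actually works. The `Site` fields, the scores and the `ranking` function are all made up; the only assumption modelled is that a site's link trust (a stand-in for PR / trustrank) counts again only once its batch has been re-indexed.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    onpage: float      # hypothetical on-page relevance score
    link_trust: float  # hypothetical stand-in for PR / trustrank


def ranking(sites, batch_size, batches_done):
    """Return site names ordered best-first after `batches_done` batches.

    Assumption (not a known fact): link_trust only counts once the
    site's batch has been re-indexed; until then it ranks on onpage alone.
    Sites are processed in list order, `batch_size` at a time.
    """
    reindexed = batches_done * batch_size
    scored = []
    for position, site in enumerate(sites):
        restored = position < reindexed
        score = site.onpage + (site.link_trust if restored else 0.0)
        scored.append((score, site.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Made-up example data: small sites happen to sit in the first batch.
sites = [
    Site("small-photo-album", onpage=3.0, link_trust=0.5),
    Site("thin-affiliate", onpage=2.0, link_trust=0.2),
    Site("big-corporate", onpage=2.5, link_trust=8.0),
    Site("edu-site", onpage=2.8, link_trust=6.0),
]

for done in range(3):
    print(done, ranking(sites, batch_size=2, batches_done=done))
```

With these made-up numbers the small photo album sits above the big corporate site until the second batch is processed, at which point the established sites move back to the top. That is the kind of temporary reshuffle I mean.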