freejung - 5:01 am on Jul 13, 2011 (gmt 0)
"I am not sure why people aren't understanding that panda is not an algo that has been run on all sites and all keywords."
By George, indy, I think you've got it!
All this time we've been asking, why do most pandalized sites see no change in ranking even after making significant changes?
The question assumes, perhaps incorrectly, that Panda 2.0 re-evaluated all of the sites hit by Panda 1.0, and that 2.1 and 2.2 did likewise. That need not be the case.
We tend to think of Google as computationally all-powerful, but they are not. The web is very large, and machine learning at that scale is expensive. We suspect Panda was enabled by Caffeine, which may mean Caffeine has only just enough juice to run it. What if Panda is so computationally intensive that the vast majority of the sites it hit have not yet been re-evaluated?
If this is true, then most of us have only been scored once. Some sites would be rescored, of course, to test the accuracy of the scoring. That would probably be a random sample, which is why the recoveries are so hard to make sense of. The rest of us just have to wait it out. It is entirely possible that the changes we have already made are sufficient to remove the pandalization, but we won't know until the Panda comes back our way.
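Just to see how slowly a random-sample rescoring policy would reach any individual site, here's a toy sketch. The pool size, sampling rate, and update count below are all made-up assumptions of mine, not anything Google has published:

```python
# Toy simulation of the "score once, rescore only a random sample" idea.
# Every number here is invented for illustration.
import random

TOTAL_SITES = 1_000_000    # hypothetical pool of Panda-scored sites
SAMPLE_RATE = 0.02         # assumed: 2% random sample rescored per update
UPDATES = 3                # Panda 2.0, 2.1, 2.2

random.seed(1)
never_rescored = set(range(TOTAL_SITES))
for _ in range(UPDATES):
    sample = random.sample(range(TOTAL_SITES), int(TOTAL_SITES * SAMPLE_RATE))
    never_rescored.difference_update(sample)

print(f"{len(never_rescored) / TOTAL_SITES:.1%} still scored only once")
# With these made-up numbers, about 94% of sites have never been rescored
# even after three updates.
```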
The trouble is, spam is like a hydra. The first round got rid of lots of spam that ranked well for good keywords (and some good sites too, of course; no algo is perfect). But there was plenty more spam waiting to take its place, and behind that are the scrapers who copied the spam, and on and on... we've all seen it. We complain that the post-Panda SERPs are still filled with crap, but of course they are, and there's plenty more crap lurking on pages 3, 4, and 5 waiting to take its place too.
So if you're Google, which do you spend your incredibly expensive server farms evaluating in subsequent rounds -- the sites you've already determined to be spam (rightly or wrongly; all that really matters to them is that the set is mostly spam), or the new spam that has arisen to take its place? It is far more efficient to evaluate the new spam and get rid of it than to re-evaluate the old spam in hopes that some of it has been fixed.
To us this may seem horribly unjust -- our sites have been sentenced to a slow, painful death with no appeal -- but to Google it's just a numbers game: we have X amount of computing power, which can evaluate Y sites per pass. We get better results by continuing to evaluate fresh sites than by re-evaluating, over and over, the sites we already believe are probably spam.
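Treated as a simple expected-value problem, the allocation falls out immediately. Here's a hedged sketch; the payoff rates (30% and 5%) are pure guesses of mine, chosen only to make the logic concrete:

```python
# The "numbers game" as a greedy budget split between never-scored sites
# and re-evaluations of already-flagged ones. All rates are hypothetical.

BUDGET = 100_000           # Y: evaluations we can afford this pass
P_NEW_IS_SPAM = 0.30       # assumed: chance an unscored site turns out to be spam
P_FLAGGED_FIXED = 0.05     # assumed: chance a flagged site has cleaned itself up

# Expected SERP-quality improvement per evaluation spent:
gain_per_new_eval = P_NEW_IS_SPAM    # a hit removes one piece of spam
gain_per_rescore = P_FLAGGED_FIXED   # a hit restores one good site

# A greedy allocator pours the whole budget into the higher-payoff bucket.
if gain_per_new_eval > gain_per_rescore:
    plan = {"new_sites": BUDGET, "rescores": 0}
else:
    plan = {"new_sites": 0, "rescores": BUDGET}

print(plan)
# With these guesses, all 100,000 evaluations chase new spam, and the
# already-pandalized sites simply wait -- exactly the pattern we see.
```

The exact numbers don't matter; as long as fresh spam is the richer vein, a rational budget rarely circles back to us.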
Trouble is, there's a lot of spam out there. They may have bitten off more than they can chew.