tedster - 2:12 am on Jun 9, 2011 (gmt 0)
did not, does not, and will not make sense until we are (not likely) told what new parameters were incorporated with Panda x.x.
Here's how I see it.
The Panda algorithm is based on machine learning. This means it's a predictive algorithm, assembled by an automated process. It's predictive because it works from a "seed set" that was generated by human judgment. The machine learning program is let loose across a huge pile of factors to discover which data might predict "shallow quality", as defined by the seed set. The prediction will not be accurate for every website, but as the process iterates it does become more and more accurate.
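To make the "seed set" idea concrete, here is a toy sketch in Python. It is not Google's method, and the factors are pure assumptions: three hypothetical per-site features (ad_density, duplicate_text_fraction, and words per page divided by 1000), a handful of invented human labels, and plain logistic regression trained by gradient descent standing in for whatever learner Panda actually uses. The point is only that the weights are discovered from the labeled examples rather than written by hand, and that the output is a prediction, not a verdict.

```python
import math
from typing import List, Tuple

# Hypothetical features per site: (ad_density, duplicate_text_fraction, words_per_page / 1000).
# Labels stand in for human raters: 1 = "shallow", 0 = "not shallow". All values are invented.
SEED_SET: List[Tuple[List[float], int]] = [
    ([0.8, 0.7, 0.1], 1),
    ([0.7, 0.9, 0.2], 1),
    ([0.6, 0.6, 0.3], 1),
    ([0.1, 0.1, 1.2], 0),
    ([0.2, 0.0, 0.9], 0),
    ([0.1, 0.2, 1.5], 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that a site is 'shallow' under the learned weights."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(seed_set: List[Tuple[List[float], int]], epochs: int = 2000, lr: float = 0.5):
    """Logistic regression by batch gradient descent: every pass over the
    seed set nudges the weights toward agreeing with the human labels."""
    n_features = len(seed_set[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(seed_set)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in seed_set:
            err = predict(weights, bias, x) - y  # positive if we over-predict "shallow"
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(SEED_SET)
    unseen_site = [0.75, 0.8, 0.15]  # a site that was never in the seed set
    print(f"P(shallow) = {predict(weights, bias, unseen_site):.2f}")
```

Nobody hand-tuned the weights here; they fall out of the labeled examples, which is also why a site that merely resembles the "shallow" examples can get caught.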
When the machine predictions look good and their results pass some human QA, then those factors it identified, however they are weighted and combined, become the algorithm. This stays in place until that entire process can be re-run and generate a new version of the algorithm, incorporating new factors. As I understand it, that's the "running the data" part of Matt's comments.
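The "stays in place until the process is re-run" part can be sketched the same way. Again, this is an illustration under assumptions, not a description of Google's pipeline: FrozenModel, qa_accuracy, maybe_deploy, the 0.9 threshold, and all the weights and holdout values are hypothetical. A candidate set of weights only replaces the live version if it agrees closely enough with a human-labeled holdout set; otherwise the old version keeps running unchanged.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FrozenModel:
    """A snapshot of learned weights. Once deployed it never changes;
    only a full re-run of training produces a new version."""
    version: str
    weights: Tuple[float, ...]
    bias: float

    def score(self, features: List[float]) -> float:
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


def qa_accuracy(model: FrozenModel, holdout: List[Tuple[List[float], int]]) -> float:
    """Fraction of human-labeled holdout sites the model classifies the same way the raters did."""
    correct = sum((model.score(x) > 0.5) == (y == 1) for x, y in holdout)
    return correct / len(holdout)


def maybe_deploy(current: FrozenModel, candidate: FrozenModel,
                 holdout: List[Tuple[List[float], int]], threshold: float = 0.9) -> FrozenModel:
    """Swap in the candidate only if it passes QA; otherwise keep serving the current version."""
    if qa_accuracy(candidate, holdout) >= threshold:
        return candidate
    return current


if __name__ == "__main__":
    # Invented holdout labels, in the same hypothetical feature space as before.
    holdout = [([0.9, 0.8, 0.1], 1), ([0.1, 0.1, 1.3], 0),
               ([0.7, 0.6, 0.2], 1), ([0.2, 0.1, 1.0], 0)]
    v1 = FrozenModel("v1", (4.0, 4.0, -3.0), -2.0)
    v2 = FrozenModel("v2", (-1.0, 0.5, 0.0), 0.0)  # a poor candidate from a new run
    live = maybe_deploy(v1, v2, holdout)
    print(live.version)  # prints v1
```

The poor candidate only matches the raters on half of the holdout sites, so v1 stays live until another full training pass produces something better. Under this model, changes arrive in steps with each data run rather than continuously.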
The full list of parameters at any one time is likely to be even more confusing to the general public than the current situation is. And for a select few people, that list would open the door to gaming the algorithm.
So I'd say you're right: we're not going to get the recipe. If we did, we might be astounded at some of the data that Google is maintaining. I'm sure there's a lot more than we've ever guessed, including much that they collect but have never used before.