tedster - 2:11 am on Jun 29, 2011 (gmt 0)
Maybe they are also processing the content directly - reading level, grammar, spelling, average sentence length, number of Latin-derived words compared to Anglo-Saxon words, number of repetitions of the query phrase (remember them saying that Panda 2.0 went further into the "long tail"?) and perhaps even more. Maybe there is some attempt to DIRECTLY measure content, in addition to looking for secondary and supporting signals.
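To make the idea of "direct" measurement concrete, here is a minimal Python sketch of two of the signals mentioned above: average sentence length and repetitions of the query phrase. This is purely illustrative guesswork. The function name content_signals, the crude sentence splitting, and the choice of signals are all my own assumptions, not anything Google has described.

```python
import re

def content_signals(text, query_phrase):
    """Hypothetical on-page measurements; not Google's actual method."""
    # Crude sentence split on ., ! and ? (an assumption; real parsers do better)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens made of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    avg_len = len(words) / len(sentences) if sentences else 0.0
    # Count whole-phrase, case-insensitive occurrences of the query phrase
    pattern = r"\b" + re.escape(query_phrase.lower()) + r"\b"
    repetitions = len(re.findall(pattern, text.lower()))
    return {
        "word_count": len(words),
        "avg_sentence_length": avg_len,
        "query_repetitions": repetitions,
    }

sample = "Cheap blue widgets are great. Buy cheap blue widgets today! Widgets rock."
print(content_signals(sample, "cheap blue widgets"))
```

For the sample text this reports 12 words, an average of 4.0 words per sentence, and 2 repetitions of the phrase. Reading level or Latin versus Anglo-Saxon vocabulary would need word lists or syllable counts on top of this.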
Biswanath Panda is an expert in large-scale decision tree processing. The Panda algo would be complex because of at least two factors:
1. The construction program can range over a wide spectrum of data points - things I'm sure we never even considered in our wildest dreams. Google collects a LOT more data than they are actively using.
2. The decision tree could have such complex if-then looping logic that we'd also be very challenged to get the big picture.
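For anyone who hasn't worked with decision trees, here is a toy sketch in Python of what nested if-then logic over page signals looks like. Everything in it is hypothetical: the signal names, the thresholds, and the labels are made up for illustration, and a learned tree (or an ensemble of them) would be built from data with far more branches than anyone could write by hand.

```python
def classify_page(signals):
    """Toy hand-written decision tree; thresholds are invented, not known values."""
    # First split: how often the query phrase is repeated on the page
    if signals["query_repetitions"] > 10:
        # Second split: heavy repetition on a short page looks worse
        if signals["word_count"] < 300:
            return "low quality"
        return "needs review"
    # Other branch: very short average sentences get a second look
    if signals["avg_sentence_length"] < 5:
        return "needs review"
    return "ok"

page = {"query_repetitions": 12, "word_count": 200, "avg_sentence_length": 15}
print(classify_page(page))  # prints "low quality"
```

Even with three signals the branches interact. Scale that up to hundreds of data points and many trees and the "big picture" gets very hard to reverse engineer from the outside.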