Wilburforce - 1:46 pm on Feb 27, 2013 (gmt 0)
I don't know how many have read the patent document.
I think it is a must-read.
If that is what is now being implemented, it actually throws some light on this mess.
It has at least one fundamental flaw: if the existing data set contains imperfect pages, then predictions based on it are weighted by those imperfections.
Presumably, that is why good pages are sinking and bad pages are rising: there are more bad pages than good pages, so any phrase-distribution "standard" derived by averaging over the whole population is weighted in favour of bad pages.
Presumably, also, that is why thinner pages are rising: a good page with a lot of content about a subject will contain a higher number of related phrases, while averaging over the whole population of pages predicts a lower number. There are relatively few good pages with a lot of content, so they are more likely to fall at the extremes of the distribution (and attract a penalty). A rough sketch of that effect follows below.
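To make the arithmetic concrete, here is a minimal sketch of the outlier effect I mean, assuming the "standard" is nothing more than a population average of related-phrase counts and the penalty is a simple deviation test. The page counts, the 2.0 cutoff, and every name below are my own inventions for illustration; none of it is taken from the patent.

import statistics

# Hypothetical population: many thin pages with few related phrases,
# a few in-depth pages with many. Every figure here is invented.
pages = [(f"thin-{i}", 3) for i in range(900)] + \
        [(f"depth-{i}", 40) for i in range(100)]

counts = [c for _, c in pages]
mean = statistics.mean(counts)     # the "norm", dominated by thin pages
stdev = statistics.stdev(counts)

# Pages far from the norm fall at the extremes of the distribution.
# With a simple z-score cutoff, the content-rich page is the outlier.
for name, count in (pages[0], pages[-1]):
    z = (count - mean) / stdev
    print(f"{name}: z={z:+.2f} penalised={abs(z) > 2.0}")

Run it and the thin page sits at roughly z = -0.33 while the in-depth page lands at about z = +3.0, so under this (invented) test it is the good, content-rich page that attracts the penalty.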
To paraphrase MikeNoLastName, it is "Conform to the Norm or die."