danny - 6:35 pm on Jun 13, 2011 (gmt 0)
In the original post Walkman wrote: it's clear to me that this is a penalty
I call it "negative screening" instead of "a penalty", but yes I think this is clear.
I have just blogged about this at length, but it seems to me that what Google has done is to build, using feedback from human raters and its own employees' appraisals, a large corpus of spam - that is, of junk or near-junk pages that rank highly on searches. They have then fed this spam corpus to a machine learning system, and applied the resulting filter across their entire index.
The problem is that the machine learning system can't evaluate quality (or the lack of it) directly, so it has to rely on measurable features of sites and pages that merely correlate with junkness. And this is where the false positives come in.
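To make the mechanism concrete, here is a minimal toy sketch of what I mean - not Google's actual system, obviously. The feature names, numbers, and the tiny hand-labelled "corpus" are all invented for illustration. A plain logistic-regression classifier is trained on surface features (ad density, keyword repetition, and so on) that correlate with junk in the training set; a legitimate page whose layout spammers have imitated then shares those surface features and gets flagged:

```python
import math

# Toy illustration only: a classifier trained on a hand-labelled spam
# corpus learns surface features that merely CORRELATE with low quality.
# Feature vector (all invented): [ad_density, keyword_repetition, boilerplate_ratio]
CORPUS = [
    ([0.8, 0.9, 0.7], 1),  # auto-generated junk pages
    ([0.7, 0.8, 0.9], 1),
    ([0.9, 0.7, 0.8], 1),
    ([0.1, 0.2, 0.1], 0),  # hand-written quality pages
    ([0.2, 0.1, 0.2], 0),
    ([0.1, 0.1, 0.3], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(corpus, epochs=2000, lr=0.5):
    """Plain logistic regression fitted by gradient descent."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in corpus:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def spam_score(w, b, x):
    """Probability-like score in (0, 1); higher means 'looks like junk'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = train(CORPUS)

# A legitimate site whose layout spammers have copied: heavy ads and
# repeated terminology make its surface features resemble the spam cluster.
legit_but_spam_looking = [0.7, 0.8, 0.6]
print(spam_score(w, b, legit_but_spam_looking))  # scores on the spam side: a false positive
```

The point of the sketch is only that the classifier never sees "quality" itself, just proxies for it, so any honest site that happens to sit near the spam cluster in feature space is at risk.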
Possibly I have just been unlucky, but it is quite possible that sites like mine have been actively used as models by spammers, who copied every feature that can easily be copied, just replacing the content with auto-generated gumph.