ackkster - 5:46 pm on Apr 15, 2011 (gmt 0)
I think another way to look at it, combining the two points above, is like this: Google may be looking at the ratio of "good" content to "bad" content above the fold on a page.
"Good" Content is:
- Original, textual content (perhaps with some experimental English-language analysis of text quality, etc.)
"Bad" Content is:
- Content duplicated across domains (e.g. syndication, scraper sites, etc.) and not identified as the "original" source
- Internal duplicated content
- Large Images
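To make the ratio idea concrete, here's a rough Python sketch of the kind of scoring this would imply. To be clear, this is pure speculation on my part: the block types, the area threshold for "large" images, and the ratio itself are made-up illustrations, not anything Google has confirmed.

# Hypothetical sketch of a "good vs. bad above-the-fold ratio" score.
# The categories and thresholds are invented for illustration only.

def above_fold_quality_ratio(blocks):
    """blocks: list of dicts like
    {"type": "text" | "image", "original": bool, "duplicate": bool, "area": int}
    Returns good_area / (good_area + bad_area), or None if the fold is empty."""
    good, bad = 0, 0
    for b in blocks:
        if b["type"] == "text" and b.get("original") and not b.get("duplicate"):
            good += b["area"]   # original, non-duplicated text counts as "good"
        elif b["type"] == "image" and b["area"] > 100_000:
            bad += b["area"]    # large images counted as "bad" (the flaw noted below)
        elif b.get("duplicate"):
            bad += b["area"]    # syndicated / scraped / internally duplicated content
    total = good + bad
    return good / total if total else None

# Example: a fold dominated by a big (but relevant) image scores poorly,
# which is exactly the failure mode described next.
page = [
    {"type": "image", "original": True, "duplicate": False, "area": 400_000},
    {"type": "text", "original": True, "duplicate": False, "area": 50_000},
]
print(above_fold_quality_ratio(page))  # ~0.11 -- mostly "bad" despite being relevant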
Where the algorithm seems to fail is that it doesn't account for large images that are actually relevant to the page (e.g. pop-crunch), and it often misidentifies the original source of content, penalizing the original content creator.
Viewed under this system, most of the sites on this list seem to fall into the "bad" content category (rightly or wrongly).