maxmoritz - 7:25 pm on Feb 28, 2011 (gmt 0)
The content on your site (the affected pages) may not fit a QDRL-type (query determines reading level) filter or assessment.
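For concreteness, a reading-level assessment could look something like the Flesch-Kincaid grade formula sketched below. This is only a hedged illustration: nobody outside Google knows whether any readability signal like this is actually applied, and the syllable counter here is a crude heuristic I made up for the example.

```python
# Illustrative only: a rough Flesch-Kincaid grade-level estimate.
# Whether Google applies anything like this is pure speculation, and this
# syllable counter is a crude vowel-group heuristic, not a dictionary lookup.
import re

def count_syllables(word):
    # Treat each run of consecutive vowels as one syllable (approximate).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

# Simple text scores low; denser wording scores noticeably higher.
print(round(fk_grade("Dogs like to run. They also like to play in the park."), 1))
print(round(fk_grade("This update may be assessing readability. Pages written "
                     "at an inappropriate level for typical queries could be "
                     "demoted."), 1))
```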
The content on your site may be totally unique, but you might not have been the first of, say, ten sites to publish (or, more importantly, to get spidered with) something topically and informationally similar, out of all the pages Googlebot has spidered.
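To illustrate that "not first to get spidered" idea: a search engine can flag two pages as covering essentially the same material using near-duplicate detection. The sketch below compares word shingles with Jaccard similarity; it is not Google's actual method (SimHash-style fingerprinting is more often cited), and the page texts are hypothetical.

```python
# Illustrative sketch only: near-duplicate detection with word shingles and
# Jaccard similarity. Google's real approach is unknown; this just shows how
# two pages covering the same material can be flagged, with the
# earlier-crawled copy presumably treated as the original.
def shingles(text, k=4):
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa or sb else 0.0

# Hypothetical page texts; in practice these would be full documents.
page_a = "how to change a flat tire loosen the lug nuts then jack up the car"
page_b = "to change a flat tire you loosen the lug nuts then jack up the car"

# High overlap suggests the later-spidered page adds little new information.
print(round(jaccard(page_a, page_b), 2))
```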
The content (the affected documents) may fit a 'spammy site' classifier pattern based on phrase frequency, related phrases, and 'natural language' factors, possibly combined with user behavior and/or links.
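As a sketch of what a "classifier pattern based on phrase frequency" might mean in practice, the toy example below trains a naive Bayes classifier over word bigrams. Everything here is hypothetical, including the made-up training examples; Google's real features, data, and model are unknown, and a real system would fold in far richer signals (related phrases, user behavior, links).

```python
# Purely illustrative: a toy multinomial naive Bayes classifier over word
# bigrams ("phrases"). Class priors are omitted (assumed equal); the labels
# and training texts are invented for demonstration only.
import math
from collections import Counter

def bigrams(text):
    words = text.lower().split()
    return [" ".join(pair) for pair in zip(words, words[1:])]

def train(docs):
    # docs: list of (text, label) pairs; returns per-label bigram counts.
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in docs:
        counts[label].update(bigrams(text))
    return counts

def score(text, counts, label):
    # Log-likelihood of the text's bigrams under one label (add-one smoothing).
    total = sum(counts[label].values())
    vocab = len(set(counts["spam"]) | set(counts["ham"]))
    return sum(
        math.log((counts[label][b] + 1) / (total + vocab))
        for b in bigrams(text)
    )

def classify(text, counts):
    return max(("spam", "ham"), key=lambda lbl: score(text, counts, lbl))

# Hypothetical training examples.
training = [
    ("buy cheap pills online buy cheap pills", "spam"),
    ("best cheap deals click here click here", "spam"),
    ("our review covers the camera's low light performance", "ham"),
    ("the recipe uses fresh basil and ripe tomatoes", "ham"),
]
model = train(training)
print(classify("cheap pills click here", model))     # likely "spam"
print(classify("the recipe uses fresh basil", model))  # likely "ham"
```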
Interesting theories, and they may well be part of it. It would seem very difficult, if not impossible, to do this with any degree of accuracy across all sites on the web, which may be part of why we're seeing so many sites hit that "shouldn't" have been.