My first thought is that traffic data - user data - should offer some good clues, and we think Google is already measuring and using it. So I'm thinking that any site with a huge bounce rate (say, over 80%) across ALL of its landing pages might want to take a look at addressing that.
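As a rough illustration of that heuristic (the function names, threshold, and data shape here are my own for the example - nothing Google has published), the check might look like:

```python
# Hypothetical sketch: flag a site only when EVERY landing page
# has a bounce rate above the threshold. The input shape
# (page path -> (bounces, visits)) is an assumption for illustration.

def bounce_rate(bounces, visits):
    """Fraction of visits that were single-page sessions; 0.0 if no visits."""
    return bounces / visits if visits else 0.0

def site_looks_risky(landing_pages, threshold=0.8):
    """True only if every landing page bounces more than `threshold` of the time."""
    if not landing_pages:
        return False
    return all(
        bounce_rate(b, v) > threshold for b, v in landing_pages.values()
    )

pages = {
    "/": (90, 100),         # 90% bounce
    "/pricing": (85, 100),  # 85% bounce
    "/blog": (88, 100),     # 88% bounce
}
print(site_looks_risky(pages))  # True: every page exceeds 80%
```

The point of requiring ALL landing pages to exceed the threshold is to avoid punishing a site for one weak page - a single healthy landing page would keep the site off the list.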
I'm also thinking that they're probably going to fold this challenge into the work that their human editorial army does. For context, you may want to read our thread Google Patent - human editorial opinion [webmasterworld.com].
I honestly don't know which approach worries me more - manual bans like the ones Blekko is handing out, or trusting an algorithm to do the job with an even hand.