It's my understanding that Penguin and Panda are very sophisticated algorithms that have to be run periodically in conjunction with the main always-running algorithm, and they may require seriously massive computing power. This suggests to me that they must both be doing something that is beyond the main algo's capabilities, or something that can't be worked into the main algo - otherwise, why have them? Panda, for example, was trying to weed out content that wasn't exactly duplicate, but was still of little if any value - that's a pretty ambitious goal for an algorithm.
Therefore, it seems to me we should assume Penguin is similarly sophisticated - either evaluating something the main algo can't, or evaluating something much better or more profoundly than the main algo does. While Google's stated intentions with Panda made it pretty clear Panda was doing things the main algo had never even tried to do, their statements about Penguin were simply that it targeted spam and SEO tactics - tactics the main algo has always tried to detect and penalize. So what is Penguin doing that the main algo can't do, or can't do as well? What's Penguin doing that's worth the cost of its development and deployment?
Could it be looking not so much at the actions we take with our sites (which the main algo already looks at), but at the patterns those actions create? What if Google has identified some patterns of behavior on certain queries that (they feel) reveal spam on a whole new level? Let's say that nearly everyone trying to rank for "coarse ground widget" has a link to Wikipedia and green H1 headers. Some of these sites don't stuff keywords or manipulate their backlinks or do any of the traditional spammy stuff that the main algo has always dealt with. But what if Google has noticed this pattern and decided it indicates ALL these sites are deep into SEO and may need to be dropped in rankings, depending on other factors? (I'm assuming that even if you trigger Penguin, it gets weighed against other variables and might not take effect if you have enough strong quality signals.)
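To make the idea a bit more concrete, here's a rough Python sketch of what I mean by looking at patterns rather than individual actions. Everything in it is made up for illustration: the function names, the feature labels, the thresholds, the quality score and the example domains are all my own assumptions, and nothing here reflects how Google actually works. It just finds features that are very common among sites chasing one query but rare everywhere else, then flags sites that match the whole pattern unless they have strong quality signals.

```python
from collections import Counter


def footprint_features(query_sites, baseline_sites, min_share=0.8, min_lift=3.0):
    """Return features that are common among sites targeting one query
    but rare in a baseline population (hypothetical heuristic)."""

    def share(sites):
        # Fraction of sites in the group that show each feature.
        counts = Counter(f for feats in sites.values() for f in set(feats))
        return {f: c / len(sites) for f, c in counts.items()}

    q_share = share(query_sites)
    b_share = share(baseline_sites)
    footprint = set()
    for feature, s in q_share.items():
        # A small floor avoids dividing by zero for features absent from the baseline.
        lift = s / max(b_share.get(feature, 0.0), 0.01)
        if s >= min_share and lift >= min_lift:
            footprint.add(feature)
    return footprint


def pattern_penalty(site_features, footprint, quality_score, quality_cutoff=0.7):
    """Assumed rule: flag a site only if it matches the whole footprint
    AND lacks strong quality signals."""
    matches = bool(footprint) and footprint <= set(site_features)
    return matches and quality_score < quality_cutoff


# Invented sites competing for "coarse ground widget".
query_sites = {
    "a.example": {"wikipedia_link", "green_h1", "blog"},
    "b.example": {"wikipedia_link", "green_h1"},
    "c.example": {"wikipedia_link", "green_h1", "forum"},
    "d.example": {"wikipedia_link", "green_h1", "blog"},
    "e.example": {"blog"},
}

# Invented sample of unrelated sites, used as the comparison group.
baseline_sites = {
    "p1.example": {"blog"},
    "p2.example": {"wikipedia_link", "forum"},
    "p3.example": {"blog", "forum"},
    "p4.example": {"green_h1"},
    "p5.example": {"blog"},
    "p6.example": {"forum"},
    "p7.example": {"blog"},
    "p8.example": {"shop"},
    "p9.example": {"shop", "blog"},
    "p10.example": {"forum"},
}

footprint = footprint_features(query_sites, baseline_sites)
weak_site_flagged = pattern_penalty(query_sites["a.example"], footprint, quality_score=0.3)
strong_site_flagged = pattern_penalty(query_sites["a.example"], footprint, quality_score=0.9)
```

In this toy data the Wikipedia link and the green H1 each show up on 4 of 5 sites for the query but only 1 in 10 elsewhere, so together they form the "footprint". The same site gets flagged with a low quality score and left alone with a high one, which is the "weighed against other variables" part of my guess.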
What do you think? Too far-fetched, or is it possible? And if Penguin's not doing something like this, what do you think it's doing to justify the cost of development and deployment? (I really want to know, even if I'm wrong!)