tedster - 2:05 am on Mar 1, 2011 (gmt 0)
I'll guess it began with semantic analysis scores of several types - reading level, semantic variation, maybe even sentiment analysis.
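Just to make that concrete, here's the kind of per-page score I mean - a rough Python sketch with naive syllable counting, nothing like what Google actually runs:

```python
import re

def semantic_scores(text):
    """Crude document-level scores: Flesch reading ease and lexical variety.
    Purely illustrative - a stand-in for 'semantic analysis scores'."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {"flesch": 0.0, "lexical_variety": 0.0}

    def syllables(word):
        # Count vowel groups as a rough syllable estimate
        return max(1, len(re.findall(r"[aeiouy]+", word)))

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables(w) for w in words) / len(words)
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    lexical_variety = len(set(words)) / len(words)  # type-token ratio
    return {"flesch": flesch, "lexical_variety": lexical_variety}

print(semantic_scores("The cat sat on the mat. The cat sat on the mat again."))
```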
Couple that with a machine learning algorithm that started with a hand-seeded group of obvious junk for training. The algorithm was originally given free rein to grab almost any signal Google has been accumulating, whether currently used in ranking or not. That search was looking for pages that clustered tightly with the seed group. Maybe two seed groups - one for great content, too.
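Roughly, the "clustered tightly with the seed group" idea could look something like this bare-bones nearest-centroid sketch - the feature names are made up, and the real thing would be a full machine learning model over far more signals:

```python
import math

def centroid(vectors):
    """Mean feature vector for a seed group."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(page_vector, junk_seed, quality_seed):
    """Flag a page whose signals sit closer to the hand-seeded junk group
    than to the hand-seeded quality group."""
    junk_dist = distance(page_vector, centroid(junk_seed))
    quality_dist = distance(page_vector, centroid(quality_seed))
    return "junk-like" if junk_dist < quality_dist else "quality-like"

# Hypothetical per-page feature vectors: [flesch, lexical_variety, ad_ratio]
junk_seed = [[45.0, 0.30, 0.60], [50.0, 0.28, 0.70]]
quality_seed = [[65.0, 0.55, 0.10], [70.0, 0.60, 0.05]]
print(classify([48.0, 0.32, 0.65], junk_seed, quality_seed))  # "junk-like"
```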
If I'm right, then reverse engineering this baby will be very tough. General advice: don't cut corners in any way, shape or form.