When faced with a hard prediction problem, one possible approach is to attempt to perform statistical miracles on a small training set. If data is abundant then often a more fruitful approach is to design a highly scalable learning system and use several orders of magnitude more training data.
This general notion recurs in many other fields as well. For example, processing large quantities of data helps immensely for information retrieval and machine translation.
Several years ago we began developing a large scale machine learning system, and have been refining it over time. We gave it the codename “Seti” because it searches for signals in a large space. It scales to massive data sets and has become one of the most broadly used classification systems at Google.