tedster - 2:50 am on Oct 10, 2012 (gmt 0)
When you aggregate a lot of data, patterns emerge that are not obvious at first glance just from describing what you're going to look at. For example, when billions of Tweets are mined, spam accounts can stand out by their content alone. You'd never guess that from looking at a small sample, however.
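A toy sketch of that idea (the account names, messages, and threshold here are all made up for illustration): even a crude duplicate-content count across accounts separates the spam from everything else, but only once enough posts are aggregated - no single account's feed gives it away.

```python
def flag_duplicate_content(posts, min_copies=3):
    """Flag accounts whose messages are verbatim duplicates posted by
    at least `min_copies` distinct accounts - a blunt aggregate signal
    that a small sample would never reveal."""
    accounts_by_msg = {}
    for account, msg in posts:
        accounts_by_msg.setdefault(msg, set()).add(account)
    flagged = set()
    for msg, accounts in accounts_by_msg.items():
        if len(accounts) >= min_copies:
            flagged |= accounts
    return flagged

# Hypothetical toy corpus: (account, message) pairs. In a real mining
# job this would be billions of rows.
posts = [
    ("alice", "great sunrise this morning"),
    ("bob",   "check out my photos from the hike"),
    ("spam1", "WIN A FREE PHONE click here"),
    ("spam2", "WIN A FREE PHONE click here"),
    ("spam3", "WIN A FREE PHONE click here"),
    ("carol", "anyone watching the game tonight?"),
]

print(sorted(flag_duplicate_content(posts)))
# -> ['spam1', 'spam2', 'spam3']
```

Real systems would use fuzzier matching (shingling, minhash) rather than exact string equality, but the principle is the same: the signal lives in the aggregate, not in any one account.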
Any user metric is really measuring three things at once: how well the searcher phrased the query, how well Google matched it, and how good the resulting web page actually is.
Yes, if Google somehow mismatches the query and the web page, that can spell trouble for the site involved - and yes, we do see "some" of that pattern. I don't think it happens intentionally, however, even though some have suggested it does.
How well the user phrased the query? That's going to be a wash over millions of data points - maybe even at much lower volumes. And Google Suggest tends to contain that issue a good bit, as well.
How good the web page is among the other possible results? Measuring that is clearly Google's goal. As you go down the long tail, they're not always doing well, at least so far. But it's still a lot better than I would have thought possible through machine learning.