ergophobe - 2:39 am on Dec 9, 2012 (gmt 0)
Maybe I should give an example of one experiment I ran. If you grab a table of US obesity rates by state and plug it into Google Correlate, you'll notice that the terms matching that pattern most closely, according to Correlate, mostly relate to rap music.
I was discussing this with a friend who is a professional statistician, and his guess at what's happening is:
1. You're only starting with 51 data points, so there's a lot of noise in the signal.
2. To save cycles, Google does a first-pass approximation and then a second pass (I think - check this). If it finds something promising on the first pass, it explores that avenue some more.
3. So if you get a semi-random connection - because of the limited data, or because two curves happen to match for no good reason - Google will then look at curves for related terms to see which of those match too. That gives the process a self-reinforcing aspect.
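Point 1 is easy to demonstrate. This is just an illustrative sketch, not Google's actual algorithm: if you screen a large pool of completely unrelated random series against a 51-point target (one value per state plus DC), the best match will look impressively correlated purely by chance.

```python
import numpy as np

rng = np.random.default_rng(0)

n_points = 51           # one value per US state, plus DC
n_candidates = 100_000  # stand-in for a large pool of search-term series

# Both the "target" and the candidates are pure noise here.
target = rng.normal(size=n_points)                      # e.g. obesity rates
candidates = rng.normal(size=(n_candidates, n_points))  # unrelated series

# Pearson correlation of each candidate series with the target.
t = (target - target.mean()) / target.std()
c = (candidates - candidates.mean(axis=1, keepdims=True)) \
    / candidates.std(axis=1, keepdims=True)
corrs = c @ t / n_points

print(f"best |r| among {n_candidates} random series: {np.abs(corrs).max():.2f}")
```

With only 51 data points, the standard deviation of a chance correlation is roughly 1/sqrt(n), so screening enough candidates reliably turns up |r| well above 0.5 with no real relationship behind it.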
So the end result is that when Google tries to build statistical correlations between a data set and searches, it can get pretty wacky. On the other hand, the same approach can accurately predict flu outbreaks in the US ahead of official CDC warnings.
In short, it can be useful, but one must exercise caution. As the old saw goes, "correlation is not causation".