tedster - 12:45 am on Jan 10, 2011 (gmt 0)
This is, clearly, not an area where I can claim any solid knowledge. Here's the best understanding I have so far - but please take it as my opinion and not "fact".
Google most likely has several different sets of buckets (taxonomies). One important set would be for the query terms themselves. What is the user intention behind each query? Is it informational, transactional, navigational? Is freshness important for this particular query? How about geography?
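To make that bucketing idea concrete, here's a toy sketch of classifying a query along the informational / transactional / navigational lines above, plus a freshness flag. The cue lists are my own illustrative guesses, not anything Google has published:

```python
# Hypothetical cue lists -- invented for illustration only.
TRANSACTIONAL_CUES = {"buy", "price", "cheap", "order", "coupon"}
NAVIGATIONAL_CUES = {"login", "homepage", "www", ".com"}
FRESHNESS_CUES = {"news", "latest", "today"}

def classify_query(query: str) -> dict:
    """Return guessed intent buckets for a raw query string."""
    lowered = query.lower()
    terms = lowered.split()
    intent = "informational"  # default bucket when no cue fires
    if any(t in TRANSACTIONAL_CUES for t in terms):
        intent = "transactional"
    elif any(cue in lowered for cue in NAVIGATIONAL_CUES):
        intent = "navigational"
    wants_freshness = any(t in FRESHNESS_CUES for t in terms)
    return {"intent": intent, "freshness": wants_freshness}
```

A real system would of course use far richer signals (click behavior, query logs, entity data) rather than keyword lists, but the output shape - a query mapped into intent buckets - is the point.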
Then there would be buckets for types of sites. Google's work would be to measure and match up the query to the right type of site. Or, in the case where intention is ambiguous or mixed, to find the right blend of results. Questions like "how many pages on this domain contain this query term" might come into play. Does this domain contain other pages that are topically related? In other words, search results today are not just a question of text matches.
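The "how many pages on this domain contain this query term" question could be pictured as a simple per-domain coverage score. This scoring is invented for the example; it just shows the kind of domain-level signal that goes beyond a single-page text match:

```python
def domain_match_score(query_terms, domain_pages):
    """Fraction of a domain's pages containing at least one query term.

    domain_pages: list of sets, one set of terms per crawled page.
    A purely illustrative stand-in for a real topical-relevance signal.
    """
    if not domain_pages:
        return 0.0
    qset = {t.lower() for t in query_terms}
    matching = sum(1 for page in domain_pages if qset & page)
    return matching / len(domain_pages)
```

A domain where most pages touch the query's topic would score high; a domain with one stray matching page would score low, even if that page's on-page text match is strong.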
The actual taxonomies Google uses are probably quite fine-grained, and not nearly as coarse as what I suggested above. The main point would be the importance of clear relevance signals on a site and each of its sections and pages. The muddier things get, the harder a job Google has sorting everything out. For instance, a home page with a ton of internal links that are all over the place semantically might create a challenge that requires a lot of testing on Google's side of things, resulting in a traffic roller coaster in both quantity and quality.
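One way to picture the "muddy home page" problem: measure how concentrated the internal link anchors are around a single theme. The topic labels here are assumed inputs (say, from some upstream classifier); the metric itself is just an illustration of semantic spread, not a known Google signal:

```python
from collections import Counter

def theme_concentration(anchor_topics):
    """Fraction of link anchors belonging to the most common topic.

    Near 1.0 = tightly themed page; low values = anchors scattered
    all over the place semantically, as described above.
    """
    if not anchor_topics:
        return 0.0
    counts = Counter(anchor_topics)
    return counts.most_common(1)[0][1] / len(anchor_topics)
```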
Here's a related discussion from a while back - stimulated by the QDF (query deserves freshness) idea: Blended Results, QDF and User Intention at Google [webmasterworld.com]
And here's a Google patent: Automatic taxonomy generation in search results using phrases [patft1.uspto.gov]