| 7:58 pm on Jan 19, 2009 (gmt 0)|
Those special searches include a *, which is a wild card or pattern-matching character. If you notice which words are bold in the results for those searches, you'll see that the rankings are no longer being calculated for just the single main keyword but now also include other words.
I'm happy that you asked this question, because there may be some value in noticing WHICH extra words are included and rank near the top.
If I were to take a guess, I'd say that the data collected by Google's phrase-based indexing [webmasterworld.com] is at work here - words that have a higher frequency of co-occurrence may be more highly ranked on a wild card phrase. That's just a hypothesis at the moment, but it's one I'm going to experiment with.
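To make "frequency of co-occurrence" a bit more concrete, here is a toy sketch in Python. It is only an illustration of the idea, not a description of how Google computes anything: the function name `cooccurrence_counts` and the three sample documents are made up, and "co-occurrence" is assumed here to mean "appears in the same document as the keyword".

```python
from collections import Counter

def cooccurrence_counts(docs, keyword):
    """Count, per document, which other words appear alongside `keyword`."""
    counts = Counter()
    for doc in docs:
        # A set, so a word is counted at most once per document.
        words = set(doc.lower().split())
        if keyword in words:
            counts.update(words - {keyword})
    return counts

# Hypothetical mini "index" of three documents.
docs = [
    "blue widget parts for sale",
    "widget parts and widget repair",
    "garden tools for sale",
]

counts = cooccurrence_counts(docs, "widget")
print(counts.most_common(3))
```

In this sample, "parts" shows up in both documents that mention "widget", so it comes out on top. Under the hypothesis above, that is the kind of word you would expect to see favored on a wild card search.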
| 8:33 pm on Jan 19, 2009 (gmt 0)|
I recently found a (very) small set of data released by Google two years ago to the "Linguistic Data Consortium" at the University of Pennsylvania. For anyone looking for a more concrete example of what phrase-based indexing measures, this is such a thing:
And this is the kind of indexing that I theorize might be in use with the * wildcard search results.
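For anyone who wants to play with the idea, here is a rough Python sketch of an n-gram frequency table and a lookup that "fills in" a trailing wildcard from it. The helper names `ngram_counts` and `fill_wildcard` and the sample sentence are invented for illustration, and nothing here is taken from Google's data set or claimed to be its method.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def fill_wildcard(counts, prefix):
    """Rank the n-grams that start with `prefix`, most frequent first."""
    k = len(prefix)
    matches = [(gram, c) for gram, c in counts.items() if gram[:k] == tuple(prefix)]
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(matches, key=lambda item: (-item[1], item[0]))

# Hypothetical sample text standing in for a corpus.
tokens = "google news is fast and google news is free and google maps is free".split()

bigrams = ngram_counts(tokens, 2)
ranked = fill_wildcard(bigrams, ["google"])
for gram, count in ranked:
    print(" ".join(gram), count)
```

With this sample, a query like [google *] would rank "google news" ahead of "google maps" simply because that pair occurs more often in the text.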
| 9:01 pm on Jan 19, 2009 (gmt 0)|
I'm not sure the highlighting is all that revealing here.
If you search widget*, the highlighting function will highlight that and the next word (or the next punctuation character; the highlighting is very imprecise). Each additional asterisk will highlight one more word.
| 9:08 pm on Jan 19, 2009 (gmt 0)|
I hear you, Andy. However, the results also shuffle in the rankings -- so there's more than just bolding the next word going on.
I also found it interesting that Google seems to treat the * search results as a phrase match even though there are no quote marks.
| 9:17 pm on Jan 19, 2009 (gmt 0)|
I believe the phrase-esque matching is because the main function of the asterisk is for unknown words - e.g. if you're not sure what might come in the middle of something ([Google * News] or [Google news is a *]). It's not quite phrase matching though, as you'll get different results if you enclose the query in quotes.
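As a rough way to picture the "unknown word" behaviour, here is a small Python sketch that turns a query with asterisks into a regular expression where each * stands for exactly one word. This is an assumption made for illustration (the function name `wildcard_to_regex` is made up), not a claim about how Google actually evaluates the operator.

```python
import re

def wildcard_to_regex(query):
    """Turn a query such as 'google * news' into a regex; each * matches one word."""
    parts = [r"\w+" if tok == "*" else re.escape(tok) for tok in query.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

pattern = wildcard_to_regex("google * news")

# Hypothetical page text.
text = "Google Finance News and Google local news"
hits = [m.group(0) for m in pattern.finditer(text)]
print(hits)

# With one word per asterisk, the two-word phrase alone does not match.
print(pattern.search("Google News"))
```

Note that in this sketch the words still have to appear in order and next to each other, which is why it feels phrase-like even without quote marks.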
What's also interesting is that the inclusion of a space (or lack of) changes results too - [google*] is not the same as [google *].
I don't think it's to do with related words though - it seems to me that a query for an unknown word is pretty tricky relevancy-wise, in a similar way to ultra generics like [0..9999999] or [site:com] - so there's some kind of fallback to trust or authority or some such thing ;)
| 11:07 pm on Jan 19, 2009 (gmt 0)|
You may find this site interesting - they have lots of n-grams freely available, based on the British National Corpus [phrasesinenglish.org...]
"I don't think it's to do with related words though" - I'm thinking it doesn't show "related" words as much as it shows what words co-occur with the search term within Google's index.