---- Phrase Based Multiple Indexing and Keyword Co-Occurrence
Marcia - 6:15 am on May 11, 2007 (gmt 0)
One quick way to measure co-occurrence and its prominence would be to simply scrape the first 2 pages of rankings, the 20th 2, the 40th 2, 60th 2, etc. until you hit the end, taking the last 2 full pages of results.
Cygnus, that would show you first-order co-occurrence, but would there be enough data in that kind of limited set to be able to include how the results have been influenced by second-order co-occurrence?
This paper is about LSI (which deals with terms, not phrases), but it's a concept I've had a problem grasping, and it's about the clearest explanation I've seen of second-order (or higher-order) co-occurrence:
Google has such a HUGE amount of data, billions of pages, and basically what they're looking for is the ability to predict other good phrases. That would have to be based on a whole boatload of statistical data, including data on phrases that don't occur together, but each of which occurs with other phrases that, a few hops back, occur with each other.
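
To make the "few hops back" idea concrete, here is a rough toy sketch in Python. It is only an illustration of first-order versus second-order co-occurrence, not a description of how Google or the patent does it. The sample phrases, the function names first_order and second_order, and the idea of treating each page as a plain list of phrases are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def first_order(docs):
    """Map each phrase to the set of phrases it shares a page with."""
    neighbours = defaultdict(set)
    for phrases in docs:
        for a, b in combinations(sorted(set(phrases)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def second_order(neighbours):
    """Find pairs that never share a page but have a neighbour in common."""
    pairs = {}
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # direct (first-order) co-occurrence, skip it
        shared = neighbours[a] & neighbours[b]
        if shared:
            pairs[(a, b)] = shared  # the phrases acting as the bridge
    return pairs


# Hypothetical pages, each reduced to the phrases found on it.
docs = [
    ["blue widgets", "widget repair"],
    ["widget repair", "spare parts"],
    ["spare parts", "discount tools"],
]

links = first_order(docs)
bridges = second_order(links)
for pair, via in bridges.items():
    print(pair, "linked through", sorted(via))
```

In this toy set, "blue widgets" and "spare parts" never appear on the same page, but both appear with "widget repair", so they come out as a second-order pair. "blue widgets" and "discount tools" are three hops apart, so one more pass over the links would be needed to connect them, which is where the need for a very large data set comes in.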