Phrase Based Multiple Indexing and Keyword Co-Occurrence


Marcia - 4:50 am on May 11, 2007 (gmt 0)


for example in the phrase-based patent on spam

Exactly. Everyone's been so focused on the spam/penalty aspect that there can be 19k posts and we could still be going around in circles with "this is spam" and "no, it isn't" and never get to the bottom of what this whole thing is really all about and how it works.

That's why I started a whole new thread with the focus on co-occurrence, because it's mentioned so many times in so many contexts across all those patent applications that it's almost like they're telling us exactly what they're doing, and we'd have to trip over it with our eyes closed to miss it.

It's about a whole indexing system, and let's face it: they didn't put in a whole new infrastructure (Big Daddy) as a conspiracy to bilk more Adwords dollars out of webmasters by using Adwords data against them, or as a beautification project for the Plex in between remodeling the lunchroom and restrooms with new decor and fixtures. :)

In this one it says very specifically:
"Phrase identification in an information retrieval system"

[0090] After the last stage of the indexing process is completed, the good phrase list 208 will contain a large number of good phrases that have been discovered in the corpus. Each of these good phrases will predict at least one other phrase that is not a phrase extension of it. That is, each good phrase is used with sufficient frequency and independence to represent meaningful concepts or ideas expressed in the corpus. Unlike existing systems which use predetermined or hand selected phrases, the good phrase list reflects phrases that actual are being used in the corpus. Further, since the above process of crawling and indexing is repeated periodically as new documents are added to the document collection, the indexing system 110 automatically detects new phrases as they enter the lexicon.

So no, they're not looking for sites going after "money phrases" gleaned from Adwords data, they're generating the taxonomy of phrases very specifically by analyzing the data on pages they fetch by crawling, and creating the posting lists of possible and good phrases by using data on the phrases encountered and the co-occurrence statistics. They're very clear and very specific on that point and even give details. That's why they call their collection the co-occurrence matrix.
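
To make the "predicts at least one other phrase that is not a phrase extension of it" idea a bit more concrete, here's a rough toy sketch in Python. It is only my own illustration, not the patent's actual procedure: the function name good_phrases, the ratio of actual to expected co-occurrence, the 1.5 threshold, and the "starts with the same words" test for a phrase extension are all assumptions made up for the example, and the sample phrases are invented.

```python
from collections import Counter
from itertools import permutations


def good_phrases(docs, threshold=1.5):
    """Toy filter: keep phrases that 'predict' some other, non-extension phrase.

    docs      -- list of sets of candidate phrases, one set per crawled page
    threshold -- hypothetical cutoff on actual / expected co-occurrence
    """
    n = len(docs)
    doc_freq = Counter()  # pages containing each phrase
    co_freq = Counter()   # pages containing both phrases of an ordered pair (j, k)
    for phrases in docs:
        phrases = set(phrases)
        doc_freq.update(phrases)
        co_freq.update(permutations(sorted(phrases), 2))

    good = set()
    for (j, k), both in co_freq.items():
        # Assumption: k is a "phrase extension" of j if it starts with j's words,
        # e.g. "new york" -> "new york city". Those pairs don't count.
        if k.startswith(j + " "):
            continue
        # Pages we'd expect to hold both phrases if they were unrelated.
        expected = doc_freq[j] * doc_freq[k] / n
        if both / expected > threshold:
            good.add(j)
    return good


docs = [
    {"australian shepherd", "blue merle"},
    {"australian shepherd", "blue merle"},
    {"australian shepherd", "blue merle", "click here"},
    {"click here", "click here now"},
    {"click here"},
    {"click here", "herding dog"},
    {"click here", "herding dog"},
    {"click here", "herding dog"},
]
print(sorted(good_phrases(docs)))  # ['australian shepherd', 'blue merle']
```

Notice that "click here" shows up on more pages than anything else and still doesn't make the list, because it turns up next to everything and so doesn't predict anything in particular. Nothing in there needs keyword or bid data from outside; the only input is the phrases found on the crawled pages.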

I think if we can put the spam part aside (which may be where collateral damage is accidentally happening) and put conspiracy theories aside, we can probably get to the bottom of at least a good part of what's going on, by looking at some details of what those papers are actually saying.


Thread source: http://www.webmasterworld.com/google/3336435.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com