"Phrase Based Indexing and Retrieval" - part of the Google picture?


Oliver_Henniges - 6:37 pm on Feb 17, 2007 (gmt 0)


The universe of "all possible phrases" is gigantic, even for three-word phrases and even for a single language. To me, the key issue seems to be the mechanisms by which Google narrows down this mass.
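To put a number on "gigantic", here is a short Python sketch that counts every word sequence of up to three words. The vocabulary size of 100,000 words is an assumption (the post gives no figure). Real phrase inventories are much smaller, because most of these sequences never occur in actual text.

```python
# Rough size of the "universe of all possible phrases" for one language.
# ASSUMPTION: a working vocabulary of 100,000 distinct words; the post
# gives no figure, so change VOCAB to try other sizes.
VOCAB = 100_000


def phrase_space(vocab: int, max_len: int) -> int:
    """Number of distinct word sequences of length 1..max_len."""
    return sum(vocab ** n for n in range(1, max_len + 1))


for n in range(1, 4):
    print(f"phrases up to {n} words: {phrase_space(VOCAB, n):.3e}")
```

With that assumed vocabulary, three-word sequences alone contribute 10^15 candidates. That is why some kind of preselection seems unavoidable.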

If I understood the patent correctly, this is all done "on the fly", whilst crawling, evaluating and indexing a particular batch of a couple of million pages on the web. At least the spam detection patent is NOT applied to the whole index in one big loop. How is this subset of a few million pages preselected? By chance and by link structure during the normal crawl?

It is impossible to store the co-occurrence matrix, even temporarily, unless you concentrate on a core of a few thousand of the most spammy keywords and phrases.
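Here is a back-of-the-envelope check of why the full matrix cannot be held in memory. It assumes a dense square matrix with one 4-byte counter per phrase pair. Both of those figures are assumptions, not details taken from the patent.

```python
# Memory needed to hold a dense phrase-by-phrase co-occurrence matrix.
# ASSUMPTION: one 4-byte counter per cell and a full (not sparse) matrix;
# real systems would likely use sparse storage, so treat this as an upper bound.
BYTES_PER_CELL = 4


def dense_matrix_bytes(num_phrases: int, bytes_per_cell: int = BYTES_PER_CELL) -> int:
    """Bytes for a num_phrases x num_phrases matrix of fixed-size counters."""
    return num_phrases * num_phrases * bytes_per_cell


for n in (5_000, 1_000_000, 100_000_000):
    gib = dense_matrix_bytes(n) / 2**30
    print(f"{n:>11,} phrases -> {gib:,.1f} GiB")
```

A core of 5,000 phrases fits in about 100 MB. A million phrases already needs several terabytes. Storing only the pairs that are actually observed would shrink this a great deal, but the quadratic growth is what makes a small core of phrases attractive.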

Again: if we want to move towards a closer understanding (and perhaps a simulation) of the mechanisms at work, it is essential to narrow the problem down to a size that can be computed on a normal PC.
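One possible starting point for such a PC-sized simulation is a minimal Python sketch that counts document-level co-occurrences for a small, fixed core of phrases. The seed list and the three toy documents are hypothetical. Plain substring matching plus pair counting is a simplification chosen for illustration, not the procedure described in the patent.

```python
from collections import Counter
from itertools import combinations

# HYPOTHETICAL seed list: a small core of phrases to track, in the spirit of
# restricting the problem to a few thousand "most spammy" phrases.
SEED_PHRASES = ["cheap flights", "hotel deals", "travel insurance", "car rental"]

# HYPOTHETICAL toy corpus standing in for a batch of crawled pages.
DOCS = [
    "Cheap flights and hotel deals for your summer trip.",
    "Compare hotel deals, car rental and travel insurance.",
    "Cheap flights, hotel deals, travel insurance: book now!",
]


def phrases_in(doc: str, seeds: list[str]) -> set[str]:
    """Return the seed phrases occurring in doc (case-insensitive substring match)."""
    text = doc.lower()
    return {p for p in seeds if p in text}


def cooccurrence(docs: list[str], seeds: list[str]) -> Counter:
    """Count, for each unordered pair of seed phrases, how many docs contain both."""
    counts: Counter = Counter()
    for doc in docs:
        # Sorting gives each unordered pair one canonical (a, b) key.
        present = sorted(phrases_in(doc, seeds))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in cooccurrence(DOCS, SEED_PHRASES).most_common():
        print(f"{a!r} + {b!r}: {n} document(s)")
```

Because only pairs that actually occur are stored, memory stays bounded by k(k-1)/2 entries for k seed phrases. That stays manageable for a core of a few thousand phrases.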

If I'm completely wrong about this, please point me to the passages I overlooked.


Thread source:: http://www.webmasterworld.com/google/3247207.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com