---- "Phrase Based Indexing and Retrieval" - part of the Google picture?


tedster - 2:47 am on Feb 18, 2007 (gmt 0)


I think I do see a logical problem there, Oliver, but not an infinite loop. The following is what looks like a contradiction to me. (Note that 'bad' here means 'lacking in predictive power'.)

[0036] In each phrase window 302, each candidate phrase is checked in turn to determine if it is already present in the good phrase list 208 or the possible phrase list 206. If the candidate phrase is not present in either the good phrase list 208 or the possible phrase list 206, then the candidate has already been determined to be "bad" and is skipped.

[0039] If the candidate phrase is not in the good phrase list 208 then it is added to the possible phrase list 206, unless it is already present therein.

In [0036] it sounds like no new 'good' phrases can ever be added, since any candidate that is not already on one of the two lists gets skipped. Then [0039] seems to contradict that by adding unlisted candidates to the 'possible' list. But this must be because of the poorly written "plain English" patent language. If the 'good' and 'possible' phrase lists really stayed empty, someone would notice.
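To make the contradiction concrete, here is a rough Python sketch of the two readings. The function names, the sample phrases, and the use of sets are my own assumptions for illustration. Nothing here comes from the patent beyond the wording of [0036] and [0039], and the step that promotes 'possible' phrases to 'good' is left out.

```python
def literal_0036_then_0039(candidates, good, possible):
    """Apply [0036] and then [0039] exactly as they are worded."""
    for phrase in candidates:
        # [0036]: not on either list means "already bad", so skip it.
        if phrase not in good and phrase not in possible:
            continue
        # [0039]: not on the good list, so add it to the possible list.
        if phrase not in good:
            possible.add(phrase)
    return good, possible


def only_0039(candidates, good, possible):
    """Apply [0039] alone: any candidate not yet 'good' becomes 'possible'."""
    for phrase in candidates:
        if phrase not in good:
            possible.add(phrase)
    return good, possible


# Hypothetical candidate phrases from one phrase window.
window = ["phrase based", "phrase based indexing", "indexing and retrieval"]

stuck = literal_0036_then_0039(window, set(), set())
growing = only_0039(window, set(), set())

print(stuck)    # both lists are still empty
print(growing)  # the possible list now holds all three candidates
```

Starting from empty lists, the literal reading never adds anything, while [0039] on its own fills the 'possible' list. That is why I read [0036] as an editing slip and not as the real procedure.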

But this is all in the preliminary stage of identifying 'good' and 'bad' phrases, so I just let it pass and assumed poor editing and/or proofreading. I'm very willing to grant that a solid list of related phrases is built. What interests me more is how that list of 'good' phrases, and the documents where they occur, is now put to use.

This patented process for spam detection is looking for excessive numbers of related phrases (scraping a top 30 list to create a patchwork page could create that condition). It's also looking for excessive occurrences of any one of the 'good' phrases: stuffing, in other words.

The thing is that phrase based processing can also be used simply to rank honest documents for relevance to the search phrase. The way I understand it, spam documents identified by this process should be way over the top -- not just a little bit more intense than an honest document.
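Here is a minimal sketch of that "way over the top" idea, just to show the shape of the two checks. Everything in it is an assumption on my part: the function names, the `typical_distinct` and `typical_repeats` baselines, the `factor` of 5, and the sample phrases. For simplicity it also treats the 'good' phrase list as the related phrases for the query. The patent's actual thresholds and scoring are not reproduced here.

```python
def phrase_counts(text, good_phrases):
    """Count non-overlapping occurrences of each good phrase in the text."""
    lowered = text.lower()
    return {p: lowered.count(p.lower()) for p in good_phrases}


def looks_over_the_top(text, good_phrases, typical_distinct, typical_repeats, factor=5):
    """Flag a document only when it far exceeds what an honest page shows.

    typical_distinct and typical_repeats are hypothetical baselines for an
    honest document; factor is a hypothetical "how far over" multiplier.
    """
    counts = phrase_counts(text, good_phrases)
    distinct_present = sum(1 for c in counts.values() if c > 0)
    most_repeated = max(counts.values(), default=0)
    # Check 1: an excessive number of different related phrases (patchwork page).
    too_many_phrases = distinct_present > factor * typical_distinct
    # Check 2: excessive repetition of any single good phrase (stuffing).
    stuffed = most_repeated > factor * typical_repeats
    return too_many_phrases or stuffed


good = ["blue widgets", "widget prices", "cheap widgets"]
honest_page = "We sell blue widgets. Compare widget prices before you buy."
stuffed_page = "cheap widgets " * 40

honest_flag = looks_over_the_top(honest_page, good, typical_distinct=2, typical_repeats=2)
stuffed_flag = looks_over_the_top(stuffed_page, good, typical_distinct=2, typical_repeats=2)
print(honest_flag, stuffed_flag)
```

With these made-up numbers the honest page passes and the stuffed page is flagged. The wide margin is the point: the same counts could feed ordinary relevance ranking, and only the extreme outliers would be treated as spam.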

[edited by: tedster at 3:15 am (utc) on Feb. 18, 2007]


Thread source: http://www.webmasterworld.com/google/3247207.htm