---- Phrase Based Multiple Indexing and Keyword Co-Occurrence


rcain - 3:12 am on May 15, 2007 (gmt 0)


Hi
Firstly thanks for a really interesting thread.

Secondly, please forgive me if my thoughts here seem rather simplistic; although my background is in Cybernetics, I'm new to this SE thread and haven't yet read through all the relevant patent applications which may or may not be manifest within our friend Google. However, the subject matter under discussion here suggests to me the value of reconsidering some first principles and how they might be most easily/computably implemented in practice, viz:

- Google's aim is to make searches as well as search results more 'Meaningful' to 'users' (putting aside for a while the spectre of PPC)

- Phrases support more 'meaning' than single words - phrase handling must therefore surely be an ultimate strategic mechanism for Google to perfect.

- central to 'meaning' is 'context' - context of the search-phrase and context of the found-phrase - this doesn't have to be terribly sophisticated in order to be 'useful' - a simple 'subject-container' type taxonomy can work pretty well (e.g. subjects over phrase in para in page in site) - perhaps 'short-term memory' would be the next most useful thing to model where it isn't already implicit.

- it would seem logical/efficient to extend or 'layer' hashed inverted indexing techniques used in 'word' look-up to cover 'phrases' (strings with spaces in) in order to approximate contextual structure.

- statistical/Bayesian pattern matching algorithms would likely be used in conjunction with thesaurus/dictionary layers - the former being particularly useful for non-text based data and non-native language data (e.g. images and composite pages - already used very successfully in certain email spam filters); the latter being implemented as a set of cached self-joins on existing index terms where native language lexicons are available.

- one man's spam is another man's gold - spam is not a type of data but a reduction of 'variety' - i.e. low-meaning replication: ergo, to be fair (to providers) & useful (to consumers), SERPs may need to redefine the phrase 'similar results...' to mean e.g. 'other special offers for Product X differing in URL ONLY...'
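
To make the 'layered' index idea above a bit more concrete, here is a minimal sketch of how phrases (strings with spaces in) could sit in the same hashed inverted index as single words. It is purely illustrative and makes no claim about how Google does it: the names LayeredIndex, tokenize and max_phrase_len are my own inventions, and a plain Python dict stands in for the hashed index.

```python
from collections import defaultdict

PUNCT = ".,;:!?'\"()"

def tokenize(text):
    """Lower-case the text and strip simple punctuation from each word."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

class LayeredIndex:
    """Hypothetical inverted index keyed by words AND short phrases.

    Layer 1 holds single words, layer 2 holds two-word phrases, and so on
    up to max_phrase_len. All layers share one hash table (a dict), so a
    phrase look-up costs the same as a word look-up.
    """

    def __init__(self, max_phrase_len=3):
        self.max_phrase_len = max_phrase_len
        self.postings = defaultdict(set)  # "word" or "a b c" -> set of doc ids

    def add(self, doc_id, text):
        words = tokenize(text)
        # Index every run of 1..max_phrase_len consecutive words.
        for n in range(1, self.max_phrase_len + 1):
            for i in range(len(words) - n + 1):
                key = " ".join(words[i:i + n])
                self.postings[key].add(doc_id)

    def lookup(self, query):
        """Return the ids of documents containing the exact word or phrase."""
        key = " ".join(tokenize(query))
        return set(self.postings.get(key, set()))


index = LayeredIndex()
index.add("page-a", "Cheap blue widgets for sale")
index.add("page-b", "Blue skies and cheap flights")

print(index.lookup("cheap"))         # both pages contain the word
print(index.lookup("blue widgets"))  # only page-a contains the phrase
```

The obvious cost is index size, since every extra layer adds roughly one more key per word position, which is why the sketch caps the phrase length rather than indexing every possible run of words.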

Anyway, just some thoughts. Conversational Google anyone?


Thread source: http://www.webmasterworld.com/google/3336435.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com