---- "Phrase Based Indexing and Retrieval" - part of the Google picture?
thegypsy - 6:49 am on Feb 11, 2007 (gmt 0)
Whipping post, strangely it is NOT an addition to any LSA/I technologies. I dare say we may have missed the boat and it never left AdSense/AdWords. This is a standalone method that is far more comprehensive than LSI... Jim seems to be getting the idea...
Tho following the trail of LSA/I last year did bring me to this.
To keep things moving along, here is a snippet from one of the articles. It delves into term extensions: connecting words that create phrasings, and the PaIR basic model for identification.
Phrase Extensions and identification
Phrase extensions are merely additional words on the core term(s). If we had the core term ‘Baseball Cards’ we could ‘extend’ it with ‘Vintage Baseball Cards’, ‘Buy Vintage Baseball Cards’ and finally ‘Buy Vintage Baseball Cards Online’. You get the idea. To identify a potential phrase, the algo looks at a phrase such as "Hillary Rodham Clinton Bill on the Senate Floor", from which it would take "Hillary Rodham Clinton Bill on," "Hillary Rodham Clinton Bill," and "Hillary Rodham Clinton". Only the last one is kept. It would also identify "Bill on the Senate Floor" and the inferences down to ‘bill’.
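For anyone who wants to see the truncation idea in code, here is a rough Python sketch. It is only my reading of the description above, not the actual algorithm: the function names, the 5-word window, and the `known_phrases` set (standing in for whatever list of "good" phrases the real system builds) are all assumptions made for illustration.

```python
def candidate_phrases(text, window=5):
    """List candidate phrases: at each word, take up to `window` words,
    then shorten the candidate one word at a time from the right."""
    words = text.split()
    candidates = []
    for start in range(len(words)):
        chunk = words[start:start + window]
        for end in range(len(chunk), 0, -1):
            candidates.append(" ".join(chunk[:end]))
    return candidates


def keep_known(candidates, known_phrases):
    """Keep only candidates found in a (hypothetical) set of good phrases,
    preserving first-seen order and dropping repeats."""
    kept = []
    for cand in candidates:
        if cand.lower() in known_phrases and cand not in kept:
            kept.append(cand)
    return kept


if __name__ == "__main__":
    text = "Hillary Rodham Clinton Bill on the Senate Floor"
    cands = candidate_phrases(text)
    print(cands[:3])
    # Hypothetical good-phrase list, lowercased for matching.
    known = {"hillary rodham clinton", "bill on the senate floor"}
    print(keep_known(cands, known))
```

With the example sentence, the first three candidates are the same three truncations quoted above, and only the ones in the assumed good-phrase set survive.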
And scoring/ranking
In the end it is these related phrase/theme scores that are used in the ranking of documents for a given search query. The documents containing the most related phrases and secondary related phrases for the query phrases are ranked highest. The semantically topical, relevant page gets the highest ranking.
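A toy version of that ranking step might look like the Python below. Treat it as a sketch under my own assumptions: the articles give no formula, so the plain substring matching, the `secondary_weight` of 0.5, and the function names are all made up for illustration. Phrases are assumed to be lowercase.

```python
def related_phrase_score(doc_text, related, secondary, secondary_weight=0.5):
    """Count how many related and secondary related phrases occur in a
    document. Substring matching and the weight are simplifications."""
    text = doc_text.lower()
    primary_hits = sum(1 for phrase in related if phrase in text)
    secondary_hits = sum(1 for phrase in secondary if phrase in text)
    return primary_hits + secondary_weight * secondary_hits


def rank_documents(docs, related, secondary):
    """Return document names ordered from highest to lowest score."""
    return sorted(
        docs,
        key=lambda name: related_phrase_score(docs[name], related, secondary),
        reverse=True,
    )


if __name__ == "__main__":
    docs = {
        "a": "Vintage baseball cards, rookie cards and a price guide",
        "b": "Baseball scores today",
    }
    # Hypothetical related phrases for the query 'baseball cards'.
    related = ["rookie cards", "price guide"]
    secondary = ["card grading"]
    print(rank_documents(docs, related, secondary))
```

The page that mentions more of the phrases related to the query comes out on top, which is the behaviour described above.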
How about backlinks?
Anchor phrases are scored too: related query phrases are counted in the text links pointing to other documents. There are 2 scores here, the ‘body’ score and the ‘anchor’ score. A greater score is given if a good phrase is in the text link as well as in the body of the referenced document. Additionally, the anchor text TO your site is also analyzed and scored under the same methods.
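To make the body/anchor split concrete, here is a small Python sketch. It is only a guess at how the two scores could be combined: the `boost` of 2.0, the `anchor_weight` of 0.5, and both function names are hypothetical, not taken from the articles.

```python
def anchor_score(phrase, anchor_text, target_body, boost=2.0):
    """Score one link: 0 if the phrase is not in the anchor text, 1 if it
    is, and `boost` if it also appears in the linked document's body."""
    phrase = phrase.lower()
    if phrase not in anchor_text.lower():
        return 0.0
    return boost if phrase in target_body.lower() else 1.0


def combined_score(body_score, inbound_anchors, phrase, body_text,
                   anchor_weight=0.5):
    """Add a weighted sum of inbound anchor scores to the body score."""
    total_anchor = sum(
        anchor_score(phrase, anchor, body_text) for anchor in inbound_anchors
    )
    return body_score + anchor_weight * total_anchor


if __name__ == "__main__":
    body = "We sell baseball cards"
    anchors = ["Vintage Baseball Cards", "click here"]
    print(combined_score(3.0, anchors, "baseball cards", body))
```

A link whose anchor text carries the phrase counts for more when the target page also uses the phrase in its body, and a "click here" link adds nothing.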
Once again, the PaIR model is FAR more comprehensive in its abilities than the LSI model. Was LSA/I used in the Organic SERPs since 2003 (when G purchased Applied Semantics)? Maybe. If this is part of the ‘new’ world, it is one hell of an upgrade... and deeper we go...