This looks really good. Here's the HTML version from the cache; the original is a PDF.
Higher Precision for Two Word Queries
Constructing and Examining Personalized Cooccurrence-based Thesauri on Web Pages
That last one looks like hitting paydirt.
When you run a search on billions of docs, keywords will take precedence. Considering the way documents are SEO'd today, LSI will not be of much use.
Searches on library books, articles/news, and newsgroups are somewhat different though, since the authors did not alter the keyword frequency or introduce semantically related words on purpose.
If there are 1000 documents SEO'd on the word printer, how can Google decide to put a document related to ink or hp in its SERPs when the other 1000 documents are more important to the person who is doing the search?
Can someone explain this to me?
ink or hp
According to Google, "inkjet" is semantically linked to "Epson" (12,500,000 pages).
Interesting, I didn't realise that semantics could extend to brand names. Anyone else seen similar in Google?
Also "discount", which doesn't really surprise me, but it's not a good example of effective semantic indexing.
In other words, how many times does a pair of words have to occur together on the web for Google to deem the pair "related"? From merely anecdotal observations, it appears that the level is relatively low and depends on the number of times the pair occurs in relation to the number of times the separate words occur without each other.
Concrete knowledge of how Google determines whether or not two words are related would give you the ability to manipulate "related word" recognition, adding a whole new level to Google SEO.
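To make the "pair count relative to the separate counts" idea concrete, here is a rough Python sketch. It uses the Jaccard coefficient (pages containing both words divided by pages containing either word), which is only one plausible way to express that ratio. Nothing here is known to be what Google actually does; the function names, the 0.05 threshold, and the page counts are all made up for illustration.

```python
def cooccurrence_ratio(pair_count, count_a, count_b):
    """Jaccard-style ratio: pages with both words / pages with either word.

    pair_count: pages containing both words
    count_a, count_b: pages containing each word (with or without the other)
    """
    if pair_count > min(count_a, count_b):
        raise ValueError("pair_count cannot exceed either single-word count")
    either = count_a + count_b - pair_count  # pages containing at least one word
    return pair_count / either if either else 0.0


def is_related(pair_count, count_a, count_b, threshold=0.05):
    """Treat the pair as 'related' once the ratio reaches a threshold.

    The threshold is a hypothetical value, not an observed one.
    """
    return cooccurrence_ratio(pair_count, count_a, count_b) >= threshold


if __name__ == "__main__":
    # Made-up counts: 50 pages with both words, 200 with word A, 300 with word B.
    print(cooccurrence_ratio(50, 200, 300))
    print(is_related(50, 200, 300))
```

With a measure like this, a pair can score as "related" even when the raw pair count is small, as long as the two words rarely appear apart, which fits the anecdotal observation that the required level seems low.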