A possible evolution of Google and its technology: Topic-Sensitive PageRank [www2002.org]
It is similar to the Hilltop algorithm [webmasterworld.com] we discussed earlier, but takes the idea further. The approach aims to make Google more relevant by computing PageRank separately for topics related to the search query. It is an effort to keep high-PR sites from showing up on irrelevant searches.
Our approach to biasing the PageRank computation is novel in its use of a small number of representative basis topics, taken from the Open Directory
That's novel?
For instance, the user's bookmarks and browsing history could be used in selecting the appropriate topic-sensitive rank vectors.
or the Google toolbar? :)
At the end of the paper, the writers acknowledge the "adversarial editors" factor in the ODP, but the algo still seems way too reliant on this one directory. This is an incomplete data pool on which to base an algorithm, IMO. Yet I can see that this may be the best option out there.
To my understanding, all this algo does is add one step on top of the current PageRank system: one more inbound link (from the ODP) determines the topic. Once the topics are assigned, PR is calculated as usual.
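For anyone who wants to poke at the mechanics, here is a rough sketch of the idea as the paper describes it: one PageRank vector is computed per topic offline, with the random jump restricted to that topic's ODP pages, and the vectors are blended at query time using the query's topic weights. The toy graph, the function names, the damping value of 0.85 and the hard-coded topic weights are all my own assumptions for illustration; the paper derives the weights from a classifier over the query terms, which is not shown here.

```python
def biased_pagerank(links, topic_pages, damping=0.85, iters=50):
    """Power iteration where the random jump lands only on topic_pages.

    links: dict mapping every page to the list of pages it links to
           (assumed format; every page must appear as a key).
    topic_pages: set of seed pages for one topic (stand-in for ODP listings).
    """
    nodes = sorted(links)
    # Teleport vector: uniform over the topic's seed pages, zero elsewhere.
    teleport = {n: (1.0 / len(topic_pages) if n in topic_pages else 0.0)
                for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            outs = links[src]
            if outs:
                share = damping * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling page: hand its rank back to the topic's seed pages.
                for n in nodes:
                    new[n] += damping * rank[src] * teleport[n]
        rank = new
    return rank


def query_scores(topic_ranks, topic_weights):
    """Blend the per-topic rank vectors using the query's topic weights."""
    pages = next(iter(topic_ranks.values()))
    return {p: sum(w * topic_ranks[t][p] for t, w in topic_weights.items())
            for p in pages}


# Hypothetical toy web: two small clusters that do not link to each other.
links = {"a": ["b"], "b": ["a", "c"], "c": ["a"], "d": ["e"], "e": ["d"]}
topics = {"sports": {"a"}, "arts": {"d"}}

# Offline step: one biased rank vector per topic.
topic_ranks = {t: biased_pagerank(links, seeds) for t, seeds in topics.items()}

# Query-time step: weights would come from classifying the query (assumed here).
sports_query = query_scores(topic_ranks, {"sports": 0.9, "arts": 0.1})
arts_query = query_scores(topic_ranks, {"sports": 0.1, "arts": 0.9})
```

With a sports-heavy query, page "a" outranks page "d", and the order flips for an arts-heavy query, even though the link graph never changes. That is the whole point: a page's rank counts only to the extent that it comes from the topic the query is about.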
Even one of Google's own software engineers wrote a paper on it. The Use of BiGrams to Enhance Text Categorization [serve.com] (1.1 mb pdf)
[edited by: msgraph at 1:15 pm (utc) on May 22, 2003]
Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search [dbpubs.stanford.edu]
A quick "spot the differences" suggests the main change is the addition of chapter 6.
Basically, new offline and query-time processing techniques are discussed, such as Quadratic Extrapolation and other recent speed-up algos.
Has anyone else found something interesting?