Topic-Sensitive PageRank [dbpubs.stanford.edu]
I would like to thank Glen Jeh and Professor Jennifer Widom for several useful discussions.
I also suggest we go back to this....
[webmasterworld.com...]
Thanks for sharing this.
With all the complaints about the ODP, are you concerned to see it used in this way? Granted, it was just a choice of data for the research, but it is highly likely that any major modern search engine that wanted to implement a method similar to this would use the ODP in this way. It is a natural choice, and it doesn't seem to have much competition for this kind of role. Does this concern you, or do you think that the ODP is suitable for this job? Do you see an alternative?
Although the ODP is built by thousands of volunteers, which reduces the effect of any small number of malicious or rogue editors, the choice of the top-level topics is not actively maintained by a large group. In fact, it was chosen once and does not seem to change. Do you think this artificial sectioning of the web significantly affects the topic-sensitive PageRank calculation?
If this type of calculation begins to be used by a major search engine, what changes do you anticipate making in your page and site optimizations? For instance, do you anticipate being able to affect the rankings of other pages through changes you make on your sites listed in the ODP? Will you attempt to rank fairly high in each of the topic areas, or very high in one or two?
I'm genuinely interested in the reactions to seeing the ODP used in this way, and I hope that the way that I've asked some of these questions doesn't make my personal opinions too obvious, or guide your reactions.
Every possible sectioning of the web will necessarily be artificial. The only thing you can do to reduce the potentially negative effects of this artificiality is to choose a reference source with the largest possible selection of sites. The more sites you take into account, the smaller the distortion caused by any misplaced listings will become. In that respect, the ODP is the best choice, and not only for research purposes, regardless of what you or I may think about the details of its quality.
what changes do you anticipate making in your page and site optimisations?
None. As Slud points out, the only important thing will be to get your site listed in the correct branch of the ODP, or to get links from sites that are listed there. The calculation of the topic-specific PageRank values is completely independent of the actual contents of your site. This is really no different to the current situation. If you have, e.g., a gaming site, you probably should not worry about (or even optimize for) ranking high for health-related keywords.
P.S. Couldn't find "stochastic transition matrix" in the WebmasterWorld glossary.
Greektomi
[almaden.ibm.com...]
OK, I downloaded the PDF and I'm going through it, but I'm not quite sure I can decipher this puppy, i.e. what the heck is a stochastic transition matrix?
Can someone please tell me what 'IR' stands for?
I'll also wait for a smarter, more savvy WMW guy to translate all this for me, or else my whole Wednesday will disappear in a puff of smoke!
A stochastic matrix, also called a Markov matrix, is the transition matrix for a finite Markov chain. Its elements must be real numbers in the closed interval [0, 1], and the elements of each row (or, in some texts, each column) must sum to 1.
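To make that definition concrete, here is a small Python check. It is only a sketch and assumes the row convention, where row i holds the probabilities of moving from state i to every state, so each row must sum to 1 (some texts put the probabilities in columns instead). The function name and the little weather example are made up for illustration.

```python
def is_row_stochastic(matrix, tol=1e-9):
    """Return True if all entries lie in [0, 1] and each row sums to 1."""
    for row in matrix:
        if any(x < 0.0 or x > 1.0 for x in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


# Two-state Markov chain: row i gives the probabilities of moving from state i.
weather = [[0.9, 0.1],   # from "sunny": stay sunny 90%, turn rainy 10%
           [0.5, 0.5]]   # from "rainy": 50/50
print(is_row_stochastic(weather))           # True
print(is_row_stochastic([[1, 1], [0, 1]]))  # False: first row sums to 2
```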
[google.com...]
Take your pick (I just looked at the first hit) and try to ignore Mr. Markov.
In this context, it's simply the mathematical representation of the linking relations between all crawled pages (a links to b, yes=1 or no=0).
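A rough sketch of that yes/no table in Python; the page names and links are invented for the example. Note that with plain 0/1 entries a row can sum to more than 1 (page a below has two links), so this table on its own is not yet a stochastic matrix.

```python
def adjacency_matrix(pages, links):
    """0/1 table: entry [a][b] is 1 if page a links to page b, else 0."""
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for source, target in links:
        matrix[index[source]][index[target]] = 1
    return matrix


pages = ["a", "b", "c"]
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
for row in adjacency_matrix(pages, links):
    print(row)
# [0, 1, 1]
# [0, 0, 1]
# [1, 0, 0]
```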
IR
In search-engine papers, IR is the standard abbreviation for "Information Retrieval", and that reading fits the introduction on the first page of that paper. It's not really explicit there, though; maybe one of the referenced papers introduces the acronym more formally.
<added>They seem to use this for on-page factors, as opposed to PageRank, which is determined by off-page factors</added>
Another approach is to produce an initial SERP based on your search terms and then, every time you click a link to view a page, add the keywords from that link's description to your search terms as a loosely weighted vector. When you return to the SERP after viewing the page, the listing has already rearranged itself, re-ranking the results according to the more specific query.
In fact, the theme-oriented approach is so rich with ranking possibilities that it makes PageRank itself look like a straitjacket.
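Here is a toy sketch of that idea in Python. It uses a simplified, Rocchio-style relevance-feedback scheme (my choice of technique, not necessarily what any engine actually does): the query and each result snippet become term-count vectors, the words of the clicked snippet are added to the query with an assumed weight of 0.5 (the "loose" part), and the results are re-sorted by cosine similarity. All URLs and snippets are made up.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two term-weight dictionaries."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(query, results, clicked_snippet=None, feedback_weight=0.5):
    """Sort (url, snippet) pairs by similarity to the query vector, after
    optionally mixing in the words of the snippet the user clicked."""
    query_vec = Counter(query.lower().split())
    if clicked_snippet:
        for term, count in Counter(clicked_snippet.lower().split()).items():
            query_vec[term] += feedback_weight * count
    scored = [(cosine(query_vec, Counter(snippet.lower().split())), url)
              for url, snippet in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [url for _, url in scored]


results = [("cars.example", "jaguar car dealer prices"),
           ("zoo.example", "jaguar big cat habitat"),
           ("club.example", "jaguar owners club car meets"),
           ("wildlife.example", "jaguar rainforest cat conservation")]
print(rerank("jaguar", results))
# ['cars.example', 'zoo.example', 'wildlife.example', 'club.example']
print(rerank("jaguar", results, clicked_snippet="jaguar big cat habitat"))
# ['zoo.example', 'wildlife.example', 'cars.example', 'club.example']
```

After one click on the big-cat result, the other cat-related page moves up past the car dealer, which is the "new specificity" effect described above.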
It's no secret that Google is deficient in theme-recognition, and I'd be surprised if Open Directory hierarchies are the only tool being considered.
Oops, I just noticed that we're talking about real numbers. So the value of field (a,b) in the matrix is 1 only if a links to b and page a contains no other links. If page a links to two pages b and c, then fields (a,b) and (a,c) both have a value of 1/2. In general, if page a links to n pages, each of them gets 1/n.
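Building that weighted version in Python, as a sketch with invented page names: each link from page a gets 1/n, where n is the number of distinct pages a links to. Pages with no outgoing links are simply left as all-zero rows here; real implementations have to handle these "dangling" pages somehow, which this sketch does not attempt.

```python
def transition_matrix(pages, links):
    """Entry [a][b] = 1/n if page a links to b, where n = number of pages a links to."""
    index = {page: i for i, page in enumerate(pages)}
    out_links = [set() for _ in pages]
    for source, target in links:
        out_links[index[source]].add(index[target])
    matrix = [[0.0] * len(pages) for _ in pages]
    for i, targets in enumerate(out_links):
        for j in targets:
            matrix[i][j] = 1.0 / len(targets)
    return matrix


pages = ["a", "b", "c"]
links = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
for row in transition_matrix(pages, links):
    print(row)
# [0.0, 0.5, 0.5]
# [0.0, 0.0, 1.0]
# [1.0, 0.0, 0.0]
```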
Imagine that matrix as a huge table, which contains the information about how much of the PageRank of each page will get transferred to each other page (modulo some damping factor).
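A minimal power-iteration sketch of that "transfer" in Python. Assumptions: the table is row-stochastic (every row sums to 1, so no dangling pages), the damping factor is 0.85 (the value commonly quoted from the original PageRank paper), the leftover 15% is spread evenly over all pages, and the iteration count is arbitrary.

```python
def pagerank(matrix, damping=0.85, iterations=100):
    """Power iteration. Each page passes `damping` of its rank along its links,
    split according to the row-stochastic matrix; the remaining (1 - damping)
    is spread evenly over all pages."""
    n = len(matrix)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new_rank[j] += damping * rank[i] * matrix[i][j]
        rank = new_rank
    return rank


# a -> b, a -> c, b -> c, c -> a, written as a row-stochastic table
link_table = [[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]]
for page, score in zip("abc", pagerank(link_table)):
    print(page, round(score, 3))
# c ends up highest, then a, then b
```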
[mathforum.org...]
Hey, Calc class was a *long* time ago for me...
In particular:
The upside-down U (unions and intersections) [mathforum.org...]
The upside-down A, backward E and forward E [mathforum.org...]
Matrix: [mathforum.org...]
Matrix2: [mathforum.org...]
Matrix3: [mathforum.org...]
The conditional probability of an event B in relationship to an event A is the probability that event B occurs given that event A has already occurred. The notation for conditional probability is:
P(B|A)
[mathgoodies.com...]
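In formula terms, P(B|A) = P(A and B) / P(A), provided P(A) > 0. A quick Python check with two dice (my own example, not from the linked page): B is "the first die shows 6" and A is "the two dice sum to at least 10".

```python
from fractions import Fraction
from itertools import product


def conditional_probability(outcomes, event_b, event_a):
    """P(B|A) = P(A and B) / P(A) over a list of equally likely outcomes."""
    in_a = [o for o in outcomes if event_a(o)]
    in_a_and_b = [o for o in in_a if event_b(o)]
    return Fraction(len(in_a_and_b), len(in_a))


def first_is_six(roll):
    return roll[0] == 6


def sum_at_least_10(roll):
    return roll[0] + roll[1] >= 10


dice = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
print(conditional_probability(dice, first_is_six, sum_at_least_10))  # 1/2
print(Fraction(sum(first_is_six(r) for r in dice), len(dice)))       # 1/6
```

Without the condition, P(B) is 1/6; knowing that A happened pushes it up to 1/2.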
It seems the flavour of the day is "offline rank computation".
Interestingly, the bias factor "alpha" for topic-sensitive PageRank can be very low (0.05 to 0.25) and still give satisfactory results. It looks like links from directories that just list companies in non-topical groupings will be worthless in the future.
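To see what alpha does, here is a sketch in Python. I'm assuming alpha is the probability that the random surfer jumps, and that every jump lands on one of the pages listed in the topic's ODP category, chosen evenly (my reading of the paper's bias vector; the tiny graph and topic set are invented). With alpha = 0.25, pages that cannot be reached by links from the topic pages end up with essentially no topic-specific rank.

```python
def topic_sensitive_pagerank(matrix, topic_pages, alpha=0.25, iterations=100):
    """Like plain PageRank, but the random jump (probability alpha) always lands
    on one of `topic_pages`, chosen uniformly; links carry the other 1 - alpha."""
    n = len(matrix)
    bias = [1.0 / len(topic_pages) if i in topic_pages else 0.0 for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [alpha * b for b in bias]
        for i in range(n):
            for j in range(n):
                new_rank[j] += (1.0 - alpha) * rank[i] * matrix[i][j]
        rank = new_rank
    return rank


# Two separate link cycles: 0 <-> 1 and 2 <-> 3. Page 0 is the only "topic" page.
link_table = [[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]]
ranks = topic_sensitive_pagerank(link_table, topic_pages={0})
print([round(r, 3) for r in ranks])  # [0.571, 0.429, 0.0, 0.0]
```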
Chapter 6, on ongoing work, states that they would like to refine the approach to lower levels of the ODP for better topic sensitivity. All very well for English, but how will they do this for other languages, which have limited representation within the ODP?
Example: a scientific article in French citing articles in German and English will need clever trans-lingual topic clustering. One option would be to use technology like that of a word-translation company:
[euroglotonline.nl...]
Every meaning of a word (and every form in a verb's morphology) has a unique ID number that relates it to its counterparts in all other languages. This means you can translate "I search" from English to German to Dutch and back to English and you should still end up with "I search", because they all share the same ID number (you need the purchased version to check that easily).
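A toy Python sketch of that concept-ID idea. The ID numbers and the tiny word list are made up for illustration and have nothing to do with Euroglot's actual data; a real system would also have to work out which meaning of an ambiguous word is intended before looking up its ID, which this sketch skips.

```python
# Hypothetical concept table: one language-neutral ID per word meaning.
CONCEPTS = {
    1001: {"en": "I search", "de": "ich suche", "nl": "ik zoek"},
    1002: {"en": "I find", "de": "ich finde", "nl": "ik vind"},
}

# Reverse index: (language, phrase) -> concept ID.
PHRASE_TO_ID = {
    (lang, phrase): concept_id
    for concept_id, forms in CONCEPTS.items()
    for lang, phrase in forms.items()
}


def translate(phrase, source, target):
    """Look up the phrase's concept ID, then return that concept in the target language."""
    concept_id = PHRASE_TO_ID[(source, phrase)]
    return CONCEPTS[concept_id][target]


german = translate("I search", "en", "de")
dutch = translate(german, "de", "nl")
english = translate(dutch, "nl", "en")
print(german, "|", dutch, "|", english)  # ich suche | ik zoek | I search
```

Because every hop goes through the same ID, the round trip cannot drift to a different meaning.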
(I have no affiliation with Euroglot.)
Evaluating Strategies for Similarity Search on the Web [dbpubs.stanford.edu]
It's a new version of what was published back in Feb. 2001.
Similarity Search on the Web: Evaluation and Scalability Considerations [dbpubs.stanford.edu]