Your very interesting calculations in your recent post seem to back my estimation that there is no division by the number of outgoing links in the Google algo.
If you have a PR 8 site with 20 links on it, you could point those links at other sites, which would each get PR 7
(if it's true that the division doesn't happen, that is; but the division is a major factor in the original PR algorithm!)
The formula can be justified with the Random Surfer Model. The walk length of the Random Surfer follows an exponential distribution with a mean of d/(1-d) [levien.com] (chapter 6.1.3). Hence, when the random surfer follows a link to a website without external outbound links, he visits on average d/(1-d) pages on that site. So you simply have to multiply this value by the probability that the random surfer follows the link from page A to page B (PageRankA/#linksA) to get the PR benefit of the site.
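To put numbers on that, here is a minimal Python sketch of the benefit formula as described above. The function names are made up for illustration, and d = 0.85 is only the damping factor usually quoted from the original paper, not a value anyone has confirmed for Google.

```python
def expected_walk_length(d: float) -> float:
    """Mean number of pages visited after following a link, d / (1 - d)."""
    if not 0.0 <= d < 1.0:
        raise ValueError("d must be in [0, 1)")
    return d / (1.0 - d)


def pr_benefit(pagerank_a: float, links_a: int, d: float = 0.85) -> float:
    """PR benefit to page B's site from one link on page A.

    Assumes, as in the post above, that B's site has no external
    outbound links. d = 0.85 is an assumed damping factor.
    """
    follow_probability = pagerank_a / links_a  # PageRankA / #linksA
    return expected_walk_length(d) * follow_probability


print(expected_walk_length(0.85))            # mean walk length for d = 0.85
print(pr_benefit(pagerank_a=1.0, links_a=20))  # benefit from one of 20 links
```

With d = 0.85 the mean walk length comes out at roughly 5.67 pages, so a link from a page with 20 outbound links passes on about 5.67 times one twentieth of that page's PageRank.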
The formula also works the other way around, because it also determines the PR loss of a site with exactly one external outbound link that is not reciprocated.
BTW, there has to be a division among the outbound links. Otherwise the algo would simply calculate link pop, which can easily be attacked by spammers.
Yes, I believe that. I don't for a minute suggest that Google hasn't changed since the Backrub days, but I do believe that PageRank works mostly as it did then.
> But why should bother about the details?
As a physicist I expect that you had the same initial lectures I had, about scientific method and isolating variables. Understanding PageRank decay in a closed system can help you to analyse the effect of other parameters in Google's ranking.
For me though, discoveries are their own rewards.
> But there is no way to verify the so often cited PR formula.
It is difficult to prove that something is impossible :).
gmoney, I didn't completely follow the 1-d part, but pages B and C can have more PR than page A. Feedback loops do affect PageRank, but the logarithmic scale on the Toolbar makes the loops considerably less important than raw PageRank calculations imply.
I agree with Dino's and Markus' reasoning for why Google needs to divide PR by the links. My measurements back this up also.
“But there is no way to verify the so often cited PR formula” – Fischerlaender msg#31
You can verify that the PageRank formula is a representation of the random surfer model. This is why I think the PageRank formula will be around for a while.
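One way to check this for yourself is to compare the published formula, PR(p) = (1 - d) + d * sum(PR(q) / #links(q)), against a simulated random surfer. The sketch below is only an illustration under stated assumptions: the three-page graph, d = 0.85, the step count, and the function names are all invented for the example, and it says nothing about what Google actually runs.

```python
import random


def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / #links(q)) for q linking to p."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            share = d * pr[page] / len(targets)  # division by outgoing links
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


def random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate PR from visit counts of a simulated random surfer."""
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < d:
            current = rng.choice(links[current])  # follow a random link
        else:
            current = rng.choice(pages)           # get bored, jump anywhere
    n = len(pages)
    # Scale visit frequencies so they sum to the number of pages,
    # matching the normalisation of the formula above.
    return {page: n * count / steps for page, count in visits.items()}


# Hypothetical closed system: every page has at least one outgoing link.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
formula = pagerank(links)
simulated = random_surfer(links)
for page in links:
    print(page, round(formula[page], 3), round(simulated[page], 3))
```

The two columns should agree to within sampling noise. Note that this only shows the formula matches the model; it cannot show how toolbar values are derived.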
However, it is difficult to verify how/if Google actually uses the original PageRank formula to calculate toolbar values. I think most people agree that toolbar PR values are not PageRank values, but we are trying to see how/if toolbar PR values are calculated from PageRank values.
“Your very interesting calculations in your recent post seem to back my estimation that there is no division by the number of outgoing links in the Google algo.” – Fischerlaender msg#31
I am having trouble seeing the connection between my calculation and your estimation since my calculation is based on dividing by the number of outgoing links (/#links).
----------------------------
“ . . but you rather have to talk about sites here, not about pages” – Markus msg#35
“However, a premise for the formula is that page B's site has no external outbound links. Otherwise some of the PR benefit is distributed to other sites.” – Markus msg#35
I tried to focus on the amount of PageRank transferred through the original link instead of what a particular site or page eventually received. Even though some of the PageRank may be distributed to other sites, the formula would still be valid for the amount of PageRank that flowed through the original link.
Thanks for taking the time to review my calculations and thanks for giving me another paper to review. I’m sure we will be discussing more about PageRank in the future.
-------------------------------
“Feedback loops do affect PageRank, but the logarithmic scale on the Toolbar makes the loops considerably less important than raw PageRank calculations imply.” – ciml msg#36
I agree, but I think it is important to consider feedback loops when trying to decipher the PR log scale.
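As a rough illustration of both points, the sketch below compares a two-page closed system with and without a reciprocal link. Everything here is an assumption made for the example: the graph, d = 0.85, and above all the idea that a toolbar value is a plain logarithm of raw PageRank with some base. The bases tried are guesses, since the real scale is exactly what we are trying to work out.

```python
import math


def pagerank(links, d=0.85, iterations=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / #links(q)) for q linking to p."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {page: 1.0 - d for page in links}
        for page, targets in links.items():
            for target in targets:  # a page without links passes nothing on
                new_pr[target] += d * pr[page] / len(targets)
        pr = new_pr
    return pr


one_way = pagerank({"A": ["B"], "B": []})    # A links to B, no link back
looped = pagerank({"A": ["B"], "B": ["A"]})  # B reciprocates: a feedback loop

raw_ratio = looped["A"] / one_way["A"]
print("raw PR of A:", one_way["A"], "->", looped["A"])
print("raw ratio:", round(raw_ratio, 2))

# Hypothetical toolbar bases; the real scale is unknown.
for base in (4, 6, 10):
    print("base", base, "toolbar gap:", round(math.log(raw_ratio, base), 2))
```

In this toy system the loop multiplies A's raw PageRank by between six and seven, yet for any of the assumed bases that is a gap on the order of one toolbar point. The loop still matters when you try to back out the base from measurements, because ignoring it shifts the raw value you think you are taking the logarithm of.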