|Google's Page Rank Base|
The real Google logarithmic page rank base.
This is my first post here. I want to share the research I did to find the value of Google's page rank base.
To start, I'll explain the method I used.
A - page rank received per backlink
P - page rank in logarithmic form
N - number of backlinks
B - Google's logarithmic page rank base
We know that to compute page rank, Google uses a logarithmic formula of this kind:
P = log[base B](A x N)
P = ln(A x N) / ln(B)
P x ln(B) = ln(A x N)
B^P = A x N
A = (B^P) / N
Assuming A is constant (which is valid for a very large number of samples), I reached the following equation, which could also easily be reached intuitively:
(B^P1) / N1 = (B^P2) / N2
B = (N1 / N2)^( 1 / (P1 - P2))
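The step between the two equations above, written out: multiplying both sides by N1 and dividing by B^P2 isolates B. This still assumes that A really is the same for both samples.

```latex
\frac{B^{P_1}}{N_1} = \frac{B^{P_2}}{N_2}
\quad\Longrightarrow\quad
B^{P_1 - P_2} = \frac{N_1}{N_2}
\quad\Longrightarrow\quad
B = \left(\frac{N_1}{N_2}\right)^{1/(P_1 - P_2)}
```

For example, the PR 5 and PR 4 rows of the table below give B = 31,60 / 5,33 ≈ 5,9, which is the B[5;4] value listed further down.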
Now I needed a few samples, and this is what I got after doing some averages:
Logarithmic Page Rank / Average Number of Backlinks
4 / 5,33
5 / 31,60
6 / 215,12
7 / 2.835,00
8 / 25.000,00
9 / 147.600,00
Plugging each pair of rows into the formula, I got:
B[9;4] = 7,7
B[9;5] = 8,2
B[9;6] = 8,8
B[9;7] = 7,2
B[9;8] = 5,9
B[8;4] = 8,3
B[8;5] = 9,2
B[8;6] = 10,8
B[8;7] = 8,8
B[7;4] = 8,1
B[7;5] = 9,4
B[7;6] = 13,2
B[6;4] = 6,4
B[6;5] = 6,8
B[5;4] = 5,9
The average value I got was:
B = 8,3
Which means that to get page rank 7 you need about eight page rank 6 pages linking to you only.
I used 30 webpages for the samples, which is too low a number to make this value completely trustworthy, but from my experience base 8 is more accurate than the commonly used base 10.
If anyone is able to reach a value supported by more samples, please comment.
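A short sketch (not part of the original post) that recomputes every B[P1;P2] value in the list from the averaged backlink counts in the table, using B = (N1 / N2)^(1 / (P1 - P2)) under the same assumption that A is constant. The names AVG_BACKLINKS, base_from_pair and all_pair_bases are made up for this illustration.

```python
# Sketch: recompute each pairwise base estimate from the averaged backlink
# counts in the table, assuming A (page rank received per backlink) is constant.
from itertools import combinations
from statistics import mean

# Logarithmic (toolbar) page rank -> average number of backlinks, from the table.
AVG_BACKLINKS = {
    4: 5.33,
    5: 31.60,
    6: 215.12,
    7: 2835.00,
    8: 25000.00,
    9: 147600.00,
}


def base_from_pair(p1, n1, p2, n2):
    """B = (N1 / N2) ** (1 / (P1 - P2)) for two (P, N) samples."""
    return (n1 / n2) ** (1.0 / (p1 - p2))


def all_pair_bases(data):
    """Return {(P1, P2): B} for every pair of rows with P1 > P2."""
    bases = {}
    for p_low, p_high in combinations(sorted(data), 2):
        bases[(p_high, p_low)] = base_from_pair(
            p_high, data[p_high], p_low, data[p_low]
        )
    return bases


if __name__ == "__main__":
    bases = all_pair_bases(AVG_BACKLINKS)
    for (p1, p2), b in sorted(bases.items(), reverse=True):
        print(f"B[{p1};{p2}] = {b:.1f}")
    print(f"mean B = {mean(bases.values()):.2f}")
```

Run as a script, it prints the 15 pairwise values and a mean of about 8,3. A couple of printed values differ in the last digit from the list (for example B[7;5] comes out as 9,5 rather than 9,4), which looks like truncation rather than rounding in the original figures.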
[edited by: vitaplease at 7:09 pm (utc) on April 27, 2004]
[edit reason] no email specifics please [/edit]
Whoaaaa! Nice job. (I think.) Welcome to WW!
You need to outline exactly what you would like as a real-world proof.
|Which means that to get page rank 7 you need about eight page rank 6 pages linking to you only. |
That makes sense, so it is, no doubt, not right. ;-)
Basically, what I did was find a few PR 4 pages, work out the average number of backlinks across all of them, and then do the same for ranks 5, 6, 7, 8 and 9.
In total I used 30 pages for this, which doesn't make the results completely credible, but before doing it I already had the feeling the base was 8.
What I meant by "reach a value supported by more samples" is: do what I did, but instead of using only 30 pages to get the average number of backlinks per PR level, use 100 or more.
This is one of those things a guy thinks about at 4am when he can't sleep... well, one of the things =).
"Which means that to get page rank 7 you need about eight page rank 6 pages linking to you only."
Um, no. Eight PR6 links will likely get you a PR6, not PR7.
Are you really Daniel Jackson from the Stargate Program?
"Eight PR6 links will likely get you a PR6, not PR7."
are you considering the 'linking to you only' part?
>>"Eight PR6 links will likely get you a PR6, not PR7."
>are you considering the 'linking to you only' part?
I know of a case where many more pages than you mentioned, all PR6, linked to the same single page, and that receiving page was still PR6. (It had other links too.)
What about 40-50 instead of 8? ;)
"are you considering the 'linking to you only' part"
Sure, even if it is not a real world likelihood. If they were all PR6.999999999999999 then I suppose maybe. (Which also points out the futility of this general idea, as one PR6 page might be PR6.9 while another might be PR6.01... those are not the same thing; "PR6" is not very descriptive.)
A single link from a PR6 page is about as likely to make a PR5 as a PR6. The combined link power of eight PR6's is nowhere close to strong enough to make a PR7.
I don't have any samples of eight PR6s making a PR7, but I know of some PR6s making another PR6, and of a PR8 making at least six PR7s, so the proportions aren't that wrong.
Of course a PR 6,9 is very different from a 6,0, but we can't have simple concepts without generalising a bit.
1 is very different from 2; there is an infinity of values between them.
The idea was to find Google's logarithmic base, if there is one, that is, how many PR N-1 sites you need to make a PR N site.
If in practice it's 30-40, in theory that is impossible, because if we assume a PR5 needs 4 backlinks, then accordingly a PR8 would need 4 million links of the same type, which from the data I have is very far from reality.
Of course there is the possibility that Google doesn't work in such a linear way, but the base of 8 I got wasn't made up: I got it from a theoretical assumption that the practical values (average backlinks per rank) somewhat confirmed. The results diverged less than I was expecting.
Right now I'm just creating a family of sites, so I guess time will tell whether these results are of any use or just pretty on paper.
To be exactly correct, I must add:
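The argument above rests on the same model as the first post: if A is constant, then N grows as B^P, so each extra PR step multiplies the required backlinks by B. A minimal sketch (backlinks_needed is a hypothetical helper name) that scales the table's PR 5 average up to PR 8 for a few candidate bases and sets the result next to the table's PR 8 average. Starting from PR 5 is an arbitrary choice for illustration.

```python
# Sketch: if A is constant, N is proportional to B**P, so moving up one
# PR step multiplies the required number of backlinks by B.


def backlinks_needed(n_ref, p_ref, p_target, base):
    """Scale a reference backlink count from PR p_ref to PR p_target."""
    return n_ref * base ** (p_target - p_ref)


if __name__ == "__main__":
    n_pr5 = 31.60          # average backlinks at PR 5 (from the table)
    observed_pr8 = 25000   # average backlinks at PR 8 (from the table)
    for base in (8.3, 10, 30, 40):
        predicted = backlinks_needed(n_pr5, 5, 8, base)
        print(f"base {base}: predicted PR 8 backlinks ~ {predicted:,.0f} "
              f"(table average {observed_pr8:,})")
```

With these inputs, base 8,3 predicts roughly 18.000 backlinks at PR 8 and base 10 about 31.600, both the same order of magnitude as the table's 25.000, while bases 30 and 40 predict roughly 850.000 and 2 million.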
To get a PR7 site you need 1,00001 PR 6,99999 links or 63,99999 PR 6,000001 links.
With the average being 8 links for a PR 6,5.
And with this I conclude that knowing Google's log PR base is quite useless in practice.
Time to move on.
Nice post, Pedro.
Must have taken you ages to get this far.
I agree to a certain extent with what Pedro says. I also think that the PR algo is a little more complex than that (not that it doesn't look complicated). I don't think that PR can be purely based on one number.
I haven't done any research to the extent you have and, as a result, I know that you know better than me.
We know that the number of outbound links on each page makes a difference, and also that an integer PR (as we see it) isn't as detailed as it gets.
|Sure, even if it is not a real world likelihood. If they were all PR6.999999999999999 then I suppose maybe. (Which also points out the futility of this general idea, as one PR6 page might be PR6.9 while another might be PR6.01... those are not the same thing; "PR6" is not very descriptive.) |
"the proportions aren't that wrong."
You are comparing unrelated things. One PR8 could easily make eight PR7's, but those eight PR7's linking to another page would certainly not have the ability of regenerating the equivalent of the original PR8.
Thank you. Actually, I did it last night :P.
Just wanted to correct myself:
"To get a PR7 site you need 1,00001 PR 6,99999 links or 63,99999 PR 6,000001 links.
Being the average 8 links for a PR 6,5."
Actually, this statement isn't correct: the number of links is between 8 and 1.
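The corrected range follows from the same model if we assume, as the posts above implicitly do, that a page with unrounded PR p passes B^p to a page it links to exclusively, ignoring damping and any other outbound links. Reaching PR 7 then takes B^7 / B^p = B^(7 - p) such links. A small sketch with B = 8; links_needed is a hypothetical name.

```python
# Sketch, assuming a page with unrounded PR p passes B**p to a page it links
# to exclusively (no damping, no other outbound links). Reaching PR 7 then
# takes B**7 / B**p = B**(7 - p) such links.

BASE = 8  # the approximate base from the first post


def links_needed(p_source, p_target=7, base=BASE):
    """Number of exclusive links from PR p_source pages needed for PR p_target."""
    return base ** (p_target - p_source)


if __name__ == "__main__":
    for p in (6.99999, 6.5, 6.000001):
        print(f"PR {p}: about {links_needed(p):.5f} links to reach PR 7")
```

Under these assumptions PR 6,99999 needs just over 1 link, PR 6,000001 just under 8, and PR 6,5 about 2,83 (the square root of 8), so every PR between 6 and 7 falls in the range of 1 to 8 links.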
Pedro2, thank you for all your work (and lost sleep).
My higher PR sites have hundreds of PR(? - no time right now to determine) backlinks, but I just did a quick check on a few smaller sites and found one of my obscure PR5 sites that has 3 PR5 links (and, including G's omitted results, 6 more internal PR5s, for a total of 9 PR5s).
When I have time, I'll dig into the higher PR's and see if I can help validate your calculations.
I think many of us would like to at least have a rule of thumb (dependent on outbounds and other factors), and I think you may have gone a long way to assisting us.
"You are comparing unrelated things. One PR8 could easily make eight PR7's, but those eight PR7's linking to another page would certainly not have the ability of regenerating the equivalent of the original PR8."
Then there is a waste factor, which means the PR you receive isn't equal to the PR of the linking page divided by the number of links on it.
Is there a place where I can learn more about this factor?
Do a site search for "damping factor", or do a Google search for it. I have read several people mention that the damping factor is around 0.8, which will influence all your assumptions.
You can also take a look at the original papers from Stanford University, which have quite a lot of information about the original PageRank as it was invented (is that the right word?).
Anyway, the assumption is that the original is still in use, but that the values have changed and that some unknown variables have been added (say they added some googlejuice).
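For reference, the formula in the original Stanford PageRank paper is PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A and C(T) is the number of outbound links on T. So a single link passes d x PR(T) / C(T) rather than PR(T) / C(T); the difference is the "waste" asked about above. A minimal sketch comparing the 0.8 figure mentioned above with the 0.85 from the original paper; the linking page's values are hypothetical, and these are raw PageRank numbers, not the logarithmic toolbar value.

```python
# Sketch of the per-link share in the original PageRank formula:
#   PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
# so one link from page T passes d * PR(T) / C(T) to A.
# Values are raw PageRank, not the logarithmic toolbar number.


def link_contribution(pr_source, outbound_links, d=0.85):
    """Raw PageRank passed through one link after damping."""
    return d * pr_source / outbound_links


if __name__ == "__main__":
    pr_t, c_t = 10.0, 4   # hypothetical linking page: raw PR 10, 4 outbound links
    undamped = pr_t / c_t
    for d in (0.8, 0.85):
        passed = link_contribution(pr_t, c_t, d)
        print(f"d = {d}: link passes {passed:.3f} instead of {undamped:.3f}")
```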
Anyway, good luck, and I hope to see follow-ups on your sleepless nights.
>If in practice it's 30-40, in theory that is impossible, because if we assume a PR5 needs 4 backlinks, then accordingly a PR8 would need 4 million links of the same type, which from the data I have is very far from reality.
Pedro2, you have done a great job. About a year ago I too had posted here that the ratio is about 8. However,
1. I am quite familiar with the site I gave as an example, and it had many more single-link-only PR6 pages linking to a particular page, and that page was still only PR6. From that example I would say that this number is at least 18-20 for the PR6 to PR7 transition.
2. It is true that a page linked only from very low PR pages will need lots and lots of links to get a PR8 rating. But you are ignoring two things. Most sites that have PR8 have a few dozen PR7, a few hundred PR6 (and so on) internal pages, all linking back to the home page. Moreover, the higher the PR of a link, the higher the chances that it is also linked from high PR external sources.
Hi Pedro2 and welcome!
I think it's an interesting task to find out the logarithmic base currently used by Google. (Although most of the people here won't agree with this statement.) However, I see at least three problems in your analysis. The first is that not all PR is transferred to other pages. As already mentioned, there is a damping factor d, i.e. only a fraction of the PR is passed on. In the original papers a value of d=0.85 was used, but Google changed this value. The second is that Google doesn't show all backlinks, only a part of them. The third is your assumption that the number of backlinks is related to the (logarithmic) ToolbarPR by a logarithmic law with the same log base as the relation between real PR and ToolbarPR.
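To illustrate the first point, here is a minimal sketch that iterates the original-paper formula, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), on a tiny hypothetical three-page graph with d = 0.85. The graph, the function name pagerank and the fixed 100 iterations are assumptions for illustration; it says nothing about Google's current value of d or about how raw PR is mapped to the ToolbarPR.

```python
# Sketch: iterate the original PageRank formula on a tiny hypothetical graph,
#   PR(A) = (1 - d) + d * sum(PR(T) / C(T) for pages T linking to A),
# with d = 0.85 as in the original papers (Google's current value is unknown).


def pagerank(links, d=0.85, iterations=100):
    """links maps each page to the pages it links to; every target must be a key."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {p: 1.0 - d for p in pages}
        for src, targets in links.items():
            if not targets:
                continue  # dangling page: passes nothing in this simple sketch
            share = d * pr[src] / len(targets)
            for t in targets:
                new[t] += share
        pr = new
    return pr


if __name__ == "__main__":
    # Hypothetical three-page graph: A and B link to each other, C links to A.
    graph = {"A": ["B"], "B": ["A"], "C": ["A"]}
    for page, value in sorted(pagerank(graph).items()):
        print(f"{page}: {value:.4f}")
```

For this graph the values settle at about A = 1,46, B = 1,39 and C = 0,15. C has no incoming links and keeps only the (1 - d) minimum, and A ends up with less than the combined PR of B and C even though each of them links only to A, because every link passes on only 85% of its share.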
You might have a look at the original papers or some review papers (for example PageRank Uncovered). I think you'll find a lot of suggestions for your studies.
Finally, I would say that IITian's value is close to the real one.