Forum Moderators: open


Google PR Linkbacks


sajjid

10:13 pm on Sep 2, 2003 (gmt 0)

10+ Year Member



I have a site whose homepage is PR5 and whose other pages are PR4. When I check for linkbacks with the Google toolbar, I only see one site that links to mine, and it is PR4. My question is: where is my PR coming from?
I thought it was links that give you high PR.

vitaplease

4:19 pm on Sep 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check Alltheweb for backlinks. Many low-PR links can add up to a good PR.

Also - do not trust Google's toolbar Pagerank or Google's backlink indication too much.

At the moment, Google does not seem to want to be exact in this matter.

doc_z

4:39 pm on Sep 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Indeed, links give you high PR.

There are numerous reasons for this behaviour:

- Google doesn't show all backlinks. Perhaps there are also many links with a PR lower than PR4 which are not shown.

- The PR shown in the toolbar isn't correct.

- There is just one incoming link, from a (high) PR4 page (with few outgoing links). Even in this case your homepage can have a PR5 due to the feedback from your internal pages.

For example, if the linking page has a real PR of PR(A) and N(A) outgoing links, the homepage has PR(B) = d*PR(A)/N(A) + (1-d) if it is a dead end. This is normally smaller than PR(A). However, if there is a second page on your site linking back to the homepage, then the PR of the homepage is increased: PR(B) = d*PR(A)/N(A)/(1-d^2) + 1, which can be higher than that of the linking page.
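A quick numeric sketch of the two formulas above (the link structure and the PR(A) = 3 value are invented for illustration): external page A links to homepage B, and in the feedback case an internal page C links back to B.

```python
# Two-page feedback example in the spirit of doc_z's formulas.
# Assumed (invented) setup: external page A links to homepage B;
# B's only outgoing link goes to internal page C; C links back to B.
d = 0.85              # damping factor
PR_A, N_A = 3.0, 1    # real PR of the linking page and its outgoing links

# Dead-end case: B has no outgoing links, so no feedback.
pr_b_dead_end = d * PR_A / N_A + (1 - d)

# Feedback case: iterate PR(B) and PR(C) until they converge.
pr_b, pr_c = 1.0, 1.0
for _ in range(200):
    pr_b = (1 - d) + d * (PR_A / N_A + pr_c)   # B gets A's share plus C's PR
    pr_c = (1 - d) + d * pr_b                  # C gets everything from B

# Closed form from the post: PR(B) = d*PR(A)/N(A)/(1-d^2) + 1
closed_form = d * PR_A / N_A / (1 - d**2) + 1

print(round(pr_b_dead_end, 4))                 # 2.7
print(round(pr_b, 4), round(closed_form, 4))   # 10.1892 10.1892
```

With feedback, the homepage's real PR (about 10.19 here) ends up well above the linking page's PR of 3, matching the point of the post.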

ulstrup

10:09 pm on Sep 3, 2003 (gmt 0)

10+ Year Member



Vitaplease:

Many low PR links can make a good PR

Do you have any references/proof of that?

I'm asking in a friendly way! Please let me read more. To my knowledge PR is "collected" from PR4+ pages, but one way of understanding your statement is that pages with lower PR are part of the overall algo too; at least that's what I believe.

James_Dale

11:01 pm on Sep 3, 2003 (gmt 0)

10+ Year Member



Any page is capable of awarding PR to a target page. Well, that is, any page that hasn't been penalised - since all web pages contain a certain intrinsic level of PR to begin with.

0.85 * PR / (links on the page) = PR awarded to each link.

sajjid

2:34 am on Sep 4, 2003 (gmt 0)

10+ Year Member



Thanks for all your replies. Yes, it's true, I do have more linkbacks: when I check on Alltheweb it comes back with 51 sites linking to me, while Google shows just one out of the 51.

vitaplease

7:37 am on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ulstrup

..one way of understanding your statement is that pages with lesser PR are part of the overall algo..

Do you have any references/proof of that?

MSgraph collected some nice references: [webmasterworld.com...]

Without the formulas:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. source [google.com]

Democratic in the sense that every vote (link) counts. Imagine countries whose languages almost no pages in other established languages link to. The majority of their sites/pages would be below PR4; if no Pagerank were passed between these pages, 98% of them could be Pagerank 0.

Hagstrom

8:57 am on Sep 4, 2003 (gmt 0)

10+ Year Member



To my knowledge PR is "collected" from PR4+ pages

The question is whether Google assigns PR "top-down" or "bottom-up".

Top-down would mean that Google had hardcoded PR10 (or PR11) for Google.com, yahoo.com and so on. Pages close to Google.com and Yahoo.com would get a high PR. I believe GoogleGuy once stated that Google's own PR10 was "natural" - ruling out this model.

Bottom-up means that every page is initially assigned a PR of 1 (real PR, not logarithm). Now, if all pages start with PR1 - and if pages below PR4 don't pass PR - then all pages would remain PR1.

doc_z

10:50 am on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hagstrom,

in the thread in which GoogleGuy answered several questions, he said that Google's PR isn't artificial (i.e. no "top-down" model).

And you are correct, a model where only pages with PR >= PR4 pass PR isn't a valid model (for several reasons).
Of course it doesn't matter, but this would result in a real PR of (1-d) for all pages even if the initial guess is 1.

claus

11:23 am on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> hardcoded PR, bottom-up, top-down

That's not the way it works. It's a rather complicated calculation because, if you were to do it on a limited set in an Excel spreadsheet, it would involve so-called "circular references".

You have this basically quite simple formula, but then you let the result for any one page depend on the results for the pages that link to it. And the results for those linking pages in turn depend on the results for the pages that link to them. So you end up with a chain of three billion pages in which each individual page's PR is potentially calculated on the basis of the PR of the other 2,999,999,999 pages (which are, again, calculated...)

When you work with calculations of this type, what you commonly do is "seed" the equation with some number to start with. That could be done by assigning any random page any random PR value. Then you run the chain of equations, collect the results, change the seed a bit(*) and run it again - these repetitions are called "iterations".

When you run a series of iterations, you will find that the end result differs a lot at the start and then differs less and less with each iteration, as the "seed" value approaches the "real" value. Then, at some point, you see that the variation is no larger than the amount of fluctuation you are willing to accept. This "acceptance point" you will of course have decided on in advance; it could be, e.g., 0.2% deviation or something.

When this point is met, you stop the calculation. The important part is that (quote from above):

the "seed" value approaches the "real" value

That is: if you start out by assigning www.google.com a PR of 2 as a seed value, then (if the "real" PR is around 10) you will end up with a PR value of around 10 instead after a certain number of iterations. Of course you will want to keep the processing time (and the number of iterations) as small as possible, so you will probably "guesstimate" a figure other than 2 to start with.

/claus


(*) Technical note: The seed will only need to be "sown" one time, as when you enter it the first time, the result of the chain of computations will give it another value than the one you gave it initially.
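A minimal sketch of the iteration described above, on an invented four-page web. Whatever seed you start from, the values converge to the same "real" PR:

```python
# Power-iteration sketch of the seeded, repeated calculation claus
# describes. The four-page link graph below is made up for illustration.
links = {            # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
d = 0.85
pages = list(links)

def pagerank(seed, tol=1e-8):
    pr = {p: seed for p in pages}        # arbitrary starting guess
    while True:
        new = {}
        for p in pages:
            incoming = sum(pr[q] / len(links[q])
                           for q in pages if p in links[q])
            new[p] = (1 - d) + d * incoming
        # stop once the change per iteration is below the acceptance point
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new

# Two very different seeds converge to the same "real" values.
pr1 = pagerank(seed=1.0)
pr2 = pagerank(seed=42.0)
print(all(abs(pr1[p] - pr2[p]) < 1e-6 for p in pages))  # True
```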

doc_z

12:11 pm on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



claus,

of course, the calculation (for the original PR model) doesn't depend on the initial guess for PR. However, you can change the model by introducing additional artificial PR sources which guarantee a high PR for Google.com (independent of the link structure). I think that is what Hagstrom was talking about.(*)

By the way, in principle you can also calculate the exact PR for each page without any initial guess by inverting the transition matrix (i.e. the final exact values are calculated in one step). Unfortunately, this doesn't work in practice for a 3 billion * 3 billion matrix. Therefore, one uses iteration schemes (for example the Jacobi iteration, which was mentioned in the original papers).

______________________________________________________________________________
(*) To give an example: you can change the PR equations in the following way:

PR(A) = d (PR(T1)/N(T1) + ... + PR(Tn)/N(Tn)) (for all pages != Google.com)

PR(Google.com) = N_pages * (1-d) + d (PR(T1)/N(T1) + ... + PR(Tn)/N(Tn))

where N_pages = number of pages in the index. In the random surfer model this would correspond to changing the target page of a teleportation from an arbitrary (randomly chosen) page to Google.com.
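A toy version of this modified model (the four-page link graph is invented): routing all teleportation to a single page "G" gives it the N_pages * (1-d) term, and G ends up with the highest real PR.

```python
# Sketch of the modified model: the (1-d) teleportation mass goes
# entirely to one page "G" instead of being spread over all pages.
# The link graph is made up for illustration.
links = {"G": ["X"], "X": ["Y"], "Y": ["G"], "Z": ["G"]}
d = 0.85
pages = list(links)
N_pages = len(pages)

pr = {p: 1.0 for p in pages}
for _ in range(500):
    new = {}
    for p in pages:
        incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
        base = N_pages * (1 - d) if p == "G" else 0.0   # only G gets the source term
        new[p] = base + d * incoming
    pr = new

# G gets the artificial source and tops the ranking; total PR stays N_pages.
print(pr["G"] > max(pr[p] for p in pages if p != "G"))  # True
```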

As already mentioned, GoogleGuy said that Google's PR isn't artificial.

vitaplease

12:57 pm on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe this very much over-simplified (and not totally correct) presentation makes things easier:

Every page starts with a "real" Pagerank of 1 (not the toolbar PR1)

Google also decided that the average of all webpages has to be "real" Pagerank 1.

If the WWW consists of three pages not linking to each other, they all have real Pagerank 1 and the average is 1.

However, in addition to your initial Pagerank 1, you can donate 85% of this real Pagerank (0.85) to another page by linking to it.

If you link to more pages, you divide that 0.85 Pagerank donation by the number of pages you link to.

In reality, webpages are creating (increasing) overall WWW Pagerank through all these donations. So Google recalculates each webpage's individual Pagerank, making sure that the average is "real Pagerank 1" again.

You can imagine that all the interlinking and backlinking makes these calculations complex and time-consuming. In practice it means (re)calculating these cycles of links over all the webpages several times (iterations).

So in the end, across the whole World Wide Web, some pages might have a real Pagerank of 0.00002 and others of 112,368.87 - just as long as the average is 1.

There might be five million webpages linking to the acrobat reader download page: link:http://www.adobe.com/products/acrobat/readstep2.html [alltheweb.com]

All these millions of fractional or substantial Pagerank donations towards the Acrobat page could add up to a real Pagerank of 112,368.87 (fictional).

On the other hand, someone's hobby page might get only one link, from another person's hobby page, and end up with a real Pagerank of only 0.00002.

Google scales all these "real Pageranks" into chunks of presented toolbar Pagerank (from 0 to 10).

So every webpage indexed by Google has real Pagerank and donates real Pagerank through linking. Google just happens to show only certain, mostly more important backlinks (often those with toolbar Pagerank 4 or higher). Alltheweb, on the other hand, shows all the backlinks it can find.
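The donation picture above can be sketched on an invented three-page web: each page keeps (1-d) and donates d of its real Pagerank, split evenly over its outgoing links. With no dead ends, the average real Pagerank stays at 1 after iterating.

```python
# Toy "donation" model on a made-up three-page web with no dead ends:
# every page keeps (1-d) and donates d of its PR, split evenly
# across its outgoing links.
links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
d = 0.85
pr = {p: 1.0 for p in links}

for _ in range(500):
    new = {p: (1 - d) for p in links}       # the part every page keeps
    for p, outs in links.items():
        for q in outs:
            new[q] += d * pr[p] / len(outs)  # donate an equal share per link
    pr = new

avg = sum(pr.values()) / len(pr)
print(round(avg, 6))  # 1.0
```

With dead ends (pages that link nowhere) some donated PR would leak out, which is exactly the correction doc_z makes in the next post.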

ulstrup

1:42 pm on Sep 4, 2003 (gmt 0)

10+ Year Member



Thank you vitaplease, for both the explanations and the reminder of MSgraph's excellent summary page.

doc_z

1:43 pm on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



vitaplease,

just some marginal corrections:

- The average real PR is only 1 if there are no dead ends. In practice the average is smaller than 1.

- The lower bound for the PR (assuming that the original model is still valid) is (1-d), which corresponds to the self-contribution of each page (or, in random surfer language, the probability of being teleported to that page). Therefore, a real PR of 0.00002 isn't possible (for a realistic value of the damping factor).

Also, some additions:

- The final PR is independent of the initial guess. In practice one would use the PR from the last calculation as an initial guess to speed up the calculation.

- Google has increased the damping factor compared to the originally mentioned value of d=0.85.
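The (1-d) lower bound is easy to check on an invented graph: a page with no incoming links ("Z" below) ends up with exactly its self-contribution of 1 - d = 0.15.

```python
# Quick check of the (1-d) lower bound: page "Z" in this made-up
# graph receives no links, so its real PR settles at the
# self-contribution term alone.
links = {"A": ["B"], "B": ["A"], "Z": ["A"]}
d = 0.85
pr = {p: 1.0 for p in links}

for _ in range(500):
    pr = {p: (1 - d) + d * sum(pr[q] / len(links[q])
                               for q in links if p in links[q])
          for p in links}

print(round(pr["Z"], 6))  # 0.15
```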

claus

8:58 pm on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> change the model be introducing additional artifical PR sources

Oh yes, and "the algo" doesn't even need to be one specific algorithm; it can be a whole set of them, and not all pages need to be included in (all of) them. And then there are manual edits as well. I think it's a fair assumption by now that what goes on inside the Google corporation is related, but by no means equal, to the originally published papers.

Btw. this comment from vitaplease holds great value, imho:

>> Google scales all these "real Pageranks" into chunks of presented toolbar Pagerank (from 0 to 10).

/claus

doc_z

9:11 pm on Sep 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think it's a fair assumption by now to state that what goes on inside the google corporation is related, but by no means equal to the original published papers.

That coincides with my observations. Google has not only changed parameters such as the damping factor but has also made some more significant changes. For example, the creation of PR through the creation of additional pages doesn't work as one would expect from the original algorithm.

buckworks

9:19 pm on Sep 4, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



English translation of the above:

Links are good. Get more links.