Applying PR algorithm in real world

         

proteus_02

5:46 pm on Jan 29, 2004 (gmt 0)

10+ Year Member



Hello,

I was wondering (yes, I know I'm not the first one :) how you would go about implementing the Google PageRank algorithm in the real world.

I mean, the algorithm looks fairly easy, but all the documents I've seen so far only show some theoretical cases.

How about a real page? How would I go about computing the page's PR without knowing the PR of the other pages? Would I go to Google, take (let's say) the first 100 pages that link back to that page, determine how they cross-link and then apply the algorithm? That doesn't sound like it could work... What am I missing?

Any help is greatly appreciated!

Best regards,
Sebastian

[edited by: DaveAtIFG at 6:37 pm (utc) on Jan. 29, 2004]
[edit reason] Removed specifics [/edit]

proteus_02

8:58 pm on Jan 29, 2004 (gmt 0)

10+ Year Member


To add something that was edited out due to a conflict with the Terms & Conditions:

I've seen on the Internet a number of "PageRank calculators" which seem to actually do the calculations on their own (as opposed to "emulating" a Google Toolbar).

Does anyone have any idea how they work?

Best regards,
Sebastian

doc_z

10:19 pm on Feb 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi proteus_02 and welcome!

Calculating PR works as follows:

- spider the pages
- determine the linking structure
- choose an initial guess for the PR of each page, e.g. PR=1 for all pages
- calculate the PR iteratively, i.e. calculate the PR of all pages for iteration i+1 from the values of iteration i
- stop when the values are stable
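
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not Google's implementation: it assumes the linking structure has already been spidered into a dictionary, and it uses the formula from the original PageRank paper, PR(A) = (1-d) + d * sum(PR(T)/C(T)), with a damping factor d = 0.85. The function name `pagerank`, the tolerance `tol`, and the three-page example graph are all made up for the example.

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=100):
    """Iteratively compute PR for a small, fully known link graph.

    links maps each page to the list of pages it links to.
    Returns (ranks, number_of_iterations_used).
    """
    # Every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}

    # C(T): number of outbound links on each page.
    out_count = {p: len(links.get(p, [])) for p in pages}

    # For each page, the pages that link to it.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for t in targets:
            inbound[t].append(src)

    # Initial guess: PR=1 for all pages.
    pr = {p: 1.0 for p in pages}

    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Iteration i+1 is computed only from the values of iteration i.
        new_pr = {
            p: (1 - d) + d * sum(pr[q] / out_count[q] for q in inbound[p])
            for p in pages
        }
        # Largest change of any single page in this iteration.
        delta = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if delta < tol:  # values are stable: stop
            break
    return pr, iteration


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, iterations = pagerank(example_links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
print("iterations:", iterations)
```

Note that pages without outbound links simply pass on nothing in this sketch; a real crawler-scale implementation has to decide how to treat them, and that choice is not covered here.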

I can't answer your second question, because I don't know what you are referring to.

proteus_02

9:41 am on Feb 4, 2004 (gmt 0)

10+ Year Member



Hello doc_z,

Sorry for the delay in replying.

I understood the basic theory (fetching all linked pages, determining the links etc.). But I was wondering how one would do this for a page which has a lot of incoming links.

Take for example the homepage of this site: [webmasterworld.com...] A search on Google reports that there are 10,800 pages linking to it. Obviously, spidering all of those pages is not feasible for a "normal" script running on a "normal" server.

What practical solutions are there for cases like this? Would it be enough to spider only the first (let's say) 50 or 100 pages?

Best regards,
Sebastian

igneus

5:38 pm on Feb 4, 2004 (gmt 0)

10+ Year Member



The script on my site just queries Google for the PageRank, as the toolbar would.

doc_z

11:32 pm on Feb 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



proteus_02,

when you try to calculate PR, you have to spider all pages on the internet; you can't calculate PR locally. (Of course, in practice there are restrictions: for Google this means spidering 3 billion pages, which can take a lot of time.) Once that is done, you start with an arbitrary initial PR guess for each page (e.g. PR=1 for all 3 billion pages). Then you calculate the PR of each page from the PR values of the previous iteration, according to the well-known PR formula. You repeat this step until the PR values of the pages are stable (this can take about 100 iterations).
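
For reference, the "well-known PR formula" is presumably the one published in the original PageRank paper by Brin and Page. Written as an iteration step it looks like the following, where the set notation B(A) for the pages linking to A is introduced here only for compactness:

```latex
PR_{i+1}(A) = (1 - d) + d \sum_{T \in B(A)} \frac{PR_i(T)}{C(T)}
```

Here PR_i(T) is the PR of page T after iteration i, C(T) is the number of outbound links on page T, and d is the damping factor, which the paper suggests setting to about 0.85. The sum runs over every page that links to A, which is why the PR of one page cannot be computed without the PR of the pages linking to it, and theirs in turn.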