Percent of All Sites by Pagerank

         

lexipixel

3:38 am on Feb 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am wondering if there is any credible and comprehensive source for stats on the percentage of sites by PR?

What I am looking for is information that could be used in marketing material, e.g. "PageRank is Google's measure of.... Our site is a Google PR6 and only ___% of all sites on the web rank that well".

tedster

10:10 pm on Feb 6, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



PageRank is not assigned to "a site" but only to "a page", so please don't help your clients get more confused than the average client already is.

PageRank is distributed over roughly a base-6 logarithmic scale, so there are approximately 6 times as many pages with a PR 9 as there are with a PR 10. Assume there are a few score pages with PR 10 at most. Then keep going down from PR 9 to 8 to 7, multiplying by a factor of 6 for each step, down to PR 0. This math would give you a rough idea (and it is rough) of how many pages are at each level of PR. Then you could calculate the percent, but there are many assumptions built into what I just said that may not be real, especially with PR demotions (and promotions?) being part of the picture today.
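The step-down arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the guess, not a measurement: the starting count of 50 pages at PR 10, the constant factor of 6, and the function names `estimate_counts` and `percent_shares` are all assumptions made for the example.

```python
def estimate_counts(top_count=50, base=6, top_pr=10):
    """Return {pr: estimated URL count}, multiplying by `base` per step down.

    top_count and base are guesses, not published figures.
    """
    return {pr: top_count * base ** (top_pr - pr) for pr in range(top_pr, -1, -1)}


def percent_shares(counts):
    """Convert raw counts into each level's percentage of the estimated total."""
    total = sum(counts.values())
    return {pr: 100 * n / total for pr, n in counts.items()}


counts = estimate_counts()
shares = percent_shares(counts)
for pr, n in counts.items():
    print(f"PR{pr:<2} {n:>13,} urls  {shares[pr]:.7f}% of total")
```

Changing `base` or `top_count` shows how sensitive the percentages are to those two guesses.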

You might say "PageRank is Google's system for deciding how important any page is, based on how easy it is to find by following a chain of links through other pages." But I doubt you can come up with a non-confusing statement that goes any further than that.

And even then, what if the toolbar gives you a gray bar some day?

lexipixel

4:18 am on Feb 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Tedster. I understand PR is for a given page (not a "site"), and I should have worded it that way.

So if a "few score" pages are at PR10 (let's call it "50" for the sake of simplifying the math), and using 6 as the exponential factor, that would put 15,625,000,000 pages at PR9. Is this correct?

How many angels can dance on the head of a pin?

jomaxx

5:19 am on Feb 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



15,625,000,000 pages at PR9? I'm not sure where your error is, but that sure ain't correct.

Anyway I think tedster misstated it slightly. Correct me if I'm wrong, but my understanding is that the scale is logarithmic because PR10 represents 6 times as much total PageRank value as PR9 represents, and so on. That's not the same as saying there are 6 times as many pages with that score.

tedster

5:58 am on Feb 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Clearly 15.6 billion urls at PR9 just can't be it - that's more like some estimates for all the spiderable urls in the Google index. My original idea was something like this: PR10 - 50 urls, PR 9 - 300 urls, PR 8 - 1,800 urls... on down to PR 0 - 3,023,308,800 urls

But those numbers would mean fewer than 4 billion total urls -- that's way too low. So even though the relative proportions for each level show something like the picture I wanted to give, it's clearly still not right overall. Part of that error is because the log-like scale stretches (or compresses) distribution throughout any given range of PR -- there are many times more urls with a low PR 6 than with a high PR 6, and so on. And a big part of the error is because this all still includes a lot of guesswork, most notably the base of the log scale involved.

I was trying to give a general feeling for the way PR distribution seems to play out from level to level. That is, there are several times more urls at PR 6 (or any given number) than the total number of urls above that level. Because of the PR formula, it will always be that way, no matter how many total urls are in the index.
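To put a rough number on "several times more", here is a small Python check under the same guessed model (a constant factor per step, with 50 urls at PR 10 and a base of 6 as placeholder values). The function name `ratio_to_levels_above` is made up for this sketch. In this model the ratio settles near base minus 1, and the starting count cancels out, which is why the total size of the index doesn't matter.

```python
from fractions import Fraction


def ratio_to_levels_above(pr, top_count=50, base=6, top_pr=10):
    """Estimated urls at `pr` divided by estimated urls at all higher levels."""
    at_level = top_count * base ** (top_pr - pr)
    above = sum(top_count * base ** (top_pr - k) for k in range(pr + 1, top_pr + 1))
    return Fraction(at_level, above)


for pr in range(9, -1, -1):
    print(f"PR{pr}: {float(ratio_to_levels_above(pr)):.4f} times the urls above it")
```

With a base of 6 the ratio is exactly 6 at PR 9 and approaches 5 further down the scale.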

There's been so much conjecture around PageRank and log scales that it just plain boggles the mind. Anything more I write here might add to that pile of mystification, so I'm reluctant to go any further.

To address your opening question, I don't think there's any such study. It would amount to scraping Google's toolbar server and they're not likely to let that happen.

tedster

6:01 am on Feb 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's not the same as saying there are 6 times as many pages with that score.

You're probably right. I've been wrestling with the math as it interacts with the PR equation, and my brain just won't cope with it at the moment.

CainIV

7:52 am on Feb 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The last research I did, about 3 years ago when I cared about Toolbar PageRank, put the factor at about 12.3. With 50 pages at PR 10, that would put the estimate for PR 9 pages on the Internet closer to 615 at that time, but the number would likely have fallen.

lexipixel

7:09 pm on Feb 7, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'll start by saying, at this point, this thread should probably be considered "just for fun"... but the math is interesting anyway.

I used "50" as the total number of PR10 urls, then calculated (50)^6, i.e.:

(50)^2 = 2,500
(50)^3 = 125,000
.
.
(50)^6 = 15,625,000,000
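For anyone following along, the two readings of "a factor of 6" can be compared directly in Python. This is just a sketch of the arithmetic; the variable names are illustrative and 50 is the same guessed PR10 count used above.

```python
top_count = 50  # guessed number of PR10 urls

# Reading 1: raise the PR10 count to a power (what produced the 15.6 billion figure).
powers = {n: top_count ** n for n in (2, 3, 6)}

# Reading 2: multiply the PR10 count by 6 to step down one level to PR9.
one_step_down = top_count * 6

for n, value in powers.items():
    print(f"(50)^{n} = {value:,}")
print(f"50 x 6 = {one_step_down:,}")
```

The first reading explodes after a couple of steps, while the second gives 300 urls at PR9.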

If Tedster meant "six times" at each PR level (multiplying by 6 per step, rather than raising 50 to a power), it would give a total of 3,627,970,550 URLs across which PR is distributed, which is way too low.

PR10 = 50
PR9 = 300
PR8 = 1,800
PR7 = 10,800
PR6 = 64,800
PR5 = 388,800
PR4 = 2,332,800
PR3 = 13,996,800
PR2 = 83,980,800
PR1 = 503,884,800
PR0 = 3,023,308,800

A search on Google for pages containing the word "a" returns 13,810,000,000 results, so we know the "times six" method is too low.
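Still in the "just for fun" spirit, the question can be turned around: what per-level factor would make the totals add up to that result count? The Python sketch below solves for it by bisection. It assumes 50 urls at PR10, a single constant factor for every step, and that the count for "a" is a fair stand-in for the size of the index; all three are guesses, and the function names `total_urls` and `implied_base` are made up for the example.

```python
def total_urls(base, top_count=50, top_pr=10):
    """Sum of estimated urls over PR10 down to PR0 for a given per-level factor."""
    return sum(top_count * base ** step for step in range(top_pr + 1))


def implied_base(target_total, top_count=50, top_pr=10, lo=1.0, hi=100.0):
    """Find the factor whose estimated total matches target_total.

    total_urls increases with base, so bisection narrows in on the match.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if total_urls(mid, top_count, top_pr) < target_total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


base = implied_base(13_810_000_000)
print(f"implied factor per PR level: {base:.2f}")
```

Under these assumptions the implied factor lands a little below 7, which shows how much the answer depends on the guessed PR10 count and index size.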

Does anyone actually know how many pages (unique URLs) are in Google's index? ...or were yesterday?

Again -- this is now "just for fun"...