10 - Three
9 - About seventy (see this overview of PR9 & PR10 sites) [webmasterworld.com]
8 - about 4.200
7 - about 180.000
6 - about 850.000
5 - about 2.500.000
I am also interested in ways of estimating these numbers. Any ideas? And if you have a figure of your own, how did you reach it?
The PR10 and PR9 sites are easy. Just look for the highest-ranking SERPs on stopwords [webmasterworld.com], add searches for "university" and "news", and check the Google/DMOZ directory rankings for major publicly traded companies [directory.google.com].
One way of estimating the total number of the lower PR5 to PR8 sites is to check the above-mentioned company rankings [directory.google.com] and estimate how many fall into each PageRank value. However, these are listed companies and therefore more important.
DMOZ lists 383.000 categories, or so they claim. I would guess only approx. 1% of these categories carry one PR8 site on average, which would be about 3.800 sites (Science, e.g., is a well-represented category). Do not forget that many high-ranking sites are listed several times in different categories! Add some PR8 sites not listed in DMOZ (unlikely, but say another 400).
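For anyone who wants to play with the guesses, here is a minimal sketch of that PR8 arithmetic. The variable names are made up for illustration, and the 1% share and the 400 unlisted sites are the guesses from the post, not measured values. Duplicate listings across categories are not subtracted.

```python
# Back-of-envelope PR8 estimate from the DMOZ category count.
dmoz_categories = 383_000   # category count DMOZ claims, as quoted in the post
share_with_pr8 = 0.01       # guess: about 1% of categories carry one PR8 site
unlisted_pr8 = 400          # guess: PR8 sites that are not listed in DMOZ

listed_pr8 = dmoz_categories * share_with_pr8
total_pr8 = listed_pr8 + unlisted_pr8

print(f"PR8 sites listed in DMOZ: about {round(listed_pr8)}")
print(f"PR8 sites in total:       about {round(total_pr8)}")
```

Rounded down a little, this is where the figure of about 4.200 for PR8 comes from. Changing the 1% share moves the result almost linearly, so that single guess dominates the estimate.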
Guessing the number of PR5, PR6 and PR7 sites is much more difficult. I recall Google saying they were using roughly the 3 million most topical/interesting pages for their Fresh label, and that in general, in the beginning, most sites (but not all) needed a minimum PR of 5 or 6 to qualify for this freshness. Assuming 40% of the PR5 and PR6 sites have Fresh, and that some have more than one Fresh page per site, I arrived at the above numbers.
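As a rough consistency check on the Fresh reasoning (a sketch only, not the exact calculation behind the list), one can plug the PR5 and PR6 figures from the top of the thread back in and see how many Fresh pages per site they imply. The 3 million pages and the 40% share are the recalled and assumed values from the post, and the variable names are hypothetical.

```python
# Consistency check: do the PR5/PR6 estimates fit about 3 million Fresh pages?
fresh_pages = 3_000_000   # recalled size of Google's Fresh pool (approximate)
pr5_sites = 2_500_000     # estimate from the list at the top of the thread
pr6_sites = 850_000       # estimate from the list at the top of the thread
share_with_fresh = 0.40   # assumption: 40% of PR5 and PR6 sites carry Fresh

sites_with_fresh = share_with_fresh * (pr5_sites + pr6_sites)
pages_per_fresh_site = fresh_pages / sites_with_fresh

print(f"Sites with a Fresh label: about {sites_with_fresh:,.0f}")
print(f"Implied Fresh pages per such site: {pages_per_fresh_site:.2f}")
```

With these inputs the implied average is a little over two Fresh pages per site, which fits the remark that some sites have more than one. PR7 and higher sites are ignored here, so the real average would be somewhat lower.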
Google claims to have indexed approx. 2 billion webpages. Another approach would be to estimate the number of pages per website. I would say 10. That would mean 200 million websites in the Google index. In an earlier thread [webmasterworld.com] we discussed whether the real PageRank follows a log 6 or 7 scale. That should also allow for some guessing on the number of sites per toolbar PageRank digit; however, in this case webpages are counted and not sites.
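A small sketch of both steps, under stated assumptions: 10 pages per site is the guess from the post, and the log-scale part assumes that toolbar digit n starts at base**n times some unknown unit of real PageRank. That reading of a "log 6 or 7 scale" is an assumption for illustration, not something Google has confirmed.

```python
# Step 1: pages -> sites, using the guessed average pages per site.
indexed_pages = 2_000_000_000   # index size Google claims
pages_per_site = 10             # guess from the post
sites = indexed_pages // pages_per_site
print(f"Estimated sites in the index: {sites:,}")

# Step 2: relative real-PageRank threshold for each toolbar digit,
# assuming digit n begins at base**n (in some unknown unit).
def thresholds(base, top=10):
    """Return the relative lower bound of toolbar digits 0..top."""
    return [base ** n for n in range(top + 1)]

for base in (6, 7):
    t = thresholds(base)
    print(f"base {base}: PR5 starts at {t[5]:,}x, PR10 at {t[10]:,}x the PR0 unit")
```

Under this reading a PR10 page needs tens to hundreds of millions of times the real PageRank of a PR0 page, which is one way to see why so few pages get there.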
It seems bizarre to have only 3 PR10s. If there are no more, it possibly indicates that the distribution is not just logarithmic (the base doesn't matter to this), or that the normalisation is weird.
Either that or the Zipf/Pareto distribution needs logs of both scales to work. I think this makes sense, but I can't explain or justify maths on Friday afternoons...
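One quick way to look at this, sketched with the estimates from the top of the thread: if the counts fell off by a constant factor per toolbar digit, the ratio between neighbouring levels would be the same at every step. The numbers below are the thread's guesses, not measurements, so this only shows what those guesses imply.

```python
# Ratio of estimated site counts between neighbouring toolbar PR levels.
estimates = {
    10: 3,
    9: 70,
    8: 4_200,
    7: 180_000,
    6: 850_000,
    5: 2_500_000,
}

# ratios[pr] = how many times more sites sit at level pr than at pr + 1
ratios = {pr: estimates[pr] / estimates[pr + 1] for pr in range(9, 4, -1)}

for pr, ratio in ratios.items():
    print(f"PR{pr} vs PR{pr + 1}: {ratio:.1f}x as many sites")
```

The step ratios range from under 3 (PR5 vs PR6) to 60 (PR8 vs PR9), so the guessed counts do not follow a single constant factor. Whether that reflects the real distribution or just the roughness of the guesses cannot be told from these numbers alone.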
If the three PR10 sites (Google, Adobe and Apple) have proportionally way more links to them than the rest of the PR9 sites, would it not be normal to have only three in the PR10 league?
The problem is, normalisation is something I remember I forgot 20 years ago...
With PR5 I mean only PR5. You think 2,5 million is way too high?
It's hard to speculate on the curves when we can't get a representative sample, but the effects that you describe may well be to do with the increase in Google's index. The first billion URLs are likely to contain the most well-linked URLs, so the second billion must contribute much more rank source than rank sink.
<added>Sorry, I read the PR5 as being 2.5 billion, hence PR5 and below. 2.5 million sounds sensible, but it feels too low.</added>
[dmoz.org...]