Forum Moderators: Robert Charlton & goodroi
The new version finally seems to report accurate-ish numbers for page counts beyond 1000, instead of the usual order-of-magnitude exaggeration.
The new version is currently available on several DCs, and seems to be rolling out as I write, e.g.:
72.14.203.99
64.233.167.99
64.233.167.104
Interestingly, it now also shows a few www pages, something it has not done since the days of the 302 redirect hijacks. (When I mentioned on WebmasterWorld that site:www.dmoz.org returned 30 million pages, none of which were on www.dmoz.org itself, that SERP went to zero results within a few hours and had been there ever since. The "302 bug" was not fixed for other domains that I was watching and which I did not mention here.)
[edited by: g1smd at 9:40 pm (utc) on July 9, 2006]
I can't believe that people do not understand the correlation between inflated results and how Google views your site. It's been covered so many times by various people, you just have to read between the lines.
Here's the explanation:
When Google estimates (yes, estimates) the number of pages of yours that it has in its index, it does not count them all. It just counts how many are in the top 'x' results (given that pages are stored by overall 'value', maybe some variant of PR, in a rather large database). If your site is well-ranked then more of your pages will be in the 'sample' dataset and hence Google will overestimate your total page numbers.
It would be foolhardy to count every page when looking up a site: search. This also explains why numbers generally get more accurate as you page through results: page 1 deals with less 'sample' data than page 10.
My guess is that most people who even know how to check things on Google will be above average on rankings, therefore the general opinion of the webmaster 'in the know' is that Google inflates page numbers. However, if your site is poorly ranked (very poorly ranked) then the number of pages returned for a site: query will be lower (as fewer pages will show up in the first 'x' results).
I could be wrong, but I'd stake a fair amount on this being at least a part of the way to explaining why site: counts are 'wrong'.
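To make the idea concrete, here is a toy sketch of the sampling guess described above. It is only an illustration of my theory, not anything Google has confirmed: the function name, the example site names, the 'value' scores and the simple scale-up rule (hits in the sample times index size divided by sample size) are all assumptions made up for the example.

```python
def estimate_site_count(index, site, sample_size):
    """Guess a site's page count by looking only at the top-valued pages.

    index: list of (site, value) tuples standing in for the whole index.
    Hypothetical model: count the site's pages in the top `sample_size`
    pages by value, then scale that count up to the full index size.
    """
    ranked = sorted(index, key=lambda page: page[1], reverse=True)
    sample = ranked[:sample_size]
    if not sample:
        return 0
    hits = sum(1 for page_site, _ in sample if page_site == site)
    return round(hits * len(index) / len(sample))


# Toy index: three sites with exactly 100 pages each (made-up values).
index = (
    [("strong.example", 100 + i) for i in range(100)]     # values 100..199
    + [("average.example", 50 + i) for i in range(100)]   # values 50..149
    + [("weak.example", i) for i in range(100)]           # values 0..99
)

for name in ("strong.example", "average.example", "weak.example"):
    print(name, estimate_site_count(index, name, 150))
# strong.example 200
# average.example 100
# weak.example 0
```

All three sites really have 100 pages, but with a sample of the top 150 pages the well-ranked site comes out at double its true size, the middling site comes out about right, and the poorly ranked site comes out at zero. That is the same pattern as inflated counts for strong sites and deflated counts for weak ones.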
[edited by: europeforvisitors at 4:53 am (utc) on July 10, 2006]
Got a kick out of that statement! It has been a couple of years since I've seen a remotely accurate page count out of big G. I guess it is pretty hard, either that or the PhDs had to dump some of their basic education to make room for their Google egos.