I just checked a few sites I'm familiar with (way too familiar, LOL) and still see the old bogus numbers. I'm glad you see a change, though. It's hopeful.
[edited by: tedster at 12:36 am (utc) on July 10, 2006]
It's strange, the new, more accurate results only seem to show for certain sites.
Take a look at site:www.cnn.com on 18.104.22.168 for an example of the new data.
Good catch. I just tried that data center and got credible numbers for my two sites. (4,760 for my main site, compared to Google.com's estimate of 24,600 pages.)
"... certain sites", I think is correct.
This is a page count discussion, please excuse me.
That DC is even further out of line for ours, maybe in a couple of centuries the 'plex's AI will have learned how to count.
Those DCs are showing 2.9 million pages for my site, and trust me when I tell you that I do not have anywhere close to that many pages.
The site I have been watching is even worse than ever today, on that DC and my local one.
Across the 620 known IP addresses for Google, there are at least a dozen different site counts for every site out there.
The differences in reported page numbers are huge.
The 22.214.171.124 shows a correct page count for the ODP (site:dmoz.org) for the first time in many years. All the pages have been non-www for years. The count for non-www had been massively inflated for quite a while.
Interestingly, it now also shows a few www pages, something it has not done since the days of the 302 redirect hijacks. (When I mentioned on WebmasterWorld that site:www.dmoz.org returned 30 million pages, none of which were on www.dmoz.org itself, that SERP went to zero results within a few hours and had stayed there ever since. The "302 bug" was not fixed for other domains that I was watching and which I did not mention here.)
[edited by: g1smd at 9:40 pm (utc) on July 9, 2006]
Same old bogus numbers for me. Not even close to reality.
Those (bogus) numbers are the single largest factor in determining how much traffic Google sends me.
##Apologies for being patronising, I'm just mad at this.##
I can't believe that people do not understand the correlation between inflated results and how Google views your site. It's been covered so many times by various people, you just have to read between the lines.
Here's the explanation:
When Google estimates (yes, estimates) the number of your pages that it has in its index, it does not count them all. It just counts how many are in the top 'x' results (given that pages are stored by overall 'value' - maybe some variant of PR - in a rather large database). IF your site is well ranked, then more of your pages will be in the 'sample' dataset and hence Google will overestimate your total page numbers.
It would be foolhardy to count every page when looking up a site: search. This also explains why numbers generally get more accurate as you page through results: page 1 deals with less 'sample' data than page 10.
My guess is that most people who even know how to check things on Google will be above average on rankings, therefore the general opinion of the webmaster 'in the know' is that Google inflates page numbers. However, if your site is poorly ranked (very poorly ranked) then the number of pages returned for a site: query will be lower (as fewer pages will show up in the first 'x' results).
I could be wrong, but I'd stake a fair amount on this being at least a part of the way to explaining why site: counts are 'wrong'.
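For anyone who wants to see the sampling idea above in numbers, here is a minimal sketch in Python. It only illustrates the theory as described in this post, not how Google actually computes site: counts. Everything in it is an assumption made up for the illustration: the toy index, the site names, the 'value' scores, and the extrapolation rule (hits in the sample scaled up by index size divided by sample size).

```python
def true_site_count(index, site):
    """Exact count: walk every page in the index."""
    return sum(1 for page_site, _ in index if page_site == site)


def estimate_site_count(index, site, sample_size):
    """Hypothetical estimator: look only at the top `sample_size` pages
    (ordered by 'value', highest first), count the site's pages among
    them, and scale that count up to the size of the whole index."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    ranked = sorted(index, key=lambda page: page[1], reverse=True)
    sample = ranked[:sample_size]
    hits = sum(1 for page_site, _ in sample if page_site == site)
    return round(hits * len(ranked) / len(sample))


# Toy index of 1,000 pages as (site, value) pairs. All values are invented.
index = (
    [("strong.example", 0.9)] * 100    # well-ranked site, 100 pages
    + [("filler.example", 0.5)] * 800  # everyone else
    + [("weak.example", 0.1)] * 100    # poorly ranked site, 100 pages
)

for site in ("strong.example", "weak.example"):
    print(
        site,
        "true:", true_site_count(index, site),
        "top-200 estimate:", estimate_site_count(index, site, 200),
        "full-index estimate:", estimate_site_count(index, site, len(index)),
    )
```

In this toy setup both sites really have 100 pages, but the well-ranked one is estimated at 500 from the top-200 sample and the poorly ranked one at 0. Widening the sample to the whole index brings both back to 100, which matches the claim that the numbers get more accurate the deeper you page.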
Not sure if that was directed at me, but your post sounds reasonable to me.
inbound, I might buy your argument except for the fact that the vastly inflated numbers are a fairly recent phenomenon, unless I'm mistaken. I never encountered them before the middle of last year (it might have been even later than that), and I don't recall hearing complaints about inflated page numbers before then.
[edited by: europeforvisitors at 4:53 am (utc) on July 10, 2006]
C'mon, I can't even imagine how or why they would "estimate". How hard is it to count an actual number of URLs with the same domain address?
(No improvement on counting pages that I can see.)
[edited by: steveb at 9:29 am (utc) on July 10, 2006]
" Cmon, I can't even imagine how or why they would "estimate". How hard is to count an actual number of URLs with the same domain address?"
Got a kick out of that statement! It has been a couple of years since I've seen a remotely accurate page count out of big G. I guess it is pretty hard, either that or the PhDs had to dump some of their basic education to make room for their Google egos.
>> the vastly inflated numbers are a fairly recent phenomenon, <<
I had seen hints of it at least two years ago, but I no longer have the data that would have confirmed or denied what was happening in mid-2003 too.