Forum Moderators: Robert Charlton & goodroi
A few months ago, some people noticed that for some sites the reported count was inflated to 8 to 10 times the real number of pages, but only when the count was over one thousand.
If you have several folders on the site, or several types of pages, you could do searches like this to bring each count below 1,000:
site:domain.com inurl:products.php
Make sure, too, that you are not showing pages at both www and at non-www. That can be a big problem.
These searches are also useful:
site:domain.com
site:domain.com -inurl:www
site:www.domain.com
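The per-folder approach above can be sketched in a few lines. This is a hypothetical helper, not anything Google provides: it only builds the query strings for you to paste into the search box, one per folder, plus the www/non-www checks listed above. The domain and folder names are made-up examples.

```python
def build_site_queries(domain, folders):
    """Build one site: query per folder (to keep each result count
    under the ~1,000 cap), plus the www/non-www sanity checks."""
    queries = [f"site:{domain} inurl:{f}" for f in folders]
    queries += [
        f"site:{domain}",            # everything
        f"site:{domain} -inurl:www",  # non-www only
        f"site:www.{domain}",         # www only
    ]
    return queries

qs = build_site_queries("domain.com", ["products.php", "widgets/"])
```

Running each generated query by hand and summing the per-folder counts gives a rough total that sidesteps the inflated top-level number.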
I'd checked for the canonical issue (www vs. non-www) and we're good there; no duplication.
So, essentially, this means that the only way for me to know the exact number of pages Google has indexed is to go folder by folder? Also, I remember reading somewhere that the HotBot saturation number was a good indication of the actual number of pages indexed by Google. Does that theory hold any water?
/widgets/
/widgets/index.html
are the same page but get counted as two. On one site every page got counted like this, so the total was doubled.
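One common way to spot this kind of duplication is to canonicalize paths before counting them, treating a trailing `index.html` as equivalent to the bare directory. A minimal sketch (the URL list is an invented example):

```python
def canonical_path(path):
    """Treat "/widgets/index.html" and "/widgets/" as the same page."""
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    return path

urls = ["/widgets/", "/widgets/index.html", "/gadgets/index.html"]
unique = {canonical_path(u) for u in urls}  # two distinct pages, not three
```

On the live site, a 301 redirect from one form to the other keeps the engines from ever seeing both.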
Then Google figured out how to get hold of the list of recent searches, and indexed all of them. Every time googlebot stops in, it checks that all the old searches still work, then adds the new recent ones. Now the number indexed goes way up.
Later, googlebot experimented with dropping variables out of searches, and discovered variations like terms=all, terms=any, or omitting terms= entirely: three variants of the same page.
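Those three variants can be collapsed by stripping the offending variable before comparing URLs. A hedged sketch using the standard library; the URLs and the `terms` parameter name are taken from the example above, and `drop_param` is an invented helper, not a real API:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def drop_param(url, name):
    """Remove one query-string variable so URLs that differ only in
    that variable canonicalize to the same string."""
    parts = urlsplit(url)
    q = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if k != name]
    return urlunsplit(parts._replace(query=urlencode(q)))

variants = [
    "http://domain.com/search?q=widgets&terms=all",
    "http://domain.com/search?q=widgets&terms=any",
    "http://domain.com/search?q=widgets",
]
canon = {drop_param(u, "terms") for u in variants}  # collapses to one URL
```

The same idea, applied server-side as a redirect to the canonical form (or a robots.txt rule blocking the parameterized URLs), stops the duplicates from being indexed in the first place.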
How many pages do I have?