Number of indexed pages more than website size
Website has 10K pages, but Google's index has 15K pages
I have the opposite of the question that's probably asked most often on this forum. My website hasn't been dropped from Google, and the number of indexed pages has not gone down drastically. My website contains approximately 10,000 pages, but the "site:mywebsite.com" or "site:mywebsite.com -dgfdgfff" operator on Google yields 15,000 results. My website is static, so there's no question of session IDs.
Why is Google showing this inflated number?
Is there a more accurate way of finding out exactly how many pages have been indexed? I don't want to overlook pages that are not indexed because the inflated number hides them.
Google takes a "guess" at the number and refines it as you click through the pages of the SERPs. You can see this in action for sites that only have a few hundred pages. The initial figure is often slightly wrong ("1 to 10 of about 750" on the first page, and then "731 to 737 of 737" on the final page).
A few months ago, some people noticed that the count was inflated by 8 to 10 times the real number of pages for some sites, but only when the count was more than one thousand.
If you have several folders on the site, or several types of pages, you could do searches like this to get each one below 1000:
Make sure, too, that you are not showing pages at both www and at non-www. That can be a big problem.
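The folder-by-folder searches suggested above can be generated with a short script rather than typed by hand. This is only a minimal sketch: the folder names are hypothetical placeholders, and it assumes the site: operator is given a path prefix for each folder. It builds the queries and the matching search URLs, nothing more; the counts still have to be read off the results pages.

```python
from urllib.parse import quote_plus


def folder_queries(host, folders):
    """Return one site: query per folder, e.g. site:mywebsite.com/articles/."""
    queries = []
    for folder in folders:
        path = folder.strip("/")  # accept "articles", "/articles" or "/articles/"
        queries.append(f"site:{host}/{path}/")
    return queries


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    # Hypothetical folder names; substitute the real folders of the site.
    for q in folder_queries("mywebsite.com", ["articles", "products", "help"]):
        print(q, "->", search_url(q))
```

If a folder still reports more than 1000 results, the same function can be called again with its subfolders.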
These searches are also useful:
Thank you, g1smd, those are all nuggets to know.
I'd checked for the canonical issue (www vs. non-www) and we're good there, no duplication.
So, essentially, this means that the only way for me to know the exact number of pages Google has indexed is to go folder by folder? Also, I remember reading somewhere that the Hotbot saturation number was a good indication of the actual number of pages indexed by Google. How much water does that theory hold?
Try: site:en.wikipedia.org ... 154.000.000 pages ... lol
My site has roughly 2,000 pages, but Google index was showing 17,000 on the weekend. It is down to 12,000 today.
Try different datacentres. You'll probably find that the 12 000 is reported as 50 000 in some and 350 in others too...
I jest ye not.
The way they count can inflate your number of pages. For example:
are the same page but count as 2. On one site all my pages got counted like this, so the number's doubled.
Then, Google figured out how to get ahold of the list of recent searches, and indexed all of them. Every time Googlebot stops in, he checks to make sure all the old searches work, then adds the new recent ones. Now the number indexed goes way up.
Later, Googlebot experimented with dropping variables out of searches, and found variations such as terms=all, terms=any, or leaving out terms= altogether, giving 3 variants of the same page.
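To get a feel for how much of an indexed count is made up of such variants, a list of URLs (from server logs, say) can be collapsed to one canonical form before counting. The following is a rough sketch under stated assumptions: the parameter name terms comes from the post above, but treating it as ignorable, folding www into non-www, and the example URLs themselves are all illustrative choices, not anything Google is known to do.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters do not change which page is shown.
IGNORED_PARAMS = {"terms"}


def canonical(url):
    """Reduce a URL to a canonical form so that variants compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same host
    # Drop ignorable parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def count_distinct(urls):
    """Count pages after collapsing variants of the same URL."""
    return len({canonical(u) for u in urls})


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    variants = [
        "http://mywebsite.com/search?q=widgets&terms=all",
        "http://mywebsite.com/search?q=widgets&terms=any",
        "http://mywebsite.com/search?q=widgets",
    ]
    print(len(variants), "URLs,", count_distinct(variants), "distinct page(s)")
```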
How many pages do I have?
Use the robots.txt file or robots meta tag to get the extra ones back out of the index.
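Before relying on a robots.txt rule, it is worth checking that it matches the URLs you mean it to match. Here is a small sketch using Python's standard urllib.robotparser; the /search path and the example URLs are assumptions for illustration, so substitute whatever path your own search pages live under.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps crawlers out of on-site search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://mywebsite.com/search?q=widgets&terms=all",
        "http://mywebsite.com/articles/page1.html",
    ):
        print(url, "->", "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

The robots meta tag mentioned above is set per page in the HTML, so it is not covered by this check.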
It varies between the dc's from 11,800 - 12,500, down from 16,500 - 18,000 five days ago.
Another site has 1,500 pages and I have only a single page in the index straight across all dc's.
First site is 8 years old, and the other is 4.
When checking how many pages I have indexed across the dc's this morning, I noticed that I range today from 940 pages all the way up to 17,000 again. I am top 5 on the dc's at the high end and the low end.
64.233.161.xx is listing me at 940 pages, which matches how many of my 1,500 pages should be listed (noindex, nofollow applies to the other pages). Rankings for me are solid here and match allinanchor.
72.14.207.xx has me at 17,000 pages, also with good rankings. That number equals approximately how many total pages have been made for that site since the beginning. This would mean that the dupe content filter has been removed for this set. Allinanchor has me at 3 but I don't show until page 6.
All other dc's list between 9,300 and 14,400 pages. Rankings are poor on these sets of dc's.
My site is top 3 on Y & MSN and has been top 3 in G for 4 years, so I figure that the 64.233.161.xx dc's are the future of G.