
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
# of indexed pages more than website size
Website has 10K pages, but Google's index has 15K pages
Digerati

5+ Year Member



 
Msg#: 34149 posted 7:06 pm on May 3, 2006 (gmt 0)

I have the opposite of the question that's probably asked most often on this forum. My website hasn't been dropped from Google, and the number of indexed pages has not gone down drastically. My website contains approximately 10,000 pages, but the "site:mywebsite.com" or "site:mywebsite.com -dgfdgfff" operator on Google yields 15,000 results. My website is static, so there's no question of session IDs.
Why is Google showing this inflated number?
Is there a more accurate way of finding out exactly how many pages have been indexed? I don't want the inflated number to hide pages that have not been indexed.

 

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 34149 posted 11:34 pm on May 3, 2006 (gmt 0)

Google takes a "guess" at the number and refines it as you click through the pages of the SERPs. You can see this in action for sites that only have a few hundred pages. The initial figure is often slightly wrong ("1 to 10 of about 750" on the first page, and then "731 to 737 of 737" on the final page).

A few months ago, some people noticed that the count was inflated by 8 to 10 times the real number of pages for some sites, but only when the count was more than one thousand.

If you have several folders on the site, or several types of pages, you could do searches like this to get each one below 1000:

site:domain.com inurl:products.php


Make sure, too, that you are not showing pages at both www and at non-www. That can be a big problem.

These searches are also useful:

site:domain.com
site:domain.com -inurl:www
site:www.domain.com
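
The searches suggested above can be generated for every section of a site rather than typed by hand. The following is a minimal sketch, not a tool from this thread: the function names and the section list are hypothetical, and it only builds the query strings for pasting into Google. It does not fetch or parse any results.

```python
def section_queries(domain, sections):
    """Build one 'site:' query per folder or page type, so each count stays small."""
    return [f"site:{domain} inurl:{section}" for section in sections]


def host_check_queries(domain):
    """Build the three searches used to compare www and non-www listings."""
    # Strip a leading "www." so both host variants derive from the bare domain.
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"site:{bare}",
        f"site:{bare} -inurl:www",
        f"site:www.{bare}",
    ]


if __name__ == "__main__":
    # "products.php" comes from the post above; "reviews" is a made-up section.
    for query in section_queries("domain.com", ["products.php", "reviews"]):
        print(query)
    for query in host_check_queries("www.domain.com"):
        print(query)
```

Adding up the per-section counts by hand then gives a total built from figures that are each below the 1000 mark mentioned above.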

Digerati

5+ Year Member



 
Msg#: 34149 posted 12:59 pm on May 4, 2006 (gmt 0)


Thank you, g1smd, those are all nuggets to know.

I'd checked for the canonical issue (www vs. non-www) and we're good there, no duplication.

So, essentially, this means that the only way for me to know the exact number of pages Google has indexed is to go folder by folder? Also, I remember reading somewhere that the Hotbot saturation number was a good indication of the actual number of pages indexed by Google. How much water does that theory hold?

mattg3

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34149 posted 1:19 pm on May 4, 2006 (gmt 0)

Try site:en.wikipedia.org and you get 154,000,000 pages... lol

theblackjeep

5+ Year Member



 
Msg#: 34149 posted 1:38 pm on May 4, 2006 (gmt 0)

My site has roughly 2,000 pages, but Google's index was showing 17,000 at the weekend. It is down to 12,000 today.

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 34149 posted 9:51 pm on May 4, 2006 (gmt 0)

Try different datacentres. You'll probably find that the 12 000 is reported as 50 000 in some and 350 in others too...

I jest ye not.

treeline

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34149 posted 10:00 pm on May 4, 2006 (gmt 0)

The way they count can inflate your number of pages. For example:

/widgets/
/widgets/index.html

are the same page but count as 2. On one site all my pages got counted like this, so the number's doubled.

Then, Google figured out how to get ahold of the list of recent searches, and indexed all of them. Every time googlebot stops in, he checks to make sure all the old searches work, then adds the new recent ones. Now the number indexed goes way up.

Later, googlebot experimented with dropping variables out of searches, and found various variations like terms=all or terms=any or just not including terms= for 3 variants of the same page.

How many pages do I have?
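
One way to answer that for your own URL list (from a crawl or the server logs) is to collapse the duplicate forms described above before counting. This is a rough sketch under stated assumptions: the default document names and the ignored "terms" parameter are illustrative choices taken from this post, the example.com URLs are made up, and nothing here reflects how Google itself counts.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration: adjust both to match the actual site.
DEFAULT_DOCS = ("index.html", "index.htm")
IGNORED_PARAMS = {"terms"}


def canonical_url(url):
    """Map duplicate URL forms of one page onto a single canonical form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Treat /widgets/index.html as the same page as /widgets/.
    for doc in DEFAULT_DOCS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break
    # Drop parameters that only create variants of the same page.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))


def count_distinct_pages(urls):
    """Count pages after collapsing duplicate URL forms."""
    return len({canonical_url(url) for url in urls})


if __name__ == "__main__":
    urls = [
        "http://example.com/widgets/",
        "http://example.com/widgets/index.html",
        "http://example.com/search?q=a&terms=all",
        "http://example.com/search?q=a&terms=any",
        "http://example.com/search?q=a",
    ]
    print(count_distinct_pages(urls), "distinct pages out of", len(urls), "URLs")
```

Comparing the raw URL count with the collapsed count gives a feel for how much of an inflated figure could come from duplicate forms alone.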

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 34149 posted 10:18 pm on May 4, 2006 (gmt 0)

Use the robots.txt file or robots meta tag to get the extra ones back out of the index.
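
Before relying on a robots.txt rule, it can help to confirm that it blocks the URLs you mean and nothing else. The sketch below uses Python's standard urllib.robotparser. The rules, the /search path and the example.com URLs are assumptions for illustration, and the check only tells you whether a crawler that obeys robots.txt may fetch a URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of the on-site search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/search?q=a", "http://example.com/widgets/"):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(url, status)
```

The robots meta tag mentioned above is the per-page alternative: it goes in the HTML of each page rather than in one site-wide file.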

theblackjeep

5+ Year Member



 
Msg#: 34149 posted 10:52 pm on May 4, 2006 (gmt 0)

It varies between the dc's from 11,800 to 12,500, down from 16,500 to 18,000 five days ago.
Another site has 1,500 pages, and I have only a single page in the index straight across all dc's.
The first site is 8 years old, and the other is 4.

theblackjeep

5+ Year Member



 
Msg#: 34149 posted 4:59 pm on May 5, 2006 (gmt 0)

When checking how many pages I have indexed across the dc's this morning, I noticed that today I range from 940 pages all the way up to 17,000 again. I am top 5 on the dc's at both the high end and the low end.
64.233.161.xx is listing me at 940 pages, and this is accurate for how many of my 1,500 pages should be listed (noindex, nofollow applies to the other pages). Rankings for me are solid here and match allinanchor.
72.14.207.xx has me at 17,000 pages, also with good rankings. That number is approximately how many total pages have been made for that site since the beginning. This would mean that the dupe content filter has been removed for this set. Allinanchor has me at 3, but I don't show until page 6.
All other dc's list between 9,300 and 14,400 pages. Rankings are poor on these sets of dc's.
My site is top 3 on Y & MSN and has been top 3 in G for 4 years, so I figure that the 64.233.161.xx dc's are the future of G.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved