
Google SEO News and Discussion Forum

Indexed Pages / Panda Survey

 9:28 pm on Nov 15, 2011 (gmt 0)


I'm doing a quick survey to see the correlation between the number of pages in your sitemap, how many of those pages are indexed, and whether your domain has been hit by Panda. I realize there are many factors beyond these, but I just wanted some basic information, if you would like to share.

All of this information comes from your Google Webmaster Tools account >> Sitemaps.

For one of my sites:
Submitted URLs = 207,188
URLs in web index = 204,940
Pandalized = Yes

2 months prior, before starting to fix anything:
Submitted URLs = 207,188
URLs in web index = 27,659
Pandalized = Yes



 9:51 pm on Nov 15, 2011 (gmt 0)

Here's one I work with:

Current Situation
Submitted URLs = 460,000
URLs in web index = 130,000
Pandalized = Yes

2 Months prior to fixing anything
Submitted URLs = 16,240,000
URLs in web index = 1,900,000
Pandalized = Yes

As you can see, part of the "fix" was removing a HUGE number of dynamic URLs that had very shallow (and even duplicate) content. Still looking for that next Panda iteration to see if it helped.
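As a rough illustration of that cleanup step, here is a minimal Python sketch (hypothetical parameter names and URLs, not this site's actual setup) of how one might flag parameterized dynamic URLs as candidates for removal or noindexing:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical heuristic: query parameters like sort orders and
# session ids usually signal a thin or duplicate variant of a page.
SUSPECT_PARAMS = {"sort", "sessionid", "sid", "filter"}

def is_suspect(url):
    """Return True if the URL carries query parameters that typically
    mark a dynamically generated, shallow variant of another page."""
    qs = parse_qs(urlparse(url).query)
    return bool(SUSPECT_PARAMS & set(qs))

urls = [
    "http://example.com/widgets",
    "http://example.com/widgets?sort=price&sessionid=abc123",
]
to_review = [u for u in urls if is_suspect(u)]
```

In practice the flagged URLs would then be blocked in robots.txt, noindexed, or dropped from the sitemap, which is essentially what shrinking 16 million submitted URLs down to 460,000 amounts to.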


 10:07 pm on Nov 15, 2011 (gmt 0)

Submitted URLs = 1729
URLs in web index = 1722
Pandalized = Yes

I have not reduced the number of pages in my sitemap.


 10:33 pm on Nov 15, 2011 (gmt 0)

Question - how are you calculating the "URLs in web index" figure?


 11:03 pm on Nov 15, 2011 (gmt 0)

I have noticed top-ranked (#1) sites removing their sitemaps for a while now. Any ideas why?


 11:21 pm on Nov 15, 2011 (gmt 0)

Wouldn't it be more relevant to know how many pages Google has actually indexed (i.e. site:example.com) versus pages submitted, rather than sitemap URLs accepted?

In our case that would be:
Submitted URLs = 507
Sitemap URLs accepted = 502
URLs actually indexed = 1,020 (duplicate content issues)
Pandalized = Yes

Prior to Panda those numbers were more than double. We've combined pages, addressed technical issues causing duplicate content, and added canonical tags, all of which are gradually bringing the numbers down.
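A minimal sketch of the canonicalization idea behind that fix, assuming Python and hypothetical URL variants: collapsing the common duplicate-producing variants (host case, trailing slashes, tracking parameters) down to one form is essentially what a canonical tag asks Google to do.

```python
from urllib.parse import urlparse, urlunparse

def canonical_url(url):
    """Collapse common URL variants that cause duplicate-content
    indexing: host casing, trailing slashes, and query strings."""
    p = urlparse(url)
    path = p.path.rstrip("/") or "/"
    # Drop the query string entirely; a real site would whitelist
    # any parameters that genuinely change the page content.
    return urlunparse((p.scheme, p.netloc.lower(), path, "", "", ""))
```

Two URLs that normalize to the same string are duplicates from an indexing point of view, and the normalized form is the natural candidate for the rel="canonical" target.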


 11:53 pm on Nov 15, 2011 (gmt 0)

I'm not really sure that site:example.com showing more pages than the actual number means duplicate content. The site: operator gives an approximate result anyway...


 1:00 am on Nov 16, 2011 (gmt 0)

I wouldn't go by the site: command - on one site I was just looking at today, there are 1993 URLs in the sitemap, GWT says 1981 of them are in the index, but the site command says 169k URLs. And the site has never been touched by Panda.

(Part of the URL discrepancy can be explained by the fact that we changed platforms and rewrote all the URLs for the old site last night, and despite all the rewrites, there are still tons of the old URLs in the index. Not 169k of them, though, I'm pretty sure)


 1:23 am on Nov 16, 2011 (gmt 0)

Thanks for causing me to check against the Site: command. Here's the story:

URLs Submitted: 3,068
URLs in Web Index: 3,005
Results using site: = 8,940
Pandalized = Yes

I got hit by the October 14 update. According to WMT, I had a thousand duplicate pages due to poor URL structure and design: a mistyped URL would return a page wrapper, not a 404 error. I fixed the ones that were showing in WMT a few weeks ago and thought I was done... until I just checked the site: command and noticed I have a ton of work left to do.

In fixing the previous thousand, it seems like it has taken Google a long time to realize that the bad pages were either 404'd or 301'd. I submitted a new sitemap, but according to the Duplicate Titles diagnostic tool, Google is only about halfway done digesting the first round of fixes. Is there a way to speed this process up, as I am now 404'ing about 6,000 more pages (and reducing my site size by nearly two-thirds in Google's eyes)?
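One commonly cited nudge is to serve 410 Gone instead of 404 for pages removed on purpose, since Google has said it treats 410 as a stronger "permanently removed" signal. A minimal sketch of the status logic (plain function with a hypothetical removed-path list, not any poster's actual stack):

```python
# Hypothetical set of paths that were deleted deliberately.
REMOVED = {"/old-widget-1", "/old-widget-2"}

def status_for(path, known_paths):
    """Map a request path to an HTTP status: 200 for live pages,
    410 Gone for deliberately removed ones, 404 for everything else."""
    if path in known_paths:
        return 200
    if path in REMOVED:
        return 410  # "gone for good" - a stronger deindex hint than 404
    return 404
```

The key point is the middle branch: a catch-all that returns a page wrapper (status 200) for unknown URLs is exactly what keeps bad pages alive in the index.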

Here's a problem: say I run a site: command on a subdirectory that has more than 1,000 good pages and probably 2,000 bad ones. The good pages will generally be among the first 1,000 results returned, and then I can't even see the bad URLs in order to fix them. Anyone know a solution?
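Since site: caps what you can page through at roughly 1,000 results, one workaround is to come at it from your own data instead: diff the sitemap's known-good URLs against the URLs actually being requested (from server logs or a crawl), and treat anything served that isn't in the sitemap as a candidate bad URL. A sketch with hypothetical inputs:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract the <loc> entries from a standard sitemap file."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iterfind(".//sm:loc", NS)}

# Hypothetical inputs: the site's sitemap and a set of URLs seen
# in server logs (or gathered by crawling the site).
sitemap_xml = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/good-page</loc></url>
</urlset>"""
served = {"http://example.com/good-page", "http://example.com/typo-page"}

bad_candidates = served - sitemap_urls(sitemap_xml)
```

This sidesteps the 1,000-result ceiling entirely, because the bad-URL list comes from your own logs rather than from paging through Google's results.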


 3:11 am on Nov 16, 2011 (gmt 0)

(I wrote last night in my post, and I meant last year. d'oh)


 3:29 am on Nov 16, 2011 (gmt 0)

Submitted URLs = 45,793,544
URLs in web index = 33,784,984
Pandalized = No

My site has fluctuated with some panda iterations but has doubled in traffic overall since February. After panda 1.0, I updated my sitemaps from 16 mil to the current number above and the number of pages indexed has roughly doubled since then.


 4:55 am on Nov 16, 2011 (gmt 0)

Submitted URLs = 2,542
URLs in web index = "About 2,200 results"
Pandalized = Think so... Up through Feb 21, Down Feb 24 and continuing down weekly


 5:49 am on Nov 16, 2011 (gmt 0)

My alpha website -
On Feb14 (Panda 2, prior to fixing anything)
Submitted URLs = ~1200
URLs in web index = ~1200
Pandalized = Yes

Current Situation:
Submitted URLs = 1135
URLs in web index = 1135
Pandalized = Yes

Since Panda 2, the site's traffic has stood still, with no move up or down, through Panda 2.1 to 2.5.2.

Activities since February:
1. Added 250 new articles.
2. Deleted roughly 300 articles, plus category pages from page 2 onward.
3. Improved the content of 100 articles.
4. Strengthened its Facebook/Twitter accounts.
5. Improved the overall look and feel, navigation, etc.
6. Some minor branding.
7. Still full of ads, as always. Except for one, all are below the fold.

* NEXT: Planning to add videos, slides, infographics, etc.
* The site is a WordPress blog and still gets thousands of visits per day.
* Bing/Yahoo have been providing good-quality traffic lately, but not on the same scale as Google.

Some subjective details -
* The niche is very competitive.
* Most, but not all, of the articles are written by good writers with solid knowledge of the topics they cover.
* Some of the articles (~50) are masterpieces.
* Easy-reading articles, 500 words minimum, though some run to 1,500.
* Many competitors got hit even harder and still suffer every Panda iteration.
* Only two (2!) big brands and some small sites are the winners after all these Panda rounds. Some small sites weren't hit.

My takes -
+ Unless Google decides to change its strategy, the site is not going to recover (nor will any).
+ Over time, the site might get more traffic from other sources, which may improve its authority in the eyes of the search engines.


 5:57 am on Nov 16, 2011 (gmt 0)

For my main website:

Submitted URLs = 68,300
URLs in web index = 66,900
Pandalized = No


 5:57 am on Nov 16, 2011 (gmt 0)

My takes -
+ Unless Google decides to change its strategy, the site is not going to recover (nor will any).

Sorry for this slightly OT comment, but at this point several site owners have posted here about Panda recoveries, and I've spoken privately with several more - some who are now seeing traffic beyond their pre-Panda levels.

So the situation is NOT hopeless. Challenging, oh yes, but not hopeless.


And now back to our regularly scheduled survey of indexed pages.


 2:46 pm on Nov 16, 2011 (gmt 0)

Yes, I chose not to include the site: operator since its page counts can change hourly or daily. I'm the master of my sites and I know exactly what is getting indexed.


 2:57 pm on Nov 16, 2011 (gmt 0)

I have been quite curious about doing this survey for a while. There were strange anomalies in Googlebot activity on Panda releases (new versions). Since my fix, I have not seen any odd spikes of activity.

My traffic is going up slightly, but not by much. I know I still need to kill a large percentage of pages on the site that are older than X amount of time and hold little relevance to the company's site anymore.


 3:25 pm on Nov 16, 2011 (gmt 0)

Submitted URLs = 16,240,000

16 million pages?


 8:57 pm on Nov 16, 2011 (gmt 0)

Here's one I observe :

Current Situation
Submitted URLs = 158
URLs in web index = 128
Pandalized = Yes / recovering

2 Months prior to fixing anything
Submitted URLs = 4,500
URLs in web index = 4,200
Pandalized = Yes

The fix involved blocking all aggregated content and restoring new, unique, fresh content.

There are significant delays in Google reindexing new content onto previously blocked URLs (anything up to 4 weeks on pages that previously got updated every 7-10 days or less).

Existing URLs with new content usually update within 7-10 days.

Are others finding this issue which is slowing down recovery?

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved