
Forum Moderators: Robert Charlton & goodroi

Loss of Indexed Pages - Causes?

6:38 am on Jun 9, 2016 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 263
votes: 44


I've noticed a very steady decline of page loss over the last few months on Google's index.

At the start of the year we had 5.8 million pages indexed. Now we are at about 4.3 million. In some weeks, 200K pages drop out of Google's index (per Webmaster Tools).

However, I looked at our stats: our traffic is up from last year, and day to day we are getting the same amount of organic search traffic from Google or more, so no loss there. We still have good keyword rankings.

Because we have a large forum with over 16 million posts and 500K threads, I had a ton of "duplicate meta descriptions" and "duplicate title tags" messages in WMT. Because of how vBulletin works, it likely overpopulated my site with millions of pages from our forum, so I brought in an SEO/coder to make sure the title tags and meta descriptions, especially of multi-page forum threads, were not identical to each other. I also made use of canonical links, because of vBulletin flaws where certain pages had more than one URL pointing to them.

I believe those fixes, and likely some Google updates, cleaned out a lot of unnecessary pages.

Just wondering if anyone with a massive amount of indexed pages has gone through this before?

I did see the same thing happen a year ago, and then, over a two-week span, 2 million pages were added to the index.

Just hoping this is something I don't have to worry about.
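For anyone facing the same duplicate-title problem, here is a minimal Python sketch of the kind of per-page fix described above, under stated assumptions: each page of a multi-page thread gets its own title and meta description, plus a rel=canonical pointing at that page's single preferred URL. The function name `thread_page_head` and the `page{n}` URL pattern are hypothetical; vBulletin's real templates and URL scheme differ.

```python
from html import escape


def thread_page_head(thread_title, base_url, page, total_pages):
    """Build distinct <title>, meta description and canonical tags for one thread page.

    Hypothetical URL scheme: page 1 lives at base_url, page n at f"{base_url}page{n}".
    """
    if not 1 <= page <= total_pages:
        raise ValueError(f"page {page} is outside 1..{total_pages}")

    # Pages 2..n get a suffix so no two pages of a thread share a title or description
    suffix = "" if page == 1 else f" - Page {page} of {total_pages}"
    title = thread_title + suffix
    description = f"Forum discussion: {thread_title}{suffix}"

    # Each page canonicalises to its own preferred URL (page 2 does not point at page 1),
    # so alternate URLs that reach the same page consolidate onto one address
    canonical = base_url if page == 1 else f"{base_url}page{page}"

    # escape() also escapes quotes, so the values are safe inside attributes
    return "\n".join([
        f"<title>{escape(title)}</title>",
        f'<meta name="description" content="{escape(description)}">',
        f'<link rel="canonical" href="{escape(canonical)}">',
    ])
```

The self-referencing canonical is the design choice that matters here: pointing every page of a thread at page 1 is generally advised against, because it asks Google to treat pages 2..n as copies of page 1 and drop them.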
12:29 pm on June 9, 2016 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 9, 2000
posts:25913
votes: 880


I've also noticed that pages with any kind of error tend to trip Google into putting them into a sin-bin of sorts. These may not be warnings in Search Console serious enough to warrant a heavy slap, but might just be a fault or error that shows up in Google's crawl.

Fix the errors, and once the pages are re-crawled they seem to recover, but it may take a while.
11:58 am on July 7, 2016 (gmt 0)

New User

joined:Apr 30, 2015
posts:37
votes: 9


We have noticed and are now searching for answers.

In December 2015 we deleted about 30,000 of our site's 350,000 pages. These covered business services we no longer provide, and it was something the site had never done in its 9 years. It was a big enough change that we got a WMT email from Google saying:

"Googlebot for smartphones identified a significant increase in the number of URLs on [domain.co.uk...] that return a 404 (not found) error. If these pages exist on your desktop site, showing an error for mobile users can be a bad user experience. This misconfiguration can also prevent Google from showing the correct page in mobile search results. If these URLs don't exist, no action is necessary."

So we did nothing. Since then, our indexed page count has slowly dropped from 333,707 to 267,400. I am unable to figure out why; I have tested the dropped pages in the Fetch as Google tool and they come back OK.
5:59 pm on July 7, 2016 (gmt 0)

New User

joined:Oct 19, 2014
posts:15
votes: 3


This is really interesting. In the past, Google used to pride themselves on the size of their index, featuring it on the homepage, and crappy content apparently used to get collected in the supplemental index. With Panda and the consolidation of unique presences in search results, it seems they've become more selective about what they index. In the past I used to use Bing, which took a different approach and didn't index content it considered low quality; comparing the pages indexed in each gave me a rough heuristic for content quality.

Just thinking out loud: because you're a forum, is it possible that the content that's been deindexed has only a couple of comments, or includes a lot of archived threads and so forth? If you've ruled out any barriers to indexation, maybe look at whether you can tweak your templates to make them better mashups. Also, maybe do some log analysis and see if you can glean some insights that way.
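On the log-analysis suggestion: a minimal Python sketch that counts, per day, how often a user agent claiming to be Googlebot hit the site, and which status codes it received. It assumes access logs in the common Apache/Nginx "combined" format; the function name `googlebot_stats` is hypothetical. Output like this makes crawl-rate drops and error spikes visible. Note that the user-agent string can be spoofed; Google's documented way to confirm genuine Googlebot traffic is a reverse DNS lookup, which this sketch does not do.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_stats(lines):
    """Count requests per day and per HTTP status for lines whose agent claims Googlebot."""
    per_day = Counter()
    per_status = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # unparseable line, or not (claimed) Googlebot
        # e.g. "11/Jul/2016:18:14:02 +0000" -> 2016-07-11
        day = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z").date()
        per_day[day] += 1
        per_status[match.group("status")] += 1
    return per_day, per_status
```

To run it over a real log, pass an open file: `with open("access.log") as f: days, statuses = googlebot_stats(f)` (the path is hypothetical). A falling daily count, or a rising share of 404/5xx responses, is the kind of signal worth correlating with the index counts.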
7:27 pm on July 7, 2016 (gmt 0)

Junior Member from US 

10+ Year Member

joined:Jan 11, 2007
posts:152
votes: 1


@vegasrick - Have you checked to see if any of your pages have moved to the supplemental index aka. omitted results?
Google should be able to figure out pagination on a forum; duplicate page titles across paginated pages shouldn't be a big deal. It's a sin that MANY sites still commit.
Does your install of vBulletin use rel=next and prev?
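For reference, a minimal Python sketch of the rel="prev"/rel="next" link tags the question refers to, generated for one page of a paginated thread. The function name and the `page{n}` URL pattern are hypothetical. Google said in 2019 that it no longer uses these links as an indexing signal, so treat this as valid pagination markup rather than a fix for index counts.

```python
from typing import List


def pagination_links(base_url: str, page: int, total_pages: int) -> List[str]:
    """Return rel="prev"/rel="next" <link> tags for one page of a paginated thread.

    Hypothetical URL scheme: page 1 at base_url, page n at f"{base_url}page{n}".
    """
    def url_for(n: int) -> str:
        return base_url if n == 1 else f"{base_url}page{n}"

    links = []
    if page > 1:  # every page except the first links back
        links.append(f'<link rel="prev" href="{url_for(page - 1)}">')
    if page < total_pages:  # every page except the last links forward
        links.append(f'<link rel="next" href="{url_for(page + 1)}">')
    return links
```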
12:47 am on July 8, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 15, 2003
posts:954
votes: 30


I wouldn't get overly concerned about your indexing counts. The various restructurings of Google's index over the years (especially the most recent) have led to frequent problems in extracting accurate page counts in the GSC, especially for larger websites. If you're confident that you haven't had a technical issue that would have blocked Googlebot from sections of your site or made sections unavailable, then I'd consider any fluctuations in the report most likely to be transient issues in the GSC data.
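One quick way to rule out the robots.txt side of "a technical issue that would have blocked Googlebot" is to test sample URLs from each section of the site against the robots.txt file. A minimal sketch using Python's standard `urllib.robotparser`; the helper name `blocked_for_googlebot` is hypothetical, and the robots.txt content and URLs in the test are made up. Python's parser applies the first matching rule and does not implement Google's `*`/`$` wildcards or longest-match precedence, so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser


def blocked_for_googlebot(robots_txt, urls):
    """Return the URLs that the given robots.txt text disallows for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]
```

One behaviour worth knowing: when robots.txt has a `User-agent: Googlebot` group, Googlebot follows that group only and ignores the `*` group, which is why `/admin/` in the test below is not reported as blocked.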
8:13 am on July 8, 2016 (gmt 0)

New User

joined:Apr 30, 2015
posts:37
votes: 9


These are not fluctuations in the reporting of the index. In our case, these are pages we have checked manually ourselves, and they have been removed from the index.

I am currently using the Fetch as Google tool to check pages and then submit them to the index, in an attempt to wake up Google and force these pages back into the index. This seems to have an immediate effect; I will monitor it to see for how long. But I am not able to do this for 30-35 thousand pages.

There seems to be no pattern or reason for the dropping of these pages. For example, in a category list of 20 products, the list page is indexed, but 3 or 4 of the product pages in that list are no longer indexed. They use the same dynamic page, the same product group and manufacturer, just a different product. If I could see a reason I could address it; it's just very random, without logic.

The only thing I can grasp is that we deleted 30k pages in one go, which triggered a heads-up from Google, and since then Google seems to be removing random chunks of our site from the index.
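Since Fetch as Google doesn't scale to tens of thousands of pages, a bulk check of the fetched HTML can at least rule out two common on-page reasons for dropping: a noindex directive and a rel=canonical pointing at a different URL (a page can fetch fine and still carry either). A minimal Python sketch using the standard `html.parser`; the names `IndexSignals` and `indexing_problems` are hypothetical, fetching the pages is left to the caller, and the X-Robots-Tag handling is simplified (it ignores the `googlebot: noindex` per-agent form).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexSignals(HTMLParser):
    """Collect meta robots directives and the rel=canonical href from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}  # names arrive lowercased
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots += [d.strip().lower() for d in a.get("content", "").split(",")]
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonical = a.get("href")


def indexing_problems(url, html, x_robots_tag=""):
    """Return reasons found in the page or header that could keep it out of the index."""
    signals = IndexSignals()
    signals.feed(html)
    directives = signals.robots + [d.strip().lower() for d in x_robots_tag.split(",") if d.strip()]
    problems = []
    if "noindex" in directives or "none" in directives:
        problems.append("noindex directive")
    if signals.canonical:
        canonical = urljoin(url, signals.canonical)  # resolve relative hrefs
        if canonical.rstrip("/") != url.rstrip("/"):
            problems.append(f"canonical points elsewhere: {canonical}")
    return problems
```

An empty list doesn't prove a page will be indexed; it only removes these two causes from the list of suspects.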
6:14 pm on July 11, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts:80
votes: 4


@FishingDad When you use Fetch as Google and the page gets indexed, I've noticed that the same page gets de-indexed again after about a week. Please let us know if you see the same thing happen to you.

Note my feedback is regarding an extremely large website - millions of pages, reference website.

- We've been experiencing a huge and steady decline in index rate in Google most recently in February of this year - it's not stopping. It's a problem for us because it directly correlates to the declining amount of traffic (so not just a problem w/ Google reporting).

- After much analysis and many folks working on this problem, our best lead is that this is due to duplicate and thin content. We identified some failures handling canonicals & 404s correctly and in how we handled pagination - we ended up with a TON of paginated pages that didn't need to be indexed. These issues have been resolved.

- During this de-index process we've noticed that Googlebot has slowed to a snail's crawl, dropping to 1/10th of its previous rate. This is a problem, since it'll take Google a VERY long time to de-index those low-value pages and replace them with quality content.
8:53 pm on July 11, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


I've noticed a very steady decline of page loss over the last few months on Google's index
If it is truly a loss in the index, that's one thing. If however the loss is displaying only in Google Search Console "indexed pages" then it may not be an actual loss, just the report. I've seen this happen several times and within a week or two, the correct number of indexed pages will once again display.
8:55 am on July 12, 2016 (gmt 0)

Preferred Member from BG 

Top Contributors Of The Month

joined:Aug 11, 2014
posts:546
votes: 173


A common reason for pages disappearing from the index is that they have an expiry date, for example pages related to events, scheduled reports, meetings or similar. Another reason is bad JavaScript. I've seen it on a couple of occasions on a documentation subdomain, where the documentation software received some bad JS updates that made entire directories uncrawlable, so they were removed from the index. The last and most common case is heavy duplication (as seems to be the case for you). If you suddenly have a large number of duplicate pages, through a missing automatic rel=canonical, a bad JSON-LD implementation for the forum posts, or any other technical issue, you can end up with a huge pile of dupes that are initially indexed and then dropped over time.
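To size that "huge pile of dupes", one rough approach is to normalise the URLs you have (from a crawl or the logs) and group the ones that collapse to the same address. A minimal Python sketch under stated assumptions: the list of content-neutral parameters is illustrative (`sid` is phpBB's session-ID parameter; the rest are examples), sorting query parameters assumes their order never changes the page, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameters that change the URL without changing the content
IGNORED_PARAMS = {"sid", "hilit", "utm_source", "utm_medium", "utm_campaign"}


def normalize(url):
    """Lowercase scheme/host, drop ignored params and fragment, sort remaining params."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", urlencode(query), ""))


def duplicate_groups(urls):
    """Map each normalised URL to the original URLs that collapse onto it (2+ only)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Each group's key is a natural candidate for the rel=canonical that the duplicate variants should declare.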
8:55 am on Sept 27, 2016 (gmt 0)

New User

joined:Apr 30, 2015
posts:37
votes: 9


Update on our experience:

After a loss of over 100,000 pages from the index, as seen both in WMT and in a Google site:www.domain.com search, Google has started to reindex our pages after what has been a 9-month gradual decline.

The site:www.domain.com search shows an increase of 70,000 in the last week; WMT shows an increase of 20,000.

We haven't changed anything, for fear of making matters worse, as we could not find anything wrong with our site. I just hope this odd event, which seems to have been sparked by our deleting about 30,000 pages at the time the loss started, is now turning back in our favour.
7:11 pm on Sept 29, 2016 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 14, 2008
posts:80
votes: 4


Thank you very much for the update, FishinDad. No positive change in our index rate yet on our end, although our rate of de-indexing seems to be slowing.
2:51 pm on Oct 13, 2016 (gmt 0)

New User from US 

joined:Mar 24, 2016
posts:5
votes: 0


I manage a forum as well and have seen a very similar issue. We had a problem with bloat on our site (phpBB creating 10+ duplicate URLs every time a post was made), which we recently fixed, and our index count has dropped from 45K to 16K over the past few months. Normally I would think this is a good thing (since our goal was to reduce the size of the site), but we've seen a lot of weird side effects in WMT.
See my post here for more info:
[webmasterworld.com...]