

Reasons Why Pages Drop Out of the Google Index?


bw3ttt

5:47 pm on Jun 26, 2007 (gmt 0)

10+ Year Member



I have/had a site with 64,000 pages indexed.. Now the pages seem to be dropping out at about 2,000 pages per day. Yesterday 3,500 new pages were added, but today 4,000 dropped out.. I'm down to 49,000 pages indexed.. What could be the problem? The site is relatively new and has done nothing but consistently add pages for a few months.

Patrick Taylor

6:23 pm on Jun 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've recently seen pages drop right out of the main index, although they still show up in a site: search as supplemental results. The TBPR (toolbar PageRank) goes grey and the pages are gone from the SERPs.

Typically, they are pages whose menu links sit lower down on the sidebar menu and which are similar in character to those above, e.g. product variants. The titles, headings, etc. are different, but the pages are most likely seen by Google as 'more of the same' as the ones higher on the menu. That's my theory, anyway.

trinorthlighting

6:35 pm on Jun 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check to see if there is any duplication. Google seems to be dumping duplicated content out of its index recently. Also check whether you're being scraped and outranked by a scraper.
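
(As a rough way to run that duplication check yourself, the sketch below fetches a few pages and compares their visible text. The URLs and the 85% threshold are made-up placeholders, not anything Google publishes.)

# Sketch: flag near-duplicate pages on your own site (placeholder URLs).
import re
import urllib.request
from difflib import SequenceMatcher
from itertools import combinations

urls = [
    "http://www.example.com/widgets/red",
    "http://www.example.com/widgets/blue",
    "http://www.example.com/widgets/green",
]

def page_text(url):
    """Fetch a page and crudely strip markup, leaving the visible text."""
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)  # drop scripts and CSS
    return re.sub(r"(?s)<[^>]+>", " ", html)                  # drop remaining tags

texts = {url: page_text(url) for url in urls}

for a, b in combinations(urls, 2):
    ratio = SequenceMatcher(None, texts[a], texts[b]).ratio()
    if ratio > 0.85:  # arbitrary threshold - tune it for your own pages
        print(f"{a} vs {b}: {ratio:.0%} similar - possible duplicate")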

tedster

7:37 pm on Jun 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Also, if you can zero in on them, check to see if those "dropped" URLs are still getting Google traffic. The site: command often has inaccuracies - after all, it is only a reporting function, and Google shards their back-end data in a way that can make page counts merely rough estimates (they do use the word "about", after all).

Your actual traffic is the hard data here, and it's important to keep that in the big picture when Google's various reports seem to show either an up or down trend.
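
(If you want to turn that into hard data, something like the sketch below tallies Google-referred hits per URL straight from the access log. It assumes the common Apache/Nginx "combined" log format and a hypothetical filename; adjust the regex for anything else.)

# Sketch: count Google-referred hits per URL from an access log.
import re
from collections import Counter

# combined format: ... "GET /path HTTP/1.1" status bytes "referrer" "user-agent"
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d+ \S+ "(?P<ref>[^"]*)"')

hits = Counter()
with open("access.log") as log:   # hypothetical log file name
    for line in log:
        m = LINE.search(line)
        if m and "google." in m.group("ref"):
            hits[m.group("path")] += 1

for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")

Any URL that site: says is gone but that still shows up here is almost certainly still indexed, whatever the reported count claims.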

SEOold

8:06 pm on Jun 26, 2007 (gmt 0)

10+ Year Member



The site: command often has inaccuracies - after all, it is only a reporting function, and Google shards their back-end data in a way that can make page counts merely rough estimates (they do use the word "about", after all).

I have to agree with this. I've seen it with my site, which is over 150K pages. I can break my page counts down by section, because some of my URLs start with specific paths, and the per-section numbers don't add up to the reported total.
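
(For anyone who wants to run the same cross-check: compare the overall count against per-section counts using inurl:. The paths below are hypothetical; the point is that if the sections don't come close to summing to the total, at least one of the numbers is only an estimate.)

site:example.com
site:example.com inurl:/products/
site:example.com inurl:/reviews/
site:example.com inurl:/articles/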

Patrick Taylor

8:33 pm on Jun 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, in the case I'm referring to, the pages that dropped out recently and went grey-barred (but show as supplementals with the site: command) never received much SERPs traffic, so overall traffic isn't really affected. Previously these pages had TBPR 2, like the ones above them in the menu, which are still TBPR 2 and fully indexed.

The way Google has dropped the pages whose link buttons are lower in the menu (on several different menus, in fact) suggests a deliberate pruning job rather than a glitch. But then in a few other cases, pages are grey-barred and still in the index.

bw3ttt

9:54 pm on Jun 26, 2007 (gmt 0)

10+ Year Member



I think Patrick Taylor has it nailed..

It's a comparison shopping site in the UK and has a lot of results like this:

Main results --> refined results based on price, brand, color etc..

Really there are hundreds of millions of URLs based on about one million individual products.

The refined results are extremely similar and I think Goo is getting rid of them..
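
(One common fix for that pattern is to leave the main results crawlable but keep the refined variants out of reach. Googlebot honours the * wildcard in robots.txt, so something like the sketch below works - though the parameter names here are invented, so check how your refinement URLs are actually built first.)

User-agent: Googlebot
Disallow: /*price=
Disallow: /*brand=
Disallow: /*colour=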

ichthyous

4:32 pm on Jun 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have also seen the same trend. I noticed a sharp drop in traffic in the last two weeks, so I started to investigate. Many of my pages with low PR have been moved into supps and now just have a grey bar. All of the links come into the first page of each category, and by the 10th page or so the pages drop out entirely.

pageoneresults

5:36 pm on Jun 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's all about click paths. Those pages further down the chain of command are going to be candidates for the Supplemental index. PageRank™ appears to be the major determining factor in which pages end up there (in this scenario).

There have been PageRank™ updates recently, along with Google's constant cleansing. If you have pages sitting at PR3 and below, those are prime candidates for the Supplemental index. But there are other determining factors, like what the overall PR is for others in your industry. If the leaders are at PR5/PR6, then pages at PR3 are fairly strong. So I would expect to see pages at PR2 and below falling into the Supplemental index.

If you find that you have a high ratio of Supplemental pages, that may be cause for concern. Think about quality scoring issues. If I have 1,000 pages and 900 of them are Supplemental, that is a very high ratio and one that I'd be concerned with.

ichthyous

2:24 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That is exactly what I am seeing, pageone. I don't understand why this is happening now, though. The pages have had the same rank all year, and I have had no problem with supps until now. It seems that Google is constantly tightening the standards for what qualifies to remain in the index and what will fall out. Since my site is a photo site, I condensed more images onto each page, which means fewer overall pages; hopefully the PR of each page will go up enough to bring them back into the index.

bw3ttt

3:34 pm on Jul 2, 2007 (gmt 0)

10+ Year Member



I wasn't talking about pages going into the supp results.. They are no longer in the index at all.. I'm down to 49,100 pages from 64,000 using site:example.co.uk

There's no real drop in traffic, but the site is new and there's never really been much anyway..

The slaughter of my pages seems to have stopped now.. nothing has dropped out in about a week..

pageoneresults

5:40 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That is exactly what I am seeing, pageone. I don't understand why this is happening now, though. The pages have had the same rank all year, and I have had no problem with supps until now.

If I'm not mistaken, Google may have reached their maximum index size a while back. Since then it has been a constant merge and purge of data.

It seems that Google is constantly tightening the standards for what qualifies to remain in the index and what will fall out.

If you were a search engineer and had limitations on what took priority, naturally you'd be looking at pages that sit at the bottom of the click path and may not represent the best value for the user's click. That's not always the case, but when dealing with an algorithmic solution, there is always collateral damage.

Just because a site: search shows you have pages in the Supplemental index doesn't necessarily mean there are any major problems. Too many people fixate on that site: command, and the results aren't always what they appear to be. ;)

On a site that gets indexed frequently, you can expect a certain percentage of your pages to be in constant flux within the Supplemental results. It's a given.

Since my site is a photo site, I condensed more images onto each page, which means fewer overall pages; hopefully the PR of each page will go up enough to bring them back into the index.

I would have kept the individual pages and just blocked them from getting indexed for now. Direct the bot to the upper level click paths so that you can establish PR across those levels. Then open up the lower level click paths and spread it there.
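
(For reference, the usual mechanism for "keep the page live but block it from getting indexed" is a robots meta tag: noindex keeps the page out of the index, while follow still lets the bot pass through its links.)

<!-- on each lower-level page you want kept but not indexed, for now: -->
<meta name="robots" content="noindex, follow">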

Or, I would have figured out a way to bring those pages further up in the click path. In fact, that would be my first solution. How to get those pages up one or two levels?

I wasn't talking about pages going into the supp results.. They are no longer in the index at all.. I'm down to 49,100 pages from 64,000 using site:example.co.uk

Merge and purge. A cleansing of the indices.

There's no real drop in traffic, but the site is new and there's never really been much anyway.

For many, there really isn't any drop in traffic. In fact, for most, traffic is increasing. It's just the natural flow of The Gorg. If you see traffic decreasing, that may be cause for concern.

New site, 64,000 pages. Hmmm, that's a lot of pages for a new site. And, if others in your industry don't have that number of pages, you've probably fallen prey to the "historic" algo. In due time things will settle down. You'll add one day to your life for each day that you don't check your search engine positions, believe me. :)

The slaughter of my pages seems to have stopped now.. nothing has dropped out in about a week.

New site, bouncing around the data centers. It will be a while before it gets seated in the SERPs, if it does. Anytime you have a large number of pages like you do with a new site, you have to look at what your industry practices are before unleashing that much content. You'll suck the life right out of the site, right out of the gate. It will take a long time to "build" trust, history, etc. Fewer pages for indexing might have been a better solution.

ichthyous

6:13 pm on Jul 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would have kept the individual pages and just blocked them from getting indexed for now. Direct the bot to the upper level click paths so that you can establish PR across those levels. Then open up the lower level click paths and spread it there.

Not sure what you mean by this. None of the lowest-level pages have been deleted, just reorganized. Each category now has fewer pages, i.e. categories that used to have 6 thumbnails per page now have 12, so basically the categories have half the number of pages. I am hoping that since they are now fewer clicks away from the main category page, it will "concentrate" the PR on the fewer pages. I am not sure what you mean by "direct the bot to upper-level click paths"... how would that increase PR on those pages?
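
(To put rough numbers on that reorganisation - the figures below are hypothetical - doubling the thumbnails per page halves the paginated page count, so the deepest page in a category sits half as many next-page clicks from the category root.)

# Toy arithmetic for the reorganisation described above (hypothetical figures).
from math import ceil

images_in_category = 600
for per_page in (6, 12):
    pages = ceil(images_in_category / per_page)
    print(f"{per_page} thumbnails/page -> {pages} paginated pages in the category")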

Or, I would have figured out a way to bring those pages further up in the click path. In fact, that would be my first solution. How to get those pages up one or two levels?

I agree with that, but it's simply not possible when you have a huge archive of images. They have to be organized in some fashion thematically or people will get lost. I don't use sub-categories anywhere except where it is absolutely necessary... and the subcats are faring better, since I also focused on getting deep links directly to them a while ago. I think the categories suffering the most simply have too many pages and not enough links coming in to pass PR to so many pages. It's getting harder and harder to get good links in at all, and now many sites have gone back and either removed links or are using nofollow tags on old links.

bw3ttt

7:46 pm on Jul 12, 2007 (gmt 0)

10+ Year Member



""New site, 64,000 pages. Hmmm, that's a lot of pages for a new site. And, if others in your industry don't have that number of pages, you've probably fallen prey to the "historic" algo. In due time things will settle down. You'll add one day to your life for each day that you check your search engine positions, believe me. :) ""

Well, bizrate.com has 7.4 million pages indexed and shopping.com has 11.5 million pages indexed.

My 64,000 is pretty wimpy compared to them.. Give me about 5 years of history and a PR7 and I hope to get a few million pages indexed too.

I have a PR3 and I'm wondering if I've reached the maximum number of pages they will allow for my lowly PR. Once I reach 64,000 pages they start dropping out, then they add some more in, but I can't get past 64,000... It really is a constant cycle of joy and pain.. Why bother hitting my site 25,000 times per day if they won't do anything with the data?
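
(Side note: before reading much into a figure like 25,000 hits a day, it's worth confirming those hits really are Googlebot and not fakers spoofing the user-agent. Google's documented check is a reverse DNS lookup followed by a forward lookup; here's a minimal sketch, with an example IP.)

# Sketch: verify that a "Googlebot" IP really belongs to Google.
# Reverse-resolve the IP, check the host ends in googlebot.com or google.com,
# then forward-resolve that host and confirm it maps back to the same IP.
import socket

def is_real_googlebot(ip):
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

print(is_real_googlebot("66.249.66.1"))  # example IP from Googlebot's usual range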

tedster

7:53 pm on Jul 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're getting that kind of spidering, you're getting attention and building history with Google. I'd say don't try to fix indexing issues from inside the PR3 box -- just build your business, get those natural backlinks and the attendant PR growth, and things will evolve well. Your clean history, which may not be noticeable in the SERPs right now, will serve you well in the future.

pageoneresults

9:20 pm on Jul 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I'm not mistaken, Google may have reached their maximum index size a while back.

I do believe I am mistaken. Google probably have enough storage for the next 100 years. It's just a matter of getting all the data in there. ;)

Constant merge and purge of data.

This occurs daily. There are larger updates on a regular basis, plus constant flux on a day-to-day basis.

Well, bizrate.com has 7.4 million pages indexed and shopping.com has 11.5 million pages indexed.

Those are two sites in a category by themselves; they are authority sites. You surely can't expect to compete in that space with 64,000 pages? Wait, you could, but you'd have to target a niche and not try to replicate what they are doing.

My 64,000 is pretty wimpy compared to them.

Not if it is niche related. Pick a specific topic and focus on that. Once you've established history, start introducing a broader range of topics. Take it slooowww...

Give me about 5 years of history and a PR7 and I hope to get a few million pages indexed too.

These days, it might take a bit more than that to get a few million pages indexed. This time last year, yes. Right now, probably not. Unless of course you are an authority in your space.

I have a PR3 and I'm wondering if I've reached the maximum number of pages they will allow for my lowly PR. Once I reach 64,000 pages they start dropping out, then they add some more in, but I can't get past 64,000.

I've seen instances of the number of pages indexed being capped. But further advanced searches may uncover those other 136,000 missing pages. ;)

PR3? Not enough juice for 64,000 pages. Not in the space you are referring to. Nope.

It really is a constant cycle of joy and pain.. Why bother hitting my site 25,000 times per day if they won't do anything with the data?

Oh, but they are. It's bouncing around the various datacenters right now, being crunched, filtered, regurgitated, etc. At some point it will make it into the main index, or it may not. PageRank™ is going to be the determining factor. PageRank™ will determine crawl frequency. PageRank™ will determine how many pages you can push into the index without them dropping into the Supplemental Index. PageRank™ will also determine how many pages have ended up in the SI in the past 90-120 days. Google have really been on a rampage lately, purging lower-PR pages.
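
(To make the "juice" intuition concrete, here is a toy run of the textbook PageRank formulation on a made-up five-page site - a sketch only, nothing to do with Google's production system. Each extra level of the click path divides the PR that arrives, which is why deep pages starve.)

# Toy PageRank by power iteration on a hypothetical five-page site.
damping = 0.85
links = {                       # page -> pages it links to
    "home":     ["category", "about"],
    "category": ["home", "page2"],
    "page2":    ["category", "page3"],
    "page3":    ["page2"],
    "about":    ["home"],
}

pr = {page: 1.0 / len(links) for page in links}
for _ in range(50):             # iterate until (roughly) converged
    pr = {
        page: (1 - damping) / len(links)
        + damping * sum(pr[p] / len(out) for p, out in links.items() if page in out)
        for page in links
    }

for page, score in sorted(pr.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")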