The site in question is a nationwide business directory, so it's not hard to have a large number of pages with unique content (each listing has a separate name, address, contact details and business type). The new site I mentioned in my previous post has gone from 49K to 77K, and seems to be climbing. This is also a business listings directory, BTW.
Fun huh!?
I've learnt from experience that sending on/posting URLs is not always wise, and I've had issues with scraper sites/click bombing from more immature visitors.
The site itself is of course database driven, and the data has been purchased in the same way you can buy databases of links, scripts etc. You could say that other sites would be using the same data as mine, and you'd probably be correct, which may lead to the assumption that I have a dup content penalty.
If this were the case, however, why does a search for 'blue widget x' bring back 500K results from different sites selling the same product? Why are people finding forums, one of the 'purest' forms of fresh content, being de-indexed? Why do newsgroup scraper sites, which have exactly the same content as Google Groups, still have millions of pages in the index? Why has a site I launched after the BD rollout not been affected by this de-indexing issue, or why is it even getting indexed at all?
I think it all points to a problem at Google's end rather than anything else. Unless someone can find a clear, concise explanation of what all the sites facing these issues have in common, I'm going to assume it's something else.
I've been graphing the Googlebot traffic to these sites, and with the changeover to the Mozilla 5.0 bot, there's a very dramatic drop off about the end of March or so in all of the sites. Where there used to be several hundred to several thousand pages spidered every day per site, now it's down to maybe 20 or 30 - and they are the same 20 or 30 every day.
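If anyone wants to pull the same numbers from their own logs, here's a quick-and-dirty sketch against a standard Apache combined-format log; the file path and user-agent match below are placeholders for whatever your own server records, not what I actually used:

import re
from collections import Counter
from datetime import datetime

LOG_FILE = "access.log"   # placeholder path to your combined-format access log
BOT_MARKER = "Googlebot"  # matches both the old Googlebot UA and the new Mozilla/5.0 (compatible; Googlebot/2.1; ...) UA

# Pull the date out of entries like: 66.249.x.x - - [25/Apr/2006:10:15:32 +0000] "GET / HTTP/1.1" ...
date_re = re.compile(r"\[(\d{2}/\w{3}/\d{4})")

hits_per_day = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if BOT_MARKER in line:
            match = date_re.search(line)
            if match:
                day = datetime.strptime(match.group(1), "%d/%b/%Y").date()
                hits_per_day[day] += 1

for day, hits in sorted(hits_per_day.items()):
    print(f"{day}\t{hits}")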
Meanwhile, I've put up five new sites in the past 30 days, and submitted them; needless to say they are not listed nor has the Googlebot even come to see them. I know that there have been a lot of postings about Google sandboxing new sites, but I have never experienced it myself - every time I add a new site, it seems to get picked up within a week. No more. Two of the sites have multiple inbound links from news articles on the local major newspapers (Detroit Free Press and Detroit News - they've been around a while) and those ARTICLES showed up in Google within hours after they were posted, so you'd think that the sites would make it that way, but nope.
Meanwhile, a personal site I run that has a single database-driven page is just going like gangbusters - if anything, it's getting better SERPs and traffic than ever, and has gained 2 PR in a very short time.
It's a mystery. Hard to come up with explanations for my clients.
Thank you for taking the time to explain in more detail how one can end up with six-figure page counts.
I guess even at the age of 45, with 12 years experience on the Internet (but not in business on it), I have a lot to learn.
I don't even know what a scraper site is. I also didn't know that commercial databases were available for sale to the public.
On that note, in my innocence, with four new sites that HAD good rankings, I have lost everything in Google.
No tricks, just hard work over a six-month period copying data from a book because it wasn't available on the Internet (believe me, I looked day and night).
Someone proposed that commercial sites were getting hit harder than amateur ones... I beg to differ. With my obvious lack of experience with SEO, I know none of the neat tricks that come with such an education - yet I have been bombed by Google, and now I am right back where I was in December 2005, before the robots.txt said c'mon in.
I have 7,500 outbound links spread over four domains, lists of places to visit in the UK and Ireland. I have 5,000 hotel/B&B listings and over 1,200 high-quality photos of the UK, increasing every day. My OBLs have been described by one chap as link spam, yet they were individually hand-typed by me (selected, ironically enough, from searches on Google).
After all the work I have done and continue to do every day, I am receiving no benefit from being honest with unique content.
Thanks again for your time yandos. I am very grateful for this forum, I have learned a lot in the few days I have been here.
Cheerio.
Martin.
"be honest - if you have a million pages, and you're not Wikipedia, most of it was generated to increase SE traffic"site:www.webmasterworld.com
OK, point taken, some of you have a million organically created pages. And you think each and every one of them should be indexed, eh? Personally, as a user, I don't. I have to wade through that stuff every time I do a search.
I have much sympathy for those with sites that have hundreds of pages of real original content missing, but they're likely collateral damage from G trying to clean out the dross.
To echo some other post in one of these threads (I doubt I can find it to quote), this appears again and again - the algo changes (now it's an infrastructure change/whatever), and there's a flood of posts from people whose sites went missing. It's always in competitive fields, commercially and SEO-wise. It's seldom that niche sites go missing. But I'm to believe that Google is broken every time? Why does the line-up never change much for the research-site searches that I do? The best, most pertinent sites might shuffle positions slightly, but that's it. For more popular search terms, G, Y, and MSN are as bad as ever (more so, because of the MFAs these days).
That said, there have been serious problems with canonicalization for the last three years, not only with G, but with Y as well (it first happened to me in Jun 2003 and I fixed it then). A lot of damage has been done by that, people aren't aware of it, and when they do become aware of it, things are so messed up it takes ages to get it straightened out. That's always the first place to look for problems.
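If anyone wants to rule canonicalization in or out for their own site, a minimal check along these lines will show whether the www and non-www hostnames collapse to a single version with a 301 (example.com is just a placeholder domain):

import http.client

def first_response(host: str, path: str = "/") -> None:
    # One HEAD request, reported raw (redirects not followed), so you can see
    # whether and where the server 301s.
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    print(f"http://{host}{path} -> {resp.status} {resp.reason} {location}")
    conn.close()

# One of these should permanently redirect (301) to the other; if both return
# 200, the site is serving the same content under two hostnames.
for host in ("example.com", "www.example.com"):
    first_response(host)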
Anyway, for those of you who are having problems because of fallout from G's ongoing battle with the spammers - best of luck.
Actually for one of my main keywords, the top three results are all owned by one person. They are different domain names but they are all the same site, so duplicate content. To make matters worse, they all have AdSense on them. I'm extremely annoyed that Google are favoring those sites over my legitimate site. Grr.
They will be re-indexed eventually, right?
EDIT: Could this: [tech.cybernetnews.com...] be a reason why our pages were removed?
Sure as hell hope so - I'm 30% down in traffic due to dropped/removed pages!
I can't see how it's the proposed new layout changes that are causing this; more likely (hopefully) it's a Big Daddy bug that has dropped data, and once a full crawl has been done the dropped pages will return - but like the rest, I'm just guessing.
My other sites that don't use Sitemaps still have the same pages indexed, with none of the old (non-existent) supplemental pages.
So I think Google Sitemaps is the problem. Does anyone else have the same experience?
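For anyone who hasn't tried Sitemaps, this is roughly all that gets submitted - a sketch that writes a bare-bones sitemap.xml (the URLs are made up, and I'm using the plain sitemaps.org namespace rather than anything site-specific):

from xml.sax.saxutils import escape

# Made-up URLs; in practice these would come straight out of the listings database
urls = [
    "http://www.example.com/",
    "http://www.example.com/listings/widget-shops/",
]

with open("sitemap.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in urls:
        out.write(f"  <url><loc>{escape(url)}</loc></url>\n")
    out.write("</urlset>\n")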
And what about Google's effort to index the whole web!? Why would they de-index thousands of pages? That would only put them behind their competitors Yahoo and MSN.
If I do a site: search, all the third-level pages are gone, yet the page count reports the right number of indexed pages.
If I do a site: search narrowed to a specific third-level page, it is found, but as a supplemental result.
So could it be that all our pages are in the supplemental index, and that the site: search simply doesn't list pages from the supplemental index?
And if so, is it a bug, or are our third-level pages condemned to live out their web lives in supplemental hell forever?
Thank you for your concern, I really was just kidding. One of the downfalls of a text-only forum is the inability to see the poster's expression - in this case, my eyebrows jumping up and down with a wicked grin on my face.
It is a kind warning from you, however, about copyright. Fortunately, Arbitrary, I have been involved in photography since 1972, as a young boy, and fully appreciate the implications of copying someone else's stuff. Akin to running through a pool of petrol with a match, eh?
I do believe that there are many ways to make money from websites that infringe upon the "dodgy" side of life. However I am not the type of soul to flat-out steal from others, or trick the engines. Sooner or later they will figure you out, take down your IP address, note your hosting Company and you are buried forever.
I'd prefer to have original content; let's face it, who wouldn't? Originality, however, takes a long, long time to produce, and this is where the "assistance" of bending the rules comes in.
I have to say that I am eternally grateful for finding this forum, for there has been no trickery in the willingness of folks to offer support and advice.
For that I thank you all, for your interest in setting a newbie to the World of SEO and SERPS on the right path.
My appreciation doesn't stop at the end of your posts. I remember your advice the following days and weeks, but mostly when I am just about to press that "put" button in Dreamweaver.
Thank you.
Martin.
I find it odd that just when Google is losing pages, Mediabot is spidering for the main index. Mediabot crawling is not about saving bandwidth. Yeah, BigDaddy is supposed to be complete, but I think they are having big problems with it. Problems like having pages disappear from the index and having Mediabot take up the slack.
When BigDaddy was complete, we were supposed to get a canonical fix too. Well, that hasn't happened.
Saving bandwidth with Mediabot - now that is a spin worthy of the political landscape.
This provides a useful perspective. Yahoo has managed to include much of my new forum in its index, but MSN and Google haven't yet.
It's been observed on WebmasterWorld that Google started Big Daddy with an old index. Even so, they are level pegging with MSN. Assuming that by changing my sites around I've put myself on a level playing field with the other large sites that have been lost here, I'd say two things: first, I'm not doing badly, and second, useful content will find its way back into Google once they've gotten around to indexing it.
The second thing is speculation, of course. But I'm convinced I'll be back with my new and very useful forum.
All had unique titles, keywords, etc., but clearly not enough.
I now have all pages re-indexed, by placing a whole heap of totally random #*$!e on the pages so they are unique. The content is total shite, and the pages are now less user-friendly and less relevant, but they are all back up there in the index.
Best of all? Most of my competition has been wiped out by Google, so all of a sudden I have the top search results for about 7,000 products. Thank you, Google. You are truly wise, keep on keeping on.
Google, it's stupid. You've wound it up too high. Good for me, sure, but it's wrong, wrong, wrong.
A product listing is not duplicate content just because the page it sits on is only 3-4% different from the other shopping pages on the site. It's NOT right to try and place all these products on one page; they won't fit, they need to go on separate pages.
If so, that would make a little sense, except mine do have unique content on them, apparently not enough.
Google is trying to cull content pages added for the purpose of bulking up sites.
E.g.:
Opening up your forums to search engines creates hundreds of thousands of pages, many of which are very similar.
Product pages: you may have unique titles, keywords, etc., but if your product descriptions are short, Google is questioning why the product deserves its own page with so little information.
Having been in this game for 10 years producing good, clean sites, it does seem the only way to play G's game is mass-produced spam on multiple sites - if that's the game G wants us to play, then I'm just about ready to start throwing up sites.
Tigger, don't do it. Regroup and figure things out, but don't move to the dark side - you know all that crap will be caught eventually, and then all you'll be working with is grab and dash.
Okay, are we talking about those who use templates that are the same throughout 100s or 1,000s of pages? You're saying Google's duplicate content filter is turned up so much that it's interpreting template based pages as too similar to others and is then dropping them?
That certainly looks like a possibility, judging by some of the posts. G prefers pages that are unique. Sites with many thousands of very similar pages might be getting flagged as spam.
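Pure speculation about how such a filter might score pages - nothing to do with Google's actual algorithm - but a crude word-shingle comparison like this sketch shows why two template-driven listing pages with only a line of unique detail each come out looking like near-duplicates:

def shingles(text, size=4):
    # Break the page's visible text into overlapping word shingles
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 0))}

def similarity(a, b):
    # Jaccard overlap of the two shingle sets: 0.0 = nothing shared, 1.0 = identical
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

TEMPLATE = (
    "Welcome to our nationwide directory of widget suppliers. Browse listings by "
    "county, read customer reviews, compare opening hours, and request a free quote "
    "from any supplier. All listings are checked for accuracy every month."
)

page_one = "Acme Widgets, 12 High Street, Anytown, 01234 567890. " + TEMPLATE
page_two = "Bloggs Widgets, 9 Low Road, Otherville, 09876 543210. " + TEMPLATE

# Scores far higher than two unrelated pages would, even though the listings
# describe different businesses
print(f"similarity: {similarity(page_one, page_two):.2f}")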
Product pages: you may have unique titles, keywords, etc., but if your product descriptions are short, Google is questioning why the product deserves its own page with so little information.
Good observation, nippi. The fact is that those pages don't deserve their own URL, they're only there to run up the page total (this approach used to work). I'm regularly amazed by the number of WW members who forget about G's ongoing war with spam/scraper/pseudo-directory sites and never factor this into their SEO methods. If you look like a spammer, even if you're not one, you can expect problems eventually.
Google is questioning why the product deserves its own page with so little information
How would a computer program actually know that FG-2323-FDS 3GB is a product? Maybe some Bayesian filter, but is that feasible across the entire web's content?
More likely, G has very little chance of distinguishing the product FG-2323-FDS 3GB from the random letter junk FH-2323-FDS 3FB.
Even a human would have to actually be told what is what.