Pages Dropping Out of Big Daddy Index

Part 2



7:59 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Continued from: [webmasterworld.com...]

internetheaven, you said:

I had 20,300 pages showing for a site:www.example.com search yesterday and for the past month. Today it dropped to 509 but my traffic is still pretty constant. I normally get around 4,500 - 5,000 to that site per day and today I've already got 4,000.

So, either Google doesn't account for even a small percentage of my traffic (which I doubt) or the way Google stores information about my site has changed. i.e. the 20,300 pages are still there, Google will only tell me about 509 of them. As far as I can tell, I think the other pages have been supplemented.

That resonated with something that I was talking about with the crawl/index team. internetheaven, was that post about the site in your profile, or a different site? Your post aligns exactly with one thing I've seen in a couple of ways. It would align even more if you were talking about a different site than the one in your profile. :) If you were talking about a different site, would you mind sending the site name to bostonpubcon2006 [at] gmail.com with the subject line of "crawlpages" and the name of your site, plus the handle "internetheaven"? I'd like to check the theory.

Just to give folks an update, we've been going through the feedback and noticed one thing. We've been refreshing some (but not all) of the supplemental results. One part of the supplemental indexing system didn't return any results for [site:domain.com] (that is, a site: search with no additional terms). So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.

I'm checking with the crawl/index folks if this might factor into what people are seeing, and I should hear back later today or tomorrow. In the meantime, interested folks might want to check if their search traffic has gone up/down by a major amount, and see if there are fewer/more supplemental results for a site: search for their domain. Since folks outside Google couldn't force the supplemental results to return site: results, it needed a crawl/index person to notice that fact based on the feedback that we've gotten.
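
One way to run the traffic check described above is to count Google referrals per day in your own access logs. The following is a minimal Python sketch, not anything Google provides. It assumes logs in Apache Combined Log Format and uses a crude referrer pattern for Google search pages; the names `google_referrals_per_day` and `percent_change` are made up for illustration.

```python
import re
from collections import Counter

# Assumed format (Apache Combined Log Format):
# host ident user [day:time zone] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

# Crude, assumed pattern for a Google search results referrer.
GOOGLE_SEARCH = re.compile(r'^https?://(www\.)?google\.[a-z.]+/search')


def google_referrals_per_day(lines):
    """Count requests per day whose referrer looks like a Google search page."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if GOOGLE_SEARCH.match(match.group("referer").lower()):
            counts[match.group("day")] += 1
    return counts


def percent_change(before, after):
    """Percentage change from one daily count to another."""
    if before == 0:
        raise ValueError("no baseline traffic to compare against")
    return 100.0 * (after - before) / before
```

Feed it the lines of a log file (for example `google_referrals_per_day(open("access.log"))`, where the file name is a placeholder) and compare days before and after the site: count changed. If the daily counts are roughly flat while the site: number fell sharply, that matches the pattern described above.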

Anyone that wants to send more info along those lines to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" is welcome to. So you might send something like "I originally wrote about domain.com. I looked at my logs and haven't seen a major decrease in traffic; my traffic is about the same. I used to have about X% supplemental results, and now I hardly see any supplemental results with a site:domain.com query."

I've still got someone reading the bostonpubcon email alias, and I've worked with the Sitemaps team to exclude that as a factor. The crawl/index folks are reading portions of the feedback too; if there's more that I notice, I'll stop by to let you know.

[edited by: Brett_Tabke at 8:07 pm (utc) on May 8, 2006]


8:04 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

GoogleGuy! Heh, can you confirm that old Supplemental pages with cache dates in 2004, and showing snippets for stuff no longer on the page, or showing for pages that are 404 or domain expired, or have been returning a 301 for a year, have finally been thrown away?

I see them missing from some datacentres now. Is that intentional (I hope it is), or a glitch that they have disappeared, but you plan on putting them back in? (I hope not).


GoogleGuy, can you point to any specific datacentre IP addresses where the "missing Supplemental Results problem" is most apparent? Google has very different results on some IPs at the moment.


To username Relevancy (last post in previous thread):
I am told that it isn't a problem, but I would make them more diverse than that.
Maybe "big" and "old" sites can get away with it, but newer sites cannot?

[edited by: g1smd at 8:30 pm (utc) on May 8, 2006]


8:09 pm on May 8, 2006 (gmt 0)

10+ Year Member


What about the fact that new pages are not being indexed? Or is that only a problem with fairly new sites? Is there a chance that those supp pages that were dropped might get re-indexed as non-supp pages?


8:18 pm on May 8, 2006 (gmt 0)

10+ Year Member


Do you want results which have sites with all pages crawled as www. but have supplementals for the non-www.?

These are all 301 redirected pages now showing supplemental for the non-www. including the homepage.

The site also shows an outdated DMOZ listing as the title rather than the current title or the current DMOZ title.


8:40 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

I see a big problem with "phantom duplicate content", in a site:domain.com search, brewing in the last few days. I first saw it reported at least several days ago, but have only seen it for myself in the last couple of days. Today the effect is far worse than it was yesterday.

A site that had 150 pages fully indexed, has shown "1 to 120 of 150" in a site:domain.com search for very many months. All of the titles and meta descriptions on those 150 pages have been different for a long time, and the on-page content is also unique per page.

There have been no Supplemental Results for this site at the .com location in the last year or more (except for a couple of pages that were deleted a very long time ago). The old site:domain.co.uk search now shows either zero or 50 or so Supplemental Results depending on which datacentre that you look at. The .co.uk page URLs have all had a 301 redirect pointing to the matching .com pages for at least a year.

Previously, last year, when all those meta descriptions were exactly the same, Google used to show just "1 to 3 of about 120" in a site:domain.com search, and you needed to click on the "repeat this search with omitted results included" link to see any more.

A few days ago, in some DCs, the results were down to 1 to 40 of about 150, for this site, and I found that very odd. Clicking the "repeat search with omitted results included" link then revealed the rest of the pages, but every one of them now has exactly the same snippet --- the snippet of what it used to be 6 months or more ago, back when the meta descriptions were all identical.

Today, the site: search is down to "1 to 3 of about 150" again, and every result shows exactly the same (old) snippet in a site: search at or at for example.

I think this is a bug, or Google reverting to using old data for the snippet, or something.

Can you take a look?

[edited by: g1smd at 8:56 pm (utc) on May 8, 2006]


8:46 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member


"Today, the site: search is down to 1 to 3 of about 150 again, and every result shows the same snippet in a site: search. I think this is a bug, or Google reverting to using old data for the snippet, or something."

Any DC IP?


[edited by: reseller] I see you have just added the DCs IP to your post. Thanks, g1smd [/edit]


10:01 pm on May 8, 2006 (gmt 0)

5+ Year Member


We have seen a similar bug affect the way our site: search pages are displayed.

All 435 pages remain indexed, but instead of a natural ordering with all pages available to view, we now have only 45 available to view without clicking the "similar omitted pages" link.

This would appear to be because Google has started to use "Header" text, including header image alt text, as the snippet for about 350 of our pages. Obviously this text is the same on all of these pages, resulting in them not being displayed without clicking the similar pages link.

All have separate content, title, description, keywords etc. They just use a standard Header via a template.
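
The effect described here can be checked from your own side if you record the snippet shown for each URL in a site: search. A minimal Python sketch follows. It assumes results are collapsed whenever snippets are identical after case and whitespace normalisation, which is a guess based on the reports in this thread rather than confirmed Google behaviour. The helper names are hypothetical.

```python
from collections import defaultdict


def normalise(snippet):
    """Lower-case the snippet and collapse runs of whitespace."""
    return " ".join(snippet.lower().split())


def group_by_snippet(pages):
    """Map each normalised snippet to the list of URLs that show it.

    `pages` is an iterable of (url, snippet) pairs.
    """
    groups = defaultdict(list)
    for url, snippet in pages:
        groups[normalise(snippet)].append(url)
    return dict(groups)


def shown_before_omitted_link(pages):
    """Return the first URL for each distinct snippet, in original order.

    Under the assumption above, these are the results visible without
    clicking the omitted-results link.
    """
    seen = set()
    shown = []
    for url, snippet in pages:
        key = normalise(snippet)
        if key not in seen:
            seen.add(key)
            shown.append(url)
    return shown
```

Any snippet that maps to a long list of URLs in `group_by_snippet` is a candidate for the shared template text that is being picked up in place of the per-page description.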

This is new as of yesterday, very odd and a glitch in my opinion.


10:06 pm on May 8, 2006 (gmt 0)

5+ Year Member

This is just great. Most of the people reporting the missing-pages problem have been seeing the problem for around two months solid. Obviously, we've all seen a significant drop in traffic to accompany the loss of 95%+ of our pages. The pages are completely missing. They are not supplemental. They deliver no phantom traffic. They are gone.

Then, one person sees a slight glitch on one particular data centre on one particular day, and this - completely atypical example - is what "resonates"?


10:18 pm on May 8, 2006 (gmt 0)

10+ Year Member

Agree with Clint.

My site had 100,000++ pages. As soon as Big Daddy hit, 99% of the pages were gone.
99% of the hits gone too.

I had virtually no supplemental results before Big Daddy.

Still heavily crawled everyday (5-15K) but very few pages added, fluctuating around only 300 pages in the index for the last 1-2 months.


10:22 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

ClintFC, I found out about the site: not returning some supplemental results and then went reading WebmasterWorld. I noticed internetheaven's comment, then several other comments such as

when my pages dropped I saw a 30%/40% reduction in traffic and a few days ago I saw a small increase with command site:url not much from 148 to 204 but the odd thing is traffic has shot up in fact yesterday was my best day for a few months and according to my stats G traffic has increased by 50%

(from a moderator) and then someone else said

I have a solid established site (no funny business or dup etc) that has lost 75% of its pages on a site: search.
Traffic however is only fractionally down, and accountable now that the sun is coming out.

That was what I was noticing. I'm not saying that that's 100% of things. When I looked through the crawlpages feedback, I did see a few people with spam penalties, for example. That could also explain why a site would be crawled less.

g1smd, I would certainly say that the days of those older pages are numbered in that I expect a reindex of most of the supplemental results over time (although it could take a while).

optimist, I'd expect that supplemental non-www results would be refreshed then, so I wouldn't report those right now.


10:28 pm on May 8, 2006 (gmt 0)

5+ Year Member

GoogleGuy, I have noticed that some MySpace users are using content from some of the pages on my site, and those pages are being dropped completely. I've already sent an email to bostonpubcon2006 regarding this problem. Can you please tell me what the best solution is to resolve this problem?


11:30 pm on May 8, 2006 (gmt 0)

10+ Year Member


I have written an e-mail to the bostonpubcon address again. (German portal Y ...).

I would be very pleased to get some feedback.

Many thanks first of all for your help!


12:19 am on May 9, 2006 (gmt 0)

5+ Year Member


Thanks. I appreciate the opportunity you've extended for some specifics from our end.

I've sent an email to bostonpubcon2006 also, with some stats from both log entries and Google Analytics. I also have graphs comparing all organic search referrals oct04-thru-sep05 to oct05-thru-ytd06. Let me know if they'd be helpful.




1:47 am on May 9, 2006 (gmt 0)

WebmasterWorld Senior Member whitey is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

GG - No feedback has been received by us and I'm wondering why. It's not only pages dropping out of the index that's a worry.

I sent a reinclusion request via Sitemaps about 10 days ago [ sticky me if you want to ], advising that we are receiving almost zero results from Google compared to pre Jul05, when we had strong results. At that time [ Jul05 ] we were hit by a hacker/180 day exclusion with an illicit robots.txt entry + the BD update / supplemental issues on the exit from this exclusion period around [ Jan/Feb06 ].

Since then we have had strange things happening on an ongoing basis. Here are just a few:

-Page numbers wildly fluctuating on the DCs
-Supplementals appearing and disappearing
-Meta descriptions being ignored and restored
-Navigation replacing meta descriptions on index
-No results in preceding positions - in fact they often appear below supplementals

Our sites have been thoroughly examined by an SEO and are believed to be fully compliant since Jan06.

9-10 months is a long time to be kept in silence and disruption.

I'd very much appreciate someone looking into this for us, if they are prepared to Sticky me.


2:39 am on May 9, 2006 (gmt 0)

5+ Year Member


I suspect that one of the reasons the "missing pages" problem is taking so long to get a handle on is because we Webmasters are blind to the root cause of the problem.

I believe that the problem is rooted in PageRank and/or Backlinks. Neither of which we can see accurately:

1. Immediately after Big Daddy was rolled out, people started to report "whacky" PRs, as reported by the Google Toolbar. I believe those funnies have now disappeared (I never saw any), but we have no way of knowing what has happened to the "real" PRs.

2. Post Big Daddy, a "link:www.mydomain.com" shows just one backlink to my site (even though there are many more). Again, since "link:" searches were changed a while back so that they no longer show the whole picture, these discrepancies can always be dismissed. I have no way of knowing if the backlinks are truly missing, or if I am simply unable to see them. I suspect that they may well be missing. Hence maybe my PR is now a lot lower than I am seeing on the Toolbar. Certainly, before BD, a site: search showed many more backlinks for my site.

3. An incorrectly deflated PR, could explain why I see what I see. Only pages that I link directly from my Home page get indexed nowadays. As soon as I put a link to a page on my Home page, in it goes. Remove the Home page link, and out it goes. Maybe my "real" PR is now so low that it only merits indexing one level deep?
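
Point 3 can be tested against your own link structure by working out how many clicks each page sits from the Home page and comparing that with what is actually indexed. A minimal Python sketch follows, assuming you already have a map of your internal links (the `links` dictionary format and the function name are made up for illustration). It says nothing about "real" PR, only about link depth.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search giving the minimum click depth of each page.

    `links` maps a page to the list of pages it links to.
    Pages that cannot be reached from `home` are left out of the result.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_at_depth(depths, depth):
    """List the pages found at exactly the given click depth, sorted by name."""
    return sorted(page for page, d in depths.items() if d == depth)
```

If the indexed pages turn out to be exactly those returned by `pages_at_depth(depths, 1)`, that would be consistent with the "one level deep" observation, although it would not prove that PR is the cause.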

Has anyone at Google looked at the "real" PRs and the "real" backlinks lately? If I am right, and these have somehow gone wrong - maybe a lot of backlinks are now truly missing? And, therefore, the PRs are now inaccurate?

This 249 message thread spans 17 pages.