Forum Moderators: Robert Charlton & goodroi


Pages Dropping Out of Big Daddy Index

Part 2

         

GoogleGuy

7:59 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Continued from: [webmasterworld.com...]


internetheaven, you said:

> I had 20,300 pages showing for a site:www.example.com search yesterday and for the past month. Today it dropped to 509 but my traffic is still pretty constant. I normally get around 4,500 - 5,000 to that site per day and today I've already got 4,000.

> So, either Google doesn't account for even a small percentage of my traffic (which I doubt) or the way Google stores information about my site has changed. i.e. the 20,300 pages are still there, Google will only tell me about 509 of them. As far as I can tell, I think the other pages have been supplemented.

That resonated with something that I was talking about with the crawl/index team. internetheaven, was that post about the site in your profile, or a different site? Your post aligns exactly with one thing I've seen in a couple of ways. It would align even more if you were talking about a different site than the one in your profile. :) If you were talking about a different site, would you mind sending the site name to bostonpubcon2006 [at] gmail.com with the subject line of "crawlpages" and the name of your site, plus the handle "internetheaven"? I'd like to check the theory.

Just to give folks an update, we've been going through the feedback and noticed one thing. We've been refreshing some (but not all) of the supplemental results. One part of the supplemental indexing system didn't return any results for [site:domain.com] (that is, a site: search with no additional terms). So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.

I'm checking with the crawl/index folks whether this might factor into what people are seeing, and I should hear back later today or tomorrow. In the meantime, interested folks might want to check if their search traffic has gone up/down by a major amount, and see if there are fewer/more supplemental results for a site: search for their domain. Since folks outside Google couldn't force the supplemental results to return site: results, it needed a crawl/index person to notice that fact based on the feedback that we've gotten.

Anyone that wants to send more info along those lines to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" is welcome to. So you might send something like "I originally wrote about domain.com. I looked at my logs and haven't seen a major decrease in traffic; my traffic is about the same. I used to have about X% supplemental results, and now I hardly see any supplemental results with a site:domain.com query."

I've still got someone reading the bostonpubcon email alias, and I've worked with the Sitemaps team to exclude that as a factor. The crawl/index folks are reading portions of the feedback too; if there's more that I notice, I'll stop by to let you know.

[edited by: Brett_Tabke at 8:07 pm (utc) on May 8, 2006]

ClintFC

1:31 am on May 10, 2006 (gmt 0)

10+ Year Member



> Not only are some sites having fewer pages appear in the index (these are the "experimental" and "cleanup" datacentres as far as I can tell)

Just to be crystal clear: the missing-pages problem is across all datacentres. That is, sites that are affected by the bug see 95%+ of their pages dropped from Google's index on all datacentres (obviously with the usual slight variations from DC to DC).

Steph_R

1:36 am on May 10, 2006 (gmt 0)

10+ Year Member



Is there anything that G has proposed to do in order to help the webmasters that have been affected? I know an email address has been provided, but after reading this post it seems that nothing has been done for those that sent in examples of domains affected by this bug. What to do?

ClintFC

1:37 am on May 10, 2006 (gmt 0)

10+ Year Member



"Fourteen of Google's top executives and directors sold $4.4 billion worth of stock last year...founders Sergey Brin and Larry Page, each of whom sold about $1.3 billion worth of stock."

I guess they saw Big Daddy coming.

Right Reading

1:41 am on May 10, 2006 (gmt 0)

10+ Year Member



I've been aggressively creating new pages in an effort to make my site more appealing to visitors. This is a frustrating exercise since G not only refuses to index the new material but also continues to show only about 20% of the pages that used to appear in its index.

Steph_R

1:45 am on May 10, 2006 (gmt 0)

10+ Year Member



What you are describing is the same thing that many webmasters have experienced in the last month or so with G.

Again, I ask... what has G. proposed in order to help these webmasters? I know they have announced an email address for webmasters to provide examples, but what else? Anything? Hello... is anyone there?

Relevancy

1:52 am on May 10, 2006 (gmt 0)

10+ Year Member



They are doing nothing more than looking at the issue to see if there is a problem. I am betting they know exactly what is going on. Big Daddy = death of crap backlinks/band-aid for capacity issues.

guru5571

1:56 am on May 10, 2006 (gmt 0)

10+ Year Member



One thing I have never seen mentioned in WW is Big Table, which was the supposed name of a proprietary database that Google started developing last year. O'Reilly's Radar recently mentioned Big Table again, and I was wondering if Big Daddy is simply another name for Big Table. It seems like a new proprietary database could match Matt Cutts's description of new infrastructure. What do you think?

ClintFC

2:04 am on May 10, 2006 (gmt 0)

10+ Year Member



> Again, I ask... what has G. proposed in order to help these webmasters? I know they have announced an email address for webmasters to provide examples, but what else? Anything? Hello... is anyone there?

The simple answer is: Nothing, beyond the email address.

Google run a very tight ship when it comes to disseminating information. While this policy has many obvious advantages, it has some serious downsides as well. When a serious bug is introduced, the lack of communication, both within Google and with the outside world, can seriously hamper their ability to identify and fix the problem. Maintaining the high level of secrecy that they do requires a great deal of "need-to-know" segmentation. I'm certain that only a very small handful of Google employees have the full picture of exactly what is going on. How many of Google's employees have a bird's-eye view of all of the changes encompassed by "Big Daddy"? I don't know the answer, but I'd guess it is a tiny, tiny number. What chance, then, of identifying and sorting out the current problems?

moftary

2:52 am on May 10, 2006 (gmt 0)

10+ Year Member



Well, one of the sites badly affected by pages dropping has seen an increase from 10 to 600 indexed pages.

I suggest webmasters not change their sites' structure, linking, etc. It's a matter of time, IMO.

trinorthlighting

3:02 am on May 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I noticed that as I have been adding quality pages, I lose a few the following week, then after that they slowly come back.

One thing I did notice: a bunch of my old 404 pages from last August got dumped into the supplemental index, and suddenly my good pages disappeared.

Maybe I am getting hit with a duplicate penalty due to the caches of these old, outdated 404 pages that all of a sudden showed up in the index.

tigger

7:41 am on May 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>The simple answer is: Nothing, beyond the email address.

Well, I'm shocked: I got a reply to the email I sent in. It basically told me I didn't have a canonical problem and suggested I use G Sitemaps! Although I'm not using G Sitemaps on this site, I do have my own sitemap, which in the past has always done its job well. I really think telling a webmaster just to use their Sitemaps is a little lame, considering that before they started dropping pages (which for me began about 3 weeks ago) and having trouble crawling sites, I never had problems getting content crawled.

whitenight

7:57 am on May 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No offense but that sounds like a more personalized form of the usual canned form email.

In other words, we don't know.
And even if we did know, we're not telling.

Same ole. Same ole.

arubicus

7:57 am on May 10, 2006 (gmt 0)

10+ Year Member



Good for you, tigger... well, on the response part. I wish I could get one. All I want to know is whether I need to fix something (penalty or whatnot) or if it is a Google problem and we should sit tight. That is all I ask! I don't really need details, just some sort of direction to go in. After a year of waiting and starting a comeback, these past couple of months have been like finally getting a bike tire fixed, going out for a joy ride, and just as you get up to speed someone runs out and shoves a broomstick in your front spokes... ouch!

tigger

8:06 am on May 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>No offense but that sounds like a more personalized form of the usual canned form email

I do agree the email didn't really offer any answers, other than use Gmaps! but I've replied back to it so it will be interesting to see if I get a more detailed reply back

whitenight

8:22 am on May 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I do agree the email didn't really offer any answers, other than use Gmaps! but I've replied back to it so it will be interesting to see if I get a more detailed reply back

Well do keep us updated if they do.
As arubicus said, any info on whether it's something we can "fix" or something on their end would be heaven-sent.

This 249-message thread spans 17 pages.