Forum Moderators: Robert Charlton & goodroi
I had 20,300 pages showing for a site:www.example.com search yesterday and for the past month. Today it dropped to 509, but my traffic is still pretty constant. I normally get around 4,500 - 5,000 visitors to that site per day, and today I've already had 4,000. So either Google doesn't account for even a small percentage of my traffic (which I doubt), or the way Google stores information about my site has changed, i.e. the 20,300 pages are still there, but Google will only tell me about 509 of them. As far as I can tell, the other pages have gone supplemental.
That resonated with something I was talking about with the crawl/index team. internetheaven, was that post about the site in your profile, or a different site? Your post aligns exactly with one thing I've seen in a couple of ways. It would align even more if you were talking about a different site than the one in your profile. :) If you were talking about a different site, would you mind sending the site name to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" and the name of your site, plus the handle "internetheaven"? I'd like to check the theory.
Just to give folks an update, we've been going through the feedback and noticed one thing. We've been refreshing some (but not all) of the supplemental results. One part of the supplemental indexing system didn't return any results for [site:domain.com] (that is, a site: search with no additional terms). So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.
I'm checking with the crawl/index folks on whether this might factor into what people are seeing, and I should hear back later today or tomorrow. In the meantime, interested folks might want to check whether their search traffic has gone up or down by a major amount, and see whether there are fewer/more supplemental results for a site: search on their domain. Since folks outside Google couldn't force the supplemental results to show up for site: queries, it took a crawl/index person to notice that fact based on the feedback we've gotten.
Anyone that wants to send more info along those lines to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" is welcome to. So you might send something like "I originally wrote about domain.com. I looked at my logs and haven't seen a major decrease in traffic; my traffic is about the same. I used to have about X% supplemental results, and now I hardly see any supplemental results with a site:domain.com query."
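For anyone who wants to put a number on that before writing in, here's a minimal Python sketch that tallies what share of each day's visits came from Google search, assuming an Apache "combined" format access log. The log path and the referrer test are placeholders you'd adapt to your own setup; it's just a starting point, not a definitive tool.

# Minimal sketch: tally daily visits referred by Google from an Apache
# "combined" format access log, so you can see whether search traffic
# actually dropped when the site: count did. LOG_PATH is a placeholder.
import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical path; point this at your own log

# combined format: ... [08/May/2006:18:02:31 +0000] "GET /p HTTP/1.1" 200 1234 "referrer" "agent"
line_re = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\].*?"[^"]*" \d+ \S+ "([^"]*)"')

google_hits = Counter()
total_hits = Counter()

with open(LOG_PATH) as log:
    for line in log:
        m = line_re.search(line)
        if not m:
            continue
        day, referrer = m.groups()
        total_hits[day] += 1
        # crude test for a Google search referral; adjust for your locale/TLDs
        if "google." in referrer and "q=" in referrer:
            google_hits[day] += 1

for day in sorted(total_hits):
    g, t = google_hits[day], total_hits[day]
    print(f"{day}: {g}/{t} visits from Google search ({100.0 * g / t:.1f}%)")

If the Google share holds steady while the site: count collapses, that would match the "reporting, not removal" theory described above.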
I've still got someone reading the bostonpubcon email alias, and I've worked with the Sitemaps team to exclude that as a factor. The crawl/index folks are reading portions of the feedback too; if there's more that I notice, I'll stop by to let you know.
Not only are some sites having fewer pages appear in the index (these are the "experimental" and "cleanup" datacentres, as far as I can tell)
Just to be crystal clear: the missing-pages problem is across all datacentres. That is, sites that are affected by the bug see 95%+ of their pages dropped from Google's index on all datacentres (obviously there are the usual slight variations from DC to DC).
Again, I ask: what has G. proposed in order to help these webmasters? I know they have announced an email address for webmasters to provide examples, but what else? Anything? Hello... is anyone there?
The simple answer is: Nothing, beyond the email address.
Google run a very tight ship when it comes to disseminating information. While this policy has many obvious advantages, it has some serious downsides as well. When a serious bug is introduced, the lack of communication, both within Google and with the outside world, can seriously hamper their ability to identify and fix the problem. Maintaining the high level of secrecy that they do requires a great deal of "need-to-know" segmentation. I'm certain that only a very small handful of Google employees have the full picture of exactly what is going on. How many of Google's employees have a bird's-eye view of all of the changes encompassed by "Big Daddy"? I don't know the answer, but I'd guess it is a tiny, tiny number. What chance, then, of identifying and sorting out the current problems?
One thing I did notice: a bunch of old 404 pages from last August got dumped into the supplemental index, and suddenly my good pages disappeared.
Maybe I am getting hit with a duplicate-content penalty due to the cached copies of these old, outdated 404 pages that all of a sudden showed up in the index.
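If it helps anyone check the same thing on their own site, here's a quick Python sketch that tests whether a list of old URLs really returns 404, rather than a "soft" 200 error page; a soft 404 that serves the same template everywhere is one way to end up with duplicate content in the index. The "old_urls.txt" file name is a hypothetical placeholder for your own list, one URL per line.

# Quick sketch: verify that supposedly dead URLs actually return a 404
# status, not a 200 error page that Google could keep indexing as
# duplicate content. "old_urls.txt" is a hypothetical input file.
import urllib.request
import urllib.error

with open("old_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    try:
        resp = urllib.request.urlopen(url, timeout=10)
        # Getting here means a 2xx/3xx response: the "dead" page still
        # resolves, which looks like a soft 404 to a crawler.
        print(f"{resp.status} {url}  <- still returns content")
    except urllib.error.HTTPError as e:
        print(f"{e.code} {url}")  # 404 here is the answer you want
    except urllib.error.URLError as e:
        print(f"ERR {url} ({e.reason})")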
Well, I'm shocked: I actually got a reply to the email I sent in. It basically told me I didn't have a canonical problem and suggested I use Google Sitemaps! Although I'm not using Google Sitemaps on this site, I do have my own sitemaps, which in the past have always done their job well. I really think telling a webmaster just to use their Sitemaps is a little lame, considering that before the dropped pages (which for me started about 3 weeks ago) and the lack of crawling, I never had problems getting content crawled.
I do agree the email didn't really offer any answers other than "use Google Sitemaps!", but I've replied to it, so it will be interesting to see if I get a more detailed reply back.
Well, do keep us updated if they do.
As Arubicus said, any info on whether it's something we can "fix" ourselves or something on Google's end would be heaven-sent.