This 249 message thread spans 9 pages.
Pages Dropping Out of Big Daddy Index
Continued from: [webmasterworld.com...]
internetheaven, you said:
|I had 20,300 pages showing for a site:www.example.com search yesterday and for the past month. Today it dropped to 509 but my traffic is still pretty constant. I normally get around 4,500 - 5,000 to that site per day and today I've already got 4,000. |
So, either Google doesn't account for even a small percentage of my traffic (which I doubt) or the way Google stores information about my site has changed - i.e. the 20,300 pages are still there, but Google will only tell me about 509 of them. As far as I can tell, the other pages have gone supplemental.
That resonated with something that I was talking about with the crawl/index team. internetheaven, was that post about the site in your profile, or a different site? Your post aligns exactly with one thing I've seen in a couple of ways. It would align even more if you were talking about a different site than the one in your profile. :) If you were talking about a different site, would you mind sending the site name to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" and the name of your site, plus the handle "internetheaven"? I'd like to check the theory.
Just to give folks an update, we've been going through the feedback and noticed one thing. We've been refreshing some (but not all) of the supplemental results. One part of the supplemental indexing system didn't return any results for [site:domain.com] (that is, a site: search with no additional terms). So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.
I'm checking with the crawl/index folks on whether this might factor into what people are seeing, and I should hear back later today or tomorrow. In the meantime, interested folks might want to check whether their search traffic has gone up/down by a major amount, and see if there are fewer/more supplemental results for a site: search for their domain. Since folks outside Google couldn't force the supplemental results to return site: results, it needed a crawl/index person to notice that fact based on the feedback that we've gotten.
Anyone that wants to send more info along those lines to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" is welcome to. So you might send something like "I originally wrote about domain.com. I looked at my logs and haven't seen a major decrease in traffic; my traffic is about the same. I used to have about X% supplemental results, and now I hardly see any supplemental results with a site:domain.com query."
I've still got someone reading the bostonpubcon email alias, and I've worked with the Sitemaps team to exclude that as a factor. The crawl/index folks are reading portions of the feedback too; if there's more that I notice, I'll stop by to let you know.
[edited by: Brett_Tabke at 8:07 pm (utc) on May 8, 2006]
|Not only are some sites having fewer pages appear in the index (these are the "experimental" and "cleanup" datacentres as far as I can tell) |
Just to be crystal clear: the missing pages problems are across all datacentres. That is, sites that are affected by the bug see 95%+ of their pages dropped from Google's index on all datacentres (obviously there are the usual slight variations from DC to DC).
Is there anything that G has proposed to do in order to help the webmasters that have been affected? I know an email address has been provided, but after reading this post it seems that nothing has been done for those that sent in examples of domains affected by this bug. What to do?
"Fourteen of Google's top executives and directors sold $4.4 billion worth of stock last year...founders Sergey Brin and Larry Page, each of whom sold about $1.3 billion worth of stock."
I guess they saw Big Daddy coming.
I've been aggressively creating new pages in an effort to make my site more appealing to visitors. This is a frustrating exercise since G not only refuses to index the new material but also continues to show only about 20% of the pages that used to appear in its index.
What you are describing is the same thing that many webmasters have experienced in the last month or so with G.
Again, I ask....what has G. proposed in order to help these webmasters? I know they have announced an email address for webmasters to provide examples, but what else? Anything? Hello....is anyone there?
They are doing nothing more than looking at the issue to see if there is a problem. I am betting they know exactly what is going on. Big Daddy = death of crap backlinks / band-aid for capacity issues.
One thing I have never seen mentioned on WW is Big Table, which was the supposed name of a proprietary database that Google started developing last year. O'Reilly's Radar recently mentioned Big Table again, and I was wondering if Big Daddy is simply another name for Big Table. It seems like a new proprietary database could match Matt Cutts's description of new infrastructure. What do you think?
|Again, I ask....what has G. proposed in order to help these webmasters? I know they have announced an email address for webmasters to provide examples, but what else? Anything? Hello....is anyone there? |
The simple answer is: Nothing, beyond the email address.
Google run a very tight ship when it comes to disseminating information. While this policy has many obvious advantages, it has some serious downsides as well. When a serious bug is introduced, the lack of communication, both within Google and with the outside world, can seriously hamper their ability to identify and fix the problem. Maintaining the high level of secrecy that they do requires a great deal of "need-to-know" segmentation. I'm certain that only a very small handful of Google employees have the full picture of exactly what is going on. How many of Google's employees have a bird's-eye view of all of the changes encompassed by "Big Daddy"? I don't know the answer, but I'd guess it is a tiny, tiny number. What chance, then, of identifying and sorting out the current problems?
Well - one of the sites badly affected by the dropped-pages problem has seen an increase from 10 to 600 indexed pages.
I suggest webmasters not change their site structure, linking, etc. It's a matter of time, IMO.
I've noticed that as I add quality pages, I lose a few the following week; after that, they slowly come back.
One thing I did notice: a bunch of old 404 pages from last August were dumped into the supplemental index, and suddenly my good pages disappeared.
Maybe I am getting hit with a duplicate penalty due to these old, outdated caches of 404 pages that all of a sudden showed up in the index.
>The simple answer is: Nothing, beyond the email address.
Well, I'm shocked: I got a reply to the email I sent in. It basically told me I didn't have a canonical problem and suggested I use Google Sitemaps! Although I'm not using Google Sitemaps on this site, I do have my own sitemaps, which in the past have always done their job well. I really think telling a webmaster just to use their sitemaps is a little lame, considering that before they started dropping pages (which for me began about 3 weeks ago) and stopped being able to crawl sites, I never had problems getting content crawled.
No offense but that sounds like a more personalized form of the usual canned form email.
In other words, we don't know.
And even if we did know, we're not telling.
Same ole. Same ole.
Good for you, tigger... well, on the response part anyway. I wish I could get one. All I want to know is whether I need to fix something (a penalty or whatnot) or whether it is a Google problem and we should sit tight. That is all I ask! I don't really need details, just some sort of direction to go here. After a year of waiting and starting a comeback... this past couple of months has been like finally getting a bike tire fixed, going out for a joy ride, and just as you get up to speed, someone runs out and shoves a broomstick in your front spokes... ouch!
>No offense but that sounds like a more personalized form of the usual canned form email
I do agree the email didn't really offer any answers, other than use Gmaps! but I've replied back to it so it will be interesting to see if I get a more detailed reply back
|I do agree the email didn't really offer any answers, other than use Gmaps! but I've replied back to it so it will be interesting to see if I get a more detailed reply back |
Well do keep us updated if they do.
As Arubicus said, any info on whether it's something we can "fix" or it's something on their end would be heaven sent.
In your original email to Google, did you describe your problem as a "crawling" problem or an "indexing" problem? From their suggestion to try using a sitemap, it seems they are assuming that your missing pages are a result of not being crawled.
This is not the usual symptom of the missing pages problem. The missing pages are crawled regularly, they just don't make it into the index.
PS: A Google sitemap won't help.
>>>>Well I'm shocked I got a reply to the email I sent in, basically told me I didn't have a canonical problem
Hmmmz - they appear to have changed something on your site though to correct the possible Canonical issue - so I am not sure why they have said you didn't have the problem.
Eg. I am sure that internal pages with the non-www within the site had PR0 - and now they have PR the same as the www...... (Not 100% sure as I cant remember)
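For anyone who wants to check this on their own domain: a minimal sketch (in Python, with example.com as a placeholder domain, not any poster's actual site) of the canonical test being discussed - does the non-www host return a 301 whose Location points at the www host? You would feed it the status code and Location header from a HEAD request made with redirect-following disabled.

```python
from urllib.parse import urlsplit

def is_canonical_redirect(status, location, canonical_host):
    """True if a response permanently redirects onto the canonical host."""
    if status != 301 or not location:
        return False
    return urlsplit(location).hostname == canonical_host

# Correctly canonicalised: the bare domain 301s to the www host.
print(is_canonical_redirect(301, "http://www.example.com/", "www.example.com"))  # True
# Not canonicalised: the bare domain serves content itself (duplicate host).
print(is_canonical_redirect(200, None, "www.example.com"))  # False
```

If the non-www host answers 200 instead of 301, both hostnames are serving the same content as separate URLs, which is exactly the canonical situation people suspect Big Daddy handles differently.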
I'm at a loss Dayo!
>In your original email to Google, did you describe your problem as a "crawling" problem or an "indexing" problem? From their suggestion to try using a sitemap, it seems they are assuming that your missing pages are a result of not being crawled
The title was, as GG requested, "crawlpages", and within my email I explained how pages that used to get crawled and ranked were no longer showing any cached info - so to tell me to use the sitemap is a little annoying.
The latest Matt Cutts missives make for very depressing reading. He more or less dismisses all of the "problems" reported via GG's email address, and once again, refers to it as though it is a "crawling" issue:
|The crawl/index team checked into several reports and each time came up with other reasons why the site wouldn’t be crawled as much (e.g. the ‘next page’ url on one site wasn’t short; it was a total hairball with like 200 chars of params), and some supplemental results folks have been through the raw emails, which is how one of the site: changes was noticed. So far, about half of the feedback to the email isn’t about pages dropped. Of the other half, one factor is that several sites have spam penalties. Of the remaining feedback, the two site: changes were the only two that we noticed. We’re going to keep digging in, but people need to bear in mind that Bigdaddy does have different crawl priorities, so a site that had more pages indexed by the earlier Googlebot won’t necessarily have as many pages indexed in the future. But don’t get me wrong; we’re still going through the feedback to see if there’s anything else to be identified and improved. |
AND what about pages that were ranking that have now vanished? WHY?! Grrrr... I need to move on and work on some content that hopefully one day will get crawled and stick.
With reference to MC's blog entry, it does look a bit grim.
"Of the remaining feedback, the two site: changes were the only two that we noticed."
What does he mean?
I don't know if this is good news or not for you guys ... but Google has been out crawling like mad for the last two days on my site. Maybe there is hope for you soon!?
I haven't experienced any of these problems, so can't help you there.
I'm beginning to think I'm in a rebuilding process. Before everything went haywire my site had about 600 good pages and 11,000 supplementals. As of late last week most of the DC's had me down to 17 pages and no supplementals. Now, I'm seeing a couple of pages come back each day with an occasional burst of crawl activity from GoogleBot. This has a strong resemblance to how my site first populated into the database over a year ago. Hopefully over the next few weeks I'll see a full rebuilding of my good content.
If the old supplemental results are gone, then I'm happy. The supps for my site were of an early version of the site, and I've been wanting them to be recrawled or go away for months. All of the pages pointed to have been 301-redirected to new pages for over a year, and in many cases the new pages had much more content than the old versions.
Hopefully the new crawl doesn't trip too many duplicate content triggers on my good content. The site is database-generated with URL rewriting to create static URLs. The data involves geo-location, so a lot of names will appear on multiple pages, but in different orders on those pages. This probably plays hell with Google's dup content discovery process. Several hundred pages have additional content such as photos and commentary, which makes those pages highly unique, but several thousand just have the geo-data. I'm sure this is a large part of why a lot of my content ended up in the supps before. I'm not bothered by that; I just wish Google would figure out when those pages gain detail and bring them back to the active index. Sitemaps doesn't seem to be helping there as much as I would hope.
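For context, the kind of rewrite such database-driven sites typically use looks something like the following (an illustrative Apache mod_rewrite sketch; the paths and script name are hypothetical, not taken from the poster's site):

```apache
# .htaccess - map static-looking geo URLs onto the real database script.
RewriteEngine On
# /city/boston.html -> /listings.php?city=boston (internal rewrite only;
# the URL shown to visitors and crawlers stays /city/boston.html).
RewriteRule ^city/([a-z-]+)\.html$ /listings.php?city=$1 [L,QSA]
```

Because every rewritten page comes from the same underlying script and dataset, near-duplicate output across many such URLs is a plausible way to trip duplicate-content filtering.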
Anyway, enough rambling on. I'm not fretting too much but it looks like this update is going to take some time to recover from. Good thing this isn't my day job or I'd be eating beans and rice for a long time to come.
It's not the crawling that's the problem! The problem is that the pages are crawled but not added to the index, and furthermore, if they used to be in the index, they have steadily been dropped, until we are now left with just the homepage.
I'm sure there is no justifiable reason, why these pages have been removed. Every page validates, has no spam, no links to bad neighbourhoods etc, etc.
So basically, what Google are saying is that if our pages aren't in the index, it's not their fault - it's ours!
If that is the case, then they should provide some mechanism for informing us of the reason why they have dropped the pages. After all, it was a 'decision' that was taken, either by a human or a machine; whichever it was, the machine or the human knows why, so is it too much to ask for them to add the status into Sitemaps?
This is so disappointing. I guess if they are not going to fix it, because they don't believe it is broken, we'll just have to forget about Google and rely on the other search engines. I for one will not continue to promote Google while I'm not in their index. I've already closed my AdWords account, and AdSense is next as soon as I have set up an alternative. It does not make sense to promote a search engine in which you can't get your own websites listed!
I don't see any point in maintaining the sitemap either. And besides, why should Google need a sitemap when Yahoo and MSN manage to find my deepest pages without one? Google know that the pages are there; if they refuse to include them, a sitemap is not going to help. In fact, it was only 3 weeks after I set up my sitemap that my site began to be dropped from the index!
I have 2 sites, out of about 20 or so in our network, that were affected by this back in Feb and are still totally screwed up in Google. The homepage is in but not cached, PR is there but no ranking, and all pages were dropped out; then a handful, like 10%, were put back in April, but they perform horribly.
One site is PR7 and one is PR6; both are about 6 years old, with thousands of IBLs from authority sites linking to our 100% custom content for each site's given verticals, content that took 5-plus years for our experts to write up. Some conclusions I have come to:
- PR isn't important in regards to the issue; there is no difference in the problem between our PR7 and PR6 sites.
- No 301 issues on either site; they are on the same server with 20 other sites of ours, and no others have been affected, despite interlinking within the network.
- Authority seems to be irrelevant. We have scrapers by the tens of thousands weekly, crawling and spamming our content out, and now these PR1 or PR0 sites with little more than one IBL are crushing our 2 sites in the SERPs for almost all positions, and even for our company name. This despite our having PR7, thousands of IBLs, clean SEO, and no changes to the properties in years.
I have given up monitoring the subject. We have emailed them and been told we do not have any penalties against the properties, and we have had our coders pour hundreds of hours into looking over the properties to make sure this isn't a mistake on our part; they have concluded it is not. This is some form of crazy manipulation on Google's part that makes no sense whatsoever.
This hasn't shattered my confidence in Google, but it is most definitely an eye-opener. I don't blame Google like others here; things happen, and after all it is their search engine and they can do what they want. But it is fairly upsetting to us to see properties we poured our hearts into, and "played by their rules" on, get destroyed in less than 30 days.
|We’re going to keep digging in, but people need to bear in mind that Bigdaddy does have different crawl priorities, so a site that had more pages indexed by the earlier Googlebot won’t necessarily have as many pages indexed in the future. But don’t get me wrong; we’re still going through the feedback to see if there’s anything else to be identified and improved. |
After re-reading and re-reading MC's response to Donna, it seems pretty obvious that they simply don't know what the problem is. And while the sudden appearance of GG is the largest indicator that G acknowledges there MIGHT BE a problem, it sounds like they have no clue where to begin.
|This is so disappointing. I guess if they are not going to fix it, because they don't believe it is broken, we'll just have to forget about Google and rely on the other search engines. I for one will not continue to promote Google while I'm not in their index. I've already closed my Adwords account and Adsense is next as soon as I have setup an alternative. It does not make sense to promote a search engine in which you can't get your own websites listed! |
From the looks of things, the new Big Daddy "algo" (for lack of a better word) perceives almost every site as having a spam penalty. If we go by the MC/GG explanation, all of us are incurring some type of spam penalty, which is resulting in our pages being dropped.
Does that make any sense?
I don't see any problems with crawling. I have recovered a few hundred pages in the last week or so, but now I appear to be losing some again.
Meanwhile Yahoo, MSN and Ask all list all of my pages. As has Google in the past. If giving Google a sitemap is going to help, then they need to explain why.
Also, filing a reinclusion request is often suggested as well, but to do that you have to admit to being a reformed spammer and then beg for forgiveness.
Hi - I don't know if this is relevant, but I just noticed that when doing a "find web pages that contain the term www.mysite.com" search, the results for my site have dropped from about 190 (where they had been for the last month or two) to 21. I did this search and got 190 two days ago, but as of this morning it dropped by 171, to 21. Has anyone else noticed this? If Google is dropping pages, could it create a cycle - as more pages drop, more sites lose their IBLs? Or are the links actually still there, and is the "contains the term" option only as accurate as the site: command? Anyway, based on what I've been seeing lately, it will look completely different tomorrow!
Well, I've reached the point of nostalgia now.
Two or three times a day, I go to Yahoo, MSN, and Ask.com so I can see my previously highest earning, best Google referral URL in the top 10. Because it isn't even in Google anymore since the robots.txt "issue" of April 11-12.
But maybe that's not relevant.