Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Pages Dropping Out of Big Daddy Index

         

GoogleGuy

6:11 am on Apr 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Continued from: [webmasterworld.com...]


One thing to bear in mind is that Bigdaddy will have different crawl priorities. That can account for some of it. If you've run into any spam problems in the past, you might also want to do a reinclusion request. Otherwise, please send an email to bostonpubcon2006 at gmail.com with the subject line "crawlpages" (all one word), and I'll ask someone to see if they notice any commonalities.

g1smd

11:47 pm on May 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see that a lot. Get the other "pages" to either return "404" status or else get a 301 redirect in place pointing to the one that you want to be listed.
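The advice above can be sketched in code. This is a minimal, hypothetical Python sketch (the URL paths and canonical target are invented for illustration, not taken from the thread) of the rule g1smd describes: each duplicate URL answers with a 301 pointing at the one URL you want listed, and removed pages answer with a 404.

```python
# Sketch of g1smd's advice: duplicate "pages" should 301-redirect to the one
# URL you want listed, and pages that no longer exist should return 404.
# All paths below are hypothetical examples.

CANONICAL = {
    # duplicate URL -> the single URL that should be indexed
    "/index.html": "/",
    "/home.php": "/",
}

REMOVED = {"/old-page.html"}  # pages that no longer exist


def respond(path: str):
    """Return (HTTP status, Location header or None) for a requested path."""
    if path in REMOVED:
        return 404, None             # gone: tell the crawler to drop it
    if path in CANONICAL:
        return 301, CANONICAL[path]  # permanent redirect to the canonical URL
    return 200, None                 # serve the page normally


print(respond("/index.html"))    # duplicate -> 301 to "/"
print(respond("/old-page.html")) # removed -> 404
```

In practice this logic usually lives in the web server configuration (e.g. a redirect rule) rather than application code; the point is simply that every duplicate URL should answer with exactly one of 301-to-canonical or 404, never a second copy of the content.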

bonneville

12:08 pm on May 5, 2006 (gmt 0)

10+ Year Member



I already posted this opinion in the "huge machine crisis" thread, but I think more of the people involved are here.

I don't think the "dropping sites fast" thing is a technical error or a "ran out of disk space" problem.

Remember that we all saw a BigDaddy index with millions of pages indexed from our websites. That index was running on the new infrastructure in Jan/Feb 2006. Why would the Googlers shut down so many machines that their core business, "searching and indexing websites", drops to 10% of its power/capacity?

So I think they hit a major bug, or a huge problem merging some databases (the old one, aka "Supplemental #*$! 2004", and the new one, aka "BigDaddy data").

Another idea could be that they are showing us a MiniGoogle (1-10% of the old index data) in order to have enough idle server power to prepare the mega-super-spam-free BigDaddy index in the background.

Conclusion: sit back, wait, and don't waste time with DC-watching!

Greetings from sunny Germany,

Bonneville

mattg3

3:55 pm on May 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another idea could be that they are showing us a MiniGoogle (1-10% of the old index data) in order to have enough idle server power to prepare the mega-super-spam-free BigDaddy index in the background.

This does not seem totally logical to me as GG in post one states:

One thing to bear in mind is that Bigdaddy will have different crawl priorities. That can account for some of it.

So from this statement I would assume that what we see now is BigDaddy, since it admits a connection between BigDaddy and hurt sites.

irhusker

4:21 pm on May 5, 2006 (gmt 0)

10+ Year Member



Hello all. I'm new to this forum but I'm really enjoying your posts.

I must be in the minority here. I have several sites where the number of pages G is indexing has increased greatly. Recently one went from 300 to 650 and another from 2 to 350. All of this in the last month.

However my new sites seem to be stuck in sandbox land after 4 months.

youfoundjake

4:52 pm on May 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The good news is that my pages are starting to show back up in the data centers; Big Daddy already has 7 out of 100, in less than 12 hours.
For those that suffered way worse than me, hang in there, it's almost over.
I am currently checking on [72.14.203.99...]

nedguy

4:53 pm on May 5, 2006 (gmt 0)

10+ Year Member



Heads up.

MC has just posted a comment in his snuffle snuffle blog

[mattcutts.com...]

Looks like those emails we all sent GoogleGuy are getting some serious consideration.
----

last week when I checked there was a double-digit number of reports to the email address that GoogleGuy gave (bostonpubcon2006 [at] gmail.com with the subject line of “crawlpages”).

I asked someone to read through them in more detail and we looked at a few together. I feel comfortable saying that participation in Sitemaps is not causing this at all. One factor I saw was that several sites had a spam penalty and should consider doing a reinclusion request (I might do it through the webmaster console) but even that wasn’t a majority. There were a smattering of other reasons (one site appears to have changed its link structure to use more JavaScript), but I didn’t notice any definitive cause so far.

There will be cases where Bigdaddy has different crawl priorities, so that could partly account for things. But I was in a meeting on Wednesday with crawl/index folks, and I mentioned people giving us feedback about this. I pointed them to a file with domains that people had mentioned, and pointed them to the gmail account so that they could read the feedback in more detail.

So my (shorter) answer would be that if you’re in a potentially spammy area, you might consider doing a reinclusion request–that won’t hurt. In the mean time, I am asking someone to go through all the emails and check domains out. That person might be able to reply to all emails or just a sampling, but they are doing some replies, not only reading the feedback.

-----

Relevancy

5:06 pm on May 5, 2006 (gmt 0)

10+ Year Member



youfoundjake, are these pages coming back all saying supplemental results? The only time I see pages come back are when they throw the missing ones into supp index for a day or so then they drop again.

youfoundjake

6:17 pm on May 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



youfoundjake, are these pages coming back all saying supplemental results? The only time I see pages come back are when they throw the missing ones into supp index for a day or so then they drop again.

3 of the seven pages are supplemental; the other 4, including my index page, are showing as listed, but it is showing an older version of my index page in the cache, sigh...

Relevancy

6:22 pm on May 5, 2006 (gmt 0)

10+ Year Member



The homepage will never go into the supplemental index. Having pages in the supplemental index is just as bad as, if not worse than, not having them indexed: pages in the supplemental index might get trapped there forever, while non-indexed pages eventually get indexed.

300m

6:37 pm on May 5, 2006 (gmt 0)

10+ Year Member



It sounds more and more like the index/crawl team needs more exposure, as they seem to be the people to communicate with. I appreciate everything that Matt does, but for the past several months, every time I have an issue in Google I hear that it's an unrelated data refresh. Now it seems that he is introducing the idea that the index/crawl team needs to get more involved by reading the feedback, etc.

With that being said, would it not make more sense for them to have a "call to action" blog, or even a guest post explaining what they are doing? As the past several months have gone by, they are being mentioned by Matt more and more. It seems unfair to keep after Matt on index/crawl issues when he is the webspam guy.

Again, I really appreciate everything Matt has done, but I would like to hear more from the index/crawl team in general, as I think it would answer or address a lot of the concerns some may have.

walkman

7:25 pm on May 5, 2006 (gmt 0)



Now I have about 700 pages in the index... up from 50 or so. Still about 800 missing, but Google has been crawling lately. I get ~200 Googlebot visits a day these days (I was getting 10-30 a day for the past few months).

Relevancy

7:45 pm on May 5, 2006 (gmt 0)

10+ Year Member



Again, are these supplemental results or real indexed pages?

walkman

7:54 pm on May 5, 2006 (gmt 0)



Real. No more supplementals.

kiwiwm

3:20 am on May 6, 2006 (gmt 0)

10+ Year Member



I'm glad for anyone who is getting pages back. For me, my page numbers are still up and down like a bride's nightie. Every day this week when I have done site:xyz.com I get a different answer, lol. Interestingly, in the site:xyz.com results I see my main (/) page with the DMOZ description, but when I do keyword searches I find my page with my meta description. I have never seen the page with the meta description come up when using the site: command. Go figure.

tigger

5:51 am on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not jumping up and down yet, but I am seeing a small recovery when using the site:www. command. So far this week I've moved from 148 to 206. That's a long way off the 600 pages that should be indexed, and I'm still not getting new content in despite it being linked from the index page, but it's a start.

RichTC

1:28 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm 100% convinced now that Google has a problem with the new bot and its cache system following the infrastructure upgrade.

A page that was number 6 in the SERPs for a two-keyword search (not big money) vanished a few days back. We had added some fresh content to the page, so I originally assumed that perhaps it had hit a new sandbox filter or something and would show at position 200+ or so. Anyway, it turns out that the page is gone: not showing with the Google site: command, and not showing if you search for the URL string, yet Googlebot did visit the page recently according to my logs.

Same situation on a number of other pages on various other sites we work on.

In conclusion, Google does have a problem here, and whilst they often appear to be having issues when they are not, this time I think they have a genuine data-storage problem and it is affecting the SERPs.

doois

1:57 pm on May 6, 2006 (gmt 0)

10+ Year Member



To illustrate how bad Google is at the moment, type
<snip>country holiday widgets</snip> into Google. This is one of the single most popular phrases for this country, and a non-existent page ranks in at number 3. No content, no meta, zip... just about 3400 backlinks, PR 7, and an .edu domain extension.

Amazing.

[edited by: trillianjedi at 1:59 pm (utc) on May 6, 2006]
[edit reason] Examplifying.... [/edit]

g1smd

6:04 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see a LOT of old supplemental results being thrown away in the "experimental" DC, and that reduced data set has spread to a lot of other datacentres today...

Is this the supplemental cleanup that we have been waiting for, or just some glitch?

Atomic

7:11 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Things are really looking good. What's going to happen to this forum if Google fixes its ship? No one will have anything to complain about but their rightful ranking.

LuckyGuy

7:15 pm on May 6, 2006 (gmt 0)

10+ Year Member



What confuses me is that a site: search shows so many pages in the index that have already been 404 for over half a year. That makes me think it's a bug. Yeah, it's a bug. But then I read your posts and think: well, it's all about shops and forums, and those sites were hurt badly. That would make sense!? Yeah, it's not a bug.

But then, based on my engineering studies, I did some preliminary considerations. What is it that Google pursues:
1. having the biggest index in the world, with all information stored and kept current.
2. being first in internet search, and staying first...
3. doing relevant internet search.

All these points, combined with their guiding principle "DON'T BE EVIL", lead me to the conclusion that they must have a problem to solve, with or without us webmasters.

So: IT MUST BE A BUG

Greets LG

g1smd

7:24 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Which datacentre are you looking at?

Results in most of the Google DCs are the original BigDaddy SERPs (two slightly different versions of that out there), and in other DCs they are the "experimental" results that started out at 72.14.207.99 and have now spread to many other datacentres (but still very much in the minority).

There are major differences between these two sets of results, especially in the handling of Supplemental Results from before about 2005 June. In the "experimental" DCs most of those Supplemental Results have disappeared today.

Atomic

7:33 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm looking at 72.14.207.104

mattg3

7:48 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Things are really looking good. What's going to happen to this forum if Google fixes its ship? No one will have anything to complain about but their rightful ranking.

It is a simple economic fact that an economy based on an unstable system is a fragile economy. Since Google is the main traffic distributor, the constant instability in their SERPs is bad for anyone doing business on the net.

Imagine a government that constantly rebuilt the roads; no one of sound mind would accept it.

Atomic

8:20 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Imagine a government that constantly rebuilt the roads; no one of sound mind would accept it.

So you suggest that Google sits tight and does nothing?

I don't think anyone would benefit from that. Not Google. Not consumers. Not webmasters. No one.

mattg3

11:35 pm on May 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So you suggest that Google sits tight and does nothing?

No, the obvious conclusion is that they should disclose to a controlling body what they are doing, when, and how long it will take.

Soon they will be forced to anyway; see the child porn thread, etc.

Relevancy

11:55 pm on May 6, 2006 (gmt 0)

10+ Year Member



Does anyone else feel like their clients think they are lying to them, because we always have to tell them something is screwed up with Google? Our clients are going to start saying, "sure, something is wrong with Google again". Google has had more months of updating/fixing than months of solid, good results.

newwebster

12:11 am on May 7, 2006 (gmt 0)

10+ Year Member



Is this the supplemental cleanup that we have been waiting for, or just some glitch?

I say it is a cleanup. I am hoping all datacenters are cleaned within the next few days. This is much overdue and very welcome to see.

hutcheson

1:22 am on May 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The situation is a zero-sum game. Google isn't running short of pages to show. Ten results feature in the top ten, same as always. If some site dropped down a thousand places in the results, a thousand other sites must be doing better as a direct result. And so long as surfers are easily finding some acceptable site, the roads are working fine. But Google isn't the road. Google is just the map. The internet is the road.

And people keep tweaking sites, which then move up or down in the results. So long as the net keeps churning, it's not reasonable to blame index-churn on the indexer. So long as people keep building new roads, the map is going to have to keep changing.

But it's worse than that. A lot of webmasters aren't building "roads" for surfers, they're digging trenches, spreading barbed wire, and trying to restrict access that doesn't go through their checkpoints. It's a war zone, not a subdivision! Google is trying to map the remaining open roads, to enable people to travel past the ambush sites to their real destination.

And yeah, the maps change REAL fast in the middle of a war. They have to. And in any case, if they didn't change, your sites would all STILL be falling out of the SERPS -- pushed out by new legions of doorway spammers, new scrapers, new links on other people's sites. Your checkpoints, in other words, being shut off by people laying barbed wire to funnel the sheep through their own checkpoints.

This is the new normalcy: get used to it.

F_Rose

1:44 am on May 7, 2006 (gmt 0)

10+ Year Member



On the experimental datacenter our site shows 0 supplemental results, which is great.

However, Google is only indexing 24 pages of our entire site; the rest is not coming up in G's DC.

Is anyone else having the same problem? My question is: do I need to do something about it, or just wait for Google to get past this update, bug, whatever you call it?

mattg3

2:23 am on May 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



maps change REAL fast in the middle of a war.

Yes, war economies tend to invest in weapons. ;)

Once the war against spam mail has been won, I'll start believing G/Y/MS can win this war.

This is the new normalcy: get used to it.

Well, someone had better tell G/Y/MS that... :)

This 254-message thread spans 9 pages.