I was reading the New York Times article Microsoft and Google Set to Wage Arms Race [nytimes.com], and a paragraph on page 2 caught my eye: it quotes Eric Schmidt (Google's CEO) admitting that Google has problems storing more web site information because their "machines are full".
I am a webmaster who has had problems getting and keeping my webpages indexed by Google. I follow Google's guidelines to the letter and have not practiced any black-hat SEO techniques.
Here are some problems I have been having:
1. Established websites having 95%+ of their pages dropped from Google's index for no reason.
2. New webpages published on established websites not being indexed (pages that were launched as long as 6-8 weeks ago).
3. New websites being launched and not showing up in the SERPs (for as long as 12 months).
We're all well aware that Google has algo problems handling simple directives such as 301 and 302 redirects, duplicate indexing of www and non-www webpages, canonical issues, etc.
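For what it's worth, the www vs. non-www issue is one we can at least guard against on our own end. Here is a minimal sketch (assuming a Python/WSGI-served site and a made-up canonical host of www.example.com, not anything from my actual setup) that 301-redirects every other hostname to the canonical one:

```python
# Minimal sketch: WSGI middleware that 301-redirects any non-canonical
# hostname (e.g. example.com) to the canonical www host, so crawlers
# only ever see one version of each URL. The host name is hypothetical.
def canonical_host(app, canonical="www.example.com"):
    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0]
        if host and host != canonical:
            location = "http://%s%s" % (canonical, environ.get("PATH_INFO", "/"))
            start_response("301 Moved Permanently", [("Location", location)])
            return [b"Moved Permanently"]
        return app(environ, start_response)
    return middleware
```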
Does anybody think that Google's "huge machine crisis" has anything to do with any of the problems I mentioned above?
35 trillion IP-enabled devices... potentially
That seems very high. This would be about 5,000 devices for every person on earth.
But a key question about content is how much of it is worthy of indexing? Seems to me the best search applications will be those that know what NOT to index in the first place rather than those that try to index everything and then sort it out later.
I'm sorry, but what kind of publicly traded technology company fails to monitor its storage needs and increase capacity as needed over time? Why wait until the last minute and then tell everybody: sorry guys, we ran out of hard drive space?
IMHO, it's usually a mistake to take hyperbolic remarks literally.
If the guy had said "We're getting killed by our electricity supplier," we'd probably see a thread here with the title: "Google staff electrocuted, bodies pile up at the plex." :-)
Example of an IPv6 address:
1080:0000:0000:0000:0000:0034:0000:417A
Here's the math:
2^128, or about 3.403 × 10^38 unique host interface addresses. That translates into 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses.
That's a 39-digit number (roughly 340 undecillion).
IPv6 goes officially live in 2008 (though there is already an active IPv6 network in place, and all Linux distributions already support the protocol).
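The arithmetic above checks out; here is a quick Python sanity check (my own, not from the article):

```python
# Verify the IPv6 address-space figures quoted above: 2**128 addresses.
total = 2 ** 128
print(total)             # 340282366920938463463374607431768211456
print(len(str(total)))   # 39 digits
print(f"{total:.3e}")    # ~3.403e+38
```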
Matt Cutts did a post about the dropped pages today.
[mattcutts.com...]
He says all is fine! I think it's a lie.
Matt Cutts did a post about the dropped pages today.
He says all is fine! I think it's a lie.
I can't find that post. Can you identify it by timestamp?
maxD, last week when I checked there was a double-digit number of reports to the email address that GoogleGuy gave (bostonpubcon2006 [at] gmail.com with the subject line of “crawlpages”).
I asked someone to read through them in more detail and we looked at a few together. I feel comfortable saying that participation in Sitemaps is not causing this at all. One factor I saw was that several sites had a spam penalty and should consider doing a reinclusion request (I might do it through the webmaster console) but even that wasn’t a majority. There were a smattering of other reasons (one site appears to have changed its link structure to use more JavaScript), but I didn’t notice any definitive cause so far.
There will be cases where Bigdaddy has different crawl priorities, so that could partly account for things. But I was in a meeting on Wednesday with crawl/index folks, and I mentioned people giving us feedback about this. I pointed them to a file with domains that people had mentioned, and pointed them to the gmail account so that they could read the feedback in more detail.
So my (shorter) answer would be that if you’re in a potentially spammy area, you might consider doing a reinclusion request–that won’t hurt. In the mean time, I am asking someone to go through all the emails and check domains out. That person might be able to reply to all emails or just a sampling, but they are doing some replies, not only reading the feedback.
Sounds not so great
Given the amount of nastiness and bile that people like Matt Cutts and GoogleGuy have to take from unhappy Webmasters, it's a wonder they're willing to communicate at all.
As John Battelle, the editor of SearchBlog, stated:
"In the long run, it's about whether you have the best service."
Those of you pointing to recent shifts in the algo results as "evidence" of a storage problem would do well to remember that there is a shuffle every time algo changes are made, that we just experienced one, and that common sense tells us to wait until things stabilize at all of the data centers before freaking out.
Google does not disclose technical details, but estimates of the number of computer servers in its data centers range up to a million.
And that's before spending an additional $1.5 billion on more.
Has anyone EVER ... in the history of the Earth ... tried to manage a project like that? Anyone? No, you haven't, and all of your wisdom on that topic (predicting need, rolling out infrastructure, etc.) is pretty frail in the face of the simply staggering numbers that this enterprising company is dealing with.
YOU GO, GOOGLE! ROCK OUR WORLD!
Free search is free, and you get what you pay for; that's all. It's time for all of us to wake up, smell the coffee, and admit that Google has become a global advertising agency. Free / organic search is for collecting stats and demographics.
We are a bit crazed with Google, are we not? Is this normal?
Has anyone EVER ... in the history of the Earth ... tried to manage a project like that? Anyone? No, you haven't, and all of your wisdom on that topic (predicting need, rolling out infrastructure, etc.) is pretty frail in the face of the simply staggering numbers that this enterprising company is dealing with.
Well yes, they've assembled the largest pile of internet garbage to date, and yes, the numbers involved with that giant pile are truly staggering. But the thing about garbage is: even if you sort it carefully, and wash it before piling it, in the end it's still just garbage.
And EFV, I mostly agree with your view on all of this, but remember that Matt/GG/whoever are not posting those comments here and in blogs because they care about us webmasters; they're doing it as part of a PR operation for a giant, profit-minded company, and it's in their financial interest to do so.
For the last few days, on the "experimental" DC, a search for a phone number that was completely removed from the web years ago has occasionally returned zero results. That same search previously returned 900 vague supplemental results that didn't even match the query, instead of just a few dozen relevant supplemental results (for deleted pages and expired domains). Zero results is the correct outcome if Google has finally cleaned up the old supplementals.
Today, many DCs return zero results every time for this and several other similar queries for stuff that Google should have cleaned up long ago.
Now, is this a DC that has been cleaned of old Supplemental Results, or is it a DC where the Supplemental data is simply missing and Google is going to add it back in over the next few days?
Time will tell.
A large website whose domain expired two weeks ago had 12,000 pages listed, many of them supplemental, for the last few years. The root now shows a "domain expired" message, and all other pages are gone from the site.
Google reindexed the site and overnight the number of listed pages has been reduced to under 100 on the "experimental" DC. It seems like Google is aggressively throwing away old data, whereas before they would have held on to it for years and years...
On the old "normal" DCs, Google still shows 12,000 pages listed.
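For anyone who wants to compare the reported counts across datacenters themselves, here is a rough sketch. The DC IPs, the query URL, and the "of about N" results string are my assumptions about the 2006-era result pages, not anything Google documents, so treat it as illustrative only:

```python
# Rough sketch: compare the "site:" result count reported by individual
# Google datacenter IPs. The IPs below are placeholders, and the
# "of about N" text is assumed from the old result-page format.
import re
import urllib.request

def site_count(dc_ip, domain):
    url = "http://%s/search?q=site:%s" % (dc_ip, domain)
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req).read().decode("utf-8", "ignore")
    match = re.search(r"of about ([\d,]+)", html)
    return int(match.group(1).replace(",", "")) if match else 0

for ip in ["66.249.93.104", "64.233.179.104"]:  # placeholder DC IPs
    print(ip, site_count(ip, "example.com"))
```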
Maybe their capacity plans didn't allow for a flood of multimillion-page, template-based sites from Webmaster World members. :-)
I know you're kidding, but in all seriousness, can there be any doubt that the AdSense program has probably been THE major contributor to the development of millions of useless websites and pages that Google search then has to index?
Oops. Talk about the law of unintended consequences.
the AdSense program has probably been THE major contributor
SE spammers were already building massive throw-away sites well before AdSense. Any efficient monetization program would have had the same effect of increasing the quantity of those types of sites, regardless of whether it was Google's program or not.
Case in point: MFA sites are as much a problem for Yahoo and MSN as they are for Google, if not more so.
In the older "BigDaddy" datacentres, that page has failed to appear in the index for that search term (but still appears for the other search terms it has ranked for over the last 2 years).
I am guessing that those versions of the "BigDaddy" index are not being maintained, and will be phased out soon. Those "BigDaddy" datacentres usually return a higher number of pages, but are littered with ancient supplemental results.
In contrast, the "experimental" datacentres show a lower number of fully-indexed pages, but all of the supplemental results from before June 2005 are now gone. In their place, some sites now show supplemental results that are 2 to 10 months old instead. These supplemental results are for pages that no longer exist, or are ghosts of pages that still exist, preserving a previous version of those pages' content.