
Google SEO News and Discussion Forum

Google CEO admits, "We have a huge machine crisis"
Google admits that they have a problem with storing the world's information
Enkephalin420
msg:725455 - 4:48 pm on May 3, 2006 (gmt 0)

Google CEO admits - "We have a huge machine crisis - those machines are full".

I was reading the New York Times article Microsoft and Google Set to Wage Arms Race [nytimes.com], and a paragraph on page 2 caught my eye: it quotes Eric Schmidt (Google's CEO) admitting that they have problems storing more web site information because their "machines are full".

I am a webmaster who has had problems getting and keeping my webpages indexed by Google. I follow Google's guidelines to the letter and have not practiced any blackhat SEO techniques.

Here are some of the problems I have been having:

1. Established websites having 95%+ of their pages dropped from Google's index for no apparent reason.
2. New webpages published on established websites not being indexed (some launched as long as 6-8 weeks ago).
3. New websites being launched and not showing up in the SERPs (for as long as 12 months).

We're all well aware that Google's algorithms have had problems handling simple directives such as 301 and 302 redirects, along with duplicate indexing of www and non-www pages and other canonical issues.
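
As an aside, a quick way to see whether a server is even sending the 301 for the www/non-www split is to request both hostnames and look at the raw status codes. A rough Python sketch, purely for illustration (example.com stands in for your own domain; this is not a Google tool):

import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following the redirect,
    # so we see the raw status code and Location header.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect())

for url in ("http://example.com/", "http://www.example.com/"):
    try:
        resp = opener.open(url)
        print(url, resp.status)  # e.g. 200 on the canonical hostname
    except urllib.error.HTTPError as e:
        # A 301 here means this hostname redirects to the other form.
        print(url, e.code, e.headers.get("Location"))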

Does anybody think that Google's "huge machine crisis" has anything to do with any of the problems I mentioned above?

[edited by: tedster at 5:03 pm (utc) on May 3, 2006]
[edit reason] fix side scroll potential [/edit]

 

tedster
msg:725456 - 5:07 pm on May 3, 2006 (gmt 0)

Here's a fuller version of the quote:

As Google grows, so does its need to store and handle more Web site information, video and e-mail content on its servers. "Those machines are full," Mr. Schmidt, the chief executive, said in an interview last month. "We have a huge machine crisis."

This is exactly what Big Daddy is intended to fix, I thought. And we are now seeing the effect of "filling" the new machines on an expanded infrastructure. This is data handling on a mega-scale that scares the stuffing out of me, and I'm not surprised to see some bits and bytes falling off the new baskets from time to time. That doesn't mean I like it, but I do feel it's aimed at a future improvement and is a necessity for Google.

tedster
msg:725457 - 5:14 pm on May 3, 2006 (gmt 0)

Another relevant part of the article:

Last month, when reporting its quarterly earnings, Google reported a doubling in its rate of capital investment, mainly in computer servers, network equipment and space for data centers, and said it would spend at least $1.5 billion over the next year.

$1.5 billion in one year for infrastructure alone. Wow!

karmov
msg:725458 - 6:00 pm on May 3, 2006 (gmt 0)

Wow indeed...

Especially when you consider that the web and Google's ambitions are not getting any smaller. $1.5 billion this year, more next? And after that?

The numbers are truly impressive and eye opening, even more when you realise the simplicity of it all from the user end.

texasville
msg:725459 - 6:01 pm on May 3, 2006 (gmt 0)

Which is why I think they are doing what they are doing now. It makes perfect sense given what is happening. I bet all the duplicate content and all the 404s will finally disappear forever.

decaff
msg:725460 - 6:05 pm on May 3, 2006 (gmt 0)

Wasn't this last upgrade (Big Daddy) a migration from 32-bit to 64-bit computing? And isn't Google also preparing for the onslaught of IPv6 (expected around 2008)? Is that 35 trillion IP-enabled devices, potentially, or 35 thousand trillion addresses from the larger address structure?
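
(For the record, IPv6 addresses are 128 bits, so the raw address space is far bigger than either guess; a quick illustration:)

ipv6_addresses = 2 ** 128
print(ipv6_addresses)           # 340282366920938463463374607431768211456
print(f"{ipv6_addresses:.2e}")  # ~3.40e+38 -- far beyond "thousand trillion"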

europeforvisitors
msg:725461 - 6:11 pm on May 3, 2006 (gmt 0)

This goes to show that, if the "next Google" is born in a garage, the garage will have to be the size of a stadium parking ramp--and the founders won't be able to fund their venture by hocking their Playstations. :-)

tedster
msg:725462 - 6:11 pm on May 3, 2006 (gmt 0)

The numbers are truly impressive and eye opening, even more when you realise the simplicity of it all from the user end.

Well said. And even from the site owner's perspective, Google appears to be MUCH simpler than it really is. We just want to know "how many of my pages do you have" and so on. Questions like that seem like no-brainers, until the issues of mega-scale come into play.

g1smd
msg:725463 - 6:20 pm on May 3, 2006 (gmt 0)

Hmm, so was all that discussion on this forum about DOCIDs, just about a year ago, spot on but just a tad too early?

Actually, once a problem like that had been identified at Google, it must have spawned an update project of enormous proportions.

Something like that could not be planned and implemented in just a few months, so I guess the DOCID people were probably right...

whoisgregg
msg:725464 - 6:57 pm on May 3, 2006 (gmt 0)

Except the DOCID discussion was about running out of unique keys to label the data...

These quotes suggest that they are running out of disk space to store the data itself.
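
For context, the old DOCID theory assumed 32-bit document IDs; on that assumption (speculation from those threads, never confirmed by Google) the ceiling works out as:

max_docids = 2 ** 32       # assumes 32-bit IDs
print(f"{max_docids:,}")   # 4,294,967,296 -- about 4.3 billion documents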

narsticle
msg:725465 - 6:59 pm on May 3, 2006 (gmt 0)

Is Google the next Enron? Maybe they are getting too big for their britches.

P.S. I love Google.

CanadianLove
msg:725466 - 7:07 pm on May 3, 2006 (gmt 0)

This goes to show that, if the "next Google" is born in a garage, the garage will have to be the size of a stadium parking ramp--and the founders won't be able to fund their venture by hocking their Playstations. :-)

Unless, of course, the new start-up has no infrastructure whatsoever. How about, hmmm, maybe... Distributed?

arnarn
msg:725467 - 7:11 pm on May 3, 2006 (gmt 0)

The G CEO, in referring to full machines, may not have been talking specifically about the DCs used for searching.

G has had many resource problems where it underestimated user demand (remember the Analytics fiasco?). I would be highly surprised if they'd let their search DCs be under-resourced...

... although anything is possible.

Essex_boy
msg:725468 - 7:20 pm on May 3, 2006 (gmt 0)

Love it! Several years ago, I think around or just after the Florida fiasco, someone pointed out on here that Google may well be having problems addressing all of its content.

Hence the dropped sites.

All those that agreed were seriously slated and laughed at.

I wonder if those pious souls will now admit to being wrong?

Get in line.

iblaine
msg:725469 - 7:26 pm on May 3, 2006 (gmt 0)

Technology is never a barrier, particularly when you have billions of dollars to spend. It's cute of Google to say they are having problems and that they will be fixed.

walkman
msg:725470 - 7:29 pm on May 3, 2006 (gmt 0)

OK Google, here's how to do it:
1. DROP the freaking pages deleted 2 years ago, and index the current ones
2. Stop storing everything users do while at Google.com (or at a site that uses Adsense)
3. You should be fine by now.

I have to say I'm amazed that it will take $1.5 billion a year, just in hardware, to run Google; I would never have thought it was so much.

[edited by: walkman at 7:36 pm (utc) on May 3, 2006]

mcavic
msg:725471 - 7:35 pm on May 3, 2006 (gmt 0)

1. DROP the freaking pages deleted 2 years ago, and index the current ones

Agreed. As a user I've found supplemental cached results useful, but as a site owner, I'm not sure that I want Google retaining them.

And what about Gmail? I don't know how many users they have, but it seems to have turned into a free file storage service.

Cakkie
msg:725472 - 7:50 pm on May 3, 2006 (gmt 0)

Well, if you don't want your pages to end up in Google's cache, simply set the <META NAME="ROBOTS" CONTENT="NOARCHIVE"> tag, and Google will drop the page from the cache the next time it crawls it (or you can use Google's automatic URL remover tool [services.google.com] to have it done sooner).

Swanson
msg:725473 - 7:53 pm on May 3, 2006 (gmt 0)

Ha ha - priceless!

I don't think anyone could sum up Google's problem better than WebmasterWorld - just look at a few of the main forum topics at the moment:

1. "Google CEO admits, 'We have a huge machine crisis'"
2. "Pages Dropping Out of Big Daddy Index"
3. "Something's Up Right Now! -- 30 domains just went 'home page only'"

At the end of the day, this can explain, in part, some of the things that are going on (including the new caching proxy, where multiple crawlers share one cached view of the data).

To me, if you have a huge space problem you solve it in two ways: more space, or less data. Now, if you come up with an amazing technique to store less data (e.g. duplicate content removal, canonical improvements), then you have saved a huge amount of cash, but... if you underestimate the implications of your assumptions, you remove millions of web pages by accident. And better yet, if you do it "live" you have a bigger problem - but you have no choice but to do it live, because you are running out of space.

But you also have loads of other services that need data storage - which ones do you sacrifice, the free ones or the ones that generate your whole income? That's right: it is a better business decision to compromise your free search product (nobody may notice, after all, if you do it right) than your paid one (AdSense and AdWords need the space, and no compromises can be made there).

And there is the difference between Google of 1997 and the Google of today.
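
To make the "less data" side concrete, the crudest form of duplicate-content removal is just hashing the normalised page text and keeping one stored copy per hash. A toy Python sketch, purely illustrative and nothing to do with Google's actual pipeline:

import hashlib
import re

def fingerprint(html_text):
    # Strip tags and collapse whitespace/case so trivially different
    # markup around identical content hashes to the same value.
    text = re.sub(r"<[^>]+>", " ", html_text)
    text = " ".join(text.lower().split())
    return hashlib.md5(text.encode("utf-8")).hexdigest()

store = {}  # fingerprint -> canonical URL

def index(url, html_text):
    fp = fingerprint(html_text)
    if fp in store:
        return f"duplicate of {store[fp]}, not stored"
    store[fp] = url
    return "stored"

print(index("http://www.example.com/page", "<html><body>Hello World</body></html>"))
print(index("http://example.com/page", "<html><body>Hello  World </body></html>"))

The second call is recognised as a duplicate of the first, so only one copy is kept - and you can see how an over-aggressive version of the same idea would start collapsing pages that merely look similar.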

rohitj
msg:725474 - 7:55 pm on May 3, 2006 (gmt 0)

I was under the impression that Big Daddy was done to save bandwidth--not disk space. As for a machine crisis, it isn't hard to purchase datacenters--there are quite a few for sale at basement prices from companies going under.

I bet this problem could be solved by more efficient compression. In any case, I did read somewhere that the Google file system stores seven copies of any one document for backup and load-balancing purposes. Maybe they can reduce that to four or five as a very temporary solution...
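
On the compression point, it is easy to see why people reach for it: repetitive HTML squeezes down a long way with stock zlib. A rough illustration only (made-up markup, no claim about what Google actually stores):

import zlib

# A blob of repetitive markup standing in for a typical crawled page.
page = ("<div class='result'><a href='http://www.example.com/'>"
        "Example result snippet text goes here.</a></div>\n") * 200

compressed = zlib.compress(page.encode("utf-8"), 9)
print(len(page), "bytes raw")
print(len(compressed), "bytes compressed")
print(f"ratio: {len(page) / len(compressed):.1f}x")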

narsticle
msg:725475 - 7:55 pm on May 3, 2006 (gmt 0)

Google stock is up 18 cents amid the problems.

Swanson
msg:725476 - 7:58 pm on May 3, 2006 (gmt 0)

rohitj, yes, but if it were as simple as increasing compression then surely the answer would be just to do it, and not state in an interview that you are having a machine crisis - especially as a publicly traded company.

ecommerceprofit
msg:725477 - 8:00 pm on May 3, 2006 (gmt 0)

I wonder if they are dropping quotes like this to the press to scare off potential rivals. Also, I know Gigablast claims they can index the web at a much lower cost than Google.

longen
msg:725478 - 8:10 pm on May 3, 2006 (gmt 0)

Google wants to "Organise the World's Information".
Does that mean just indexing, or do they want to STORE the world's info? What would it take to electronically store all the books in the British Library, the Library of Congress, etc., plus all the newly released books?
With the increase in storage density it might be possible to plan for it.

Swanson
msg:725479 - 8:13 pm on May 3, 2006 (gmt 0)

I think they are doing it to defuse the current problems and deflect any criticism that may arise if and when a public story appears about what webmasters are finding.

By "bigging up" the scale and cost, they justify any problems that may crop up in the "gigantic mission" of indexing the web, and so win sympathy and understanding from the community by presenting the challenge they face. There is no doubt it is a big challenge - but if you have hit this sort of crisis in 2006, don't run a search engine.

Phil_Payne
msg:725480 - 8:18 pm on May 3, 2006 (gmt 0)

Would it help if the very largest sites used <META NAME="ROBOTS" CONTENT="NOARCHIVE">, one wonders?

Even large sites doing it would probably make little difference, but some of the megasites doing it might have an effect.

I've been thinking of doing it by default myself, but mainly to prevent people accessing old information, now that Google's update cycle seems to have slowed so much.

mcavic
msg:725481 - 8:18 pm on May 3, 2006 (gmt 0)

if you don't want your pages to end up in Google cache

Yes, I meant the stale entries, and I was speaking philosophically. I personally don't have any problem.

Swanson
msg:725482 - 8:21 pm on May 3, 2006 (gmt 0)

Phil_Payne, what you said there is another signal of the problems they are facing: the update cycle slowing.

When faced with this sort of storage problem you need to keep a check on the refresh rate of your data - especially if your algo has a "time" factor (e.g. link rate, domain history, etc.), because then your data storage requirement explodes.
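
A rough back-of-the-envelope for why the "time" factor hurts: every extra snapshot you keep per page multiplies the whole store. All figures below are invented, just to show the shape of it:

pages = 8_000_000_000        # documents in the index (made-up)
avg_bytes = 10_000           # stored bytes per snapshot (made-up)

for snapshots in (1, 4, 12, 52):
    total_tb = pages * avg_bytes * snapshots / 1e12
    print(f"{snapshots:>3} snapshots per page -> {total_tb:,.0f} TB")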

jtoddv
msg:725483 - 8:22 pm on May 3, 2006 (gmt 0)

And to think... one day the entire current Google database will be able to fit on a thumb drive.

Phil_Payne
msg:725484 - 8:29 pm on May 3, 2006 (gmt 0)

> When faced with this sort of storage problem you need to keep a check on the refresh rate of your data - especially if your algo has a "time" factor (e.g. link rate, domain history, etc.), because then your data storage requirement explodes.

True. I was involved in large-systems performance for quite a while (Vice-Chairman of the UK CMG), and nasty things happen in queueing theory when resources get tight - the "knee of the curve" effect. If Google is tight on storage, it will be spending ever more time and resources finding places to stick things. Rules of thumb (crude, I know) suggest that about 70% is the best utilisation of a storage resource you'll ever get if you want the system to perform.
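
The "knee of the curve" is easy to show with the simplest queueing model: in an M/M/1 queue the response time is roughly the service time divided by (1 - utilisation), so it explodes as utilisation approaches 100%. Illustrative numbers only:

service_time = 10.0  # ms per request, arbitrary

for util in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    response = service_time / (1 - util)
    print(f"utilisation {util:.0%}: ~{response:.0f} ms")

Past about 70% the curve turns sharply upward, which is where that rule of thumb comes from.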

I've commented elsewhere about a site of mine where Google's response to my sitemap updates has been odd. I've now realised exactly what's happening: Googlebot is faithfully downloading the changes I notified via new sitemaps around five weeks ago. It has just worked its way through March 28 and started on March 29 - in order.

The corollary is that Google is delivering results to users based on old data. At least five weeks old, on this one site. It won't take long for Joe Public to start to realise that Yahoo, Ask Jeeves, etc., deliver more pertinent results than Google. In some consumer markets, five weeks is the lifetime of a product.

[edited by: Phil_Payne at 8:47 pm (utc) on May 3, 2006]
