Forum Moderators: open

Message Too Old, No Replies

What drives Google - Hardware "secret"

cheap & fast - juiced up PCs

         

amznVibe

8:30 am on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting article in PCWorld today:

By using commodity PC hardware, which is similar to that of home PCs, Google buys cheap and builds high levels of redundancy into its system in an effort to compensate for the fact that one full day of Google use on a server is the equivalent of 40 machine years, Nevill-Manning said.

[pcworld.com...]

percentages

9:18 am on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>Speaking at the 13th annual IBM Centers for Advanced Studies Conference....."Cheap and fast" hardware is the way to go.

I bet that went down well with IBM salespeople ;)

>more than 10,000 servers
I thought it was 40,000 more than 10,000 servers?

>among 4 billion Web documents
I do wish they would keep their home page up-to-date;)

>The system is based on algorithms that are used to search for common links to Web sites

hmmmm....that one is a gem. More because of the lack of explanation/emphasis on other components;)

Nice find amznVibe, enjoyed :)

amznVibe

1:51 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google should go into the hardware certification business with that kind of load (or at least server certification)! Just imagine the confidence you'd have in a "Google Thrash Tested" sticker :)

What's strange is I thought they used networked "Google Search Appliances [google.com]" but I guess not.

claus

1:59 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> it was 40,000 more than 10,000 servers?

I also stumbled over that; the explanation seems to be: "Each server has many twins,"

>> The system is based on (...) common links to Web sites

hmm...common...

This quote is also interesting, although it's been said before:

>> "Search in five years will be even more accurate and more user-centered."

/claus

[edited by: claus at 2:01 pm (utc) on Oct. 11, 2003]

amznVibe

2:00 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



gawd, I am so nosey, but you can find anything on google, or anyone [sequence.rutgers.edu]
Google sure steals, I mean hires-away, some bright ones: [sequence.rutgers.edu...]

[edited by: amznVibe at 2:02 pm (utc) on Oct. 11, 2003]

lazerzubb

2:00 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Last quote I have is 54,000 from "The Week"

amznVibe

2:09 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, another article from the presentation:
[itbusiness.ca...]
and another [itworldcanada.com]

and more Google hardware background from another presentation by Nevill-Manning:

Google’s infrastructure: Google uses consumer-level hard disks and “really cheap, unreliable memory.” (“If something fails, it’s not you, it’s probably the memory.”) They have around 10,000 commodity-level Linux computers set up in a parallel network (“the largest Linux cluster in the world”), and anticipate the death of “a few machines every day.” Their network is set up to be able to route around a failed machine instantly.

[q.queso.com...]
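That "route around a failed machine instantly" idea can be sketched as a pool of identical replicas where dead machines are simply skipped. This is just an illustrative Python sketch of the general technique; the shard names and the pool logic are invented here, nothing from the presentation itself:

```python
import itertools

class ReplicaPool:
    """Round-robin over identical replicas, skipping any marked dead.

    Mirrors the idea in the quote: each machine has many "twins",
    so a dead one is simply skipped until someone gets around to it.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.dead = set()
        self._rr = itertools.cycle(range(len(self.replicas)))

    def mark_dead(self, name):
        self.dead.add(name)

    def mark_alive(self, name):
        self.dead.discard(name)

    def pick(self):
        # Try each slot at most once per call; if all are dead, give up.
        for _ in range(len(self.replicas)):
            candidate = self.replicas[next(self._rr)]
            if candidate not in self.dead:
                return candidate
        raise RuntimeError("no live replica available")

pool = ReplicaPool(["shard3-a", "shard3-b", "shard3-c"])
pool.mark_dead("shard3-b")                 # machine died; repair can wait
picks = [pool.pick() for _ in range(4)]
print(picks)                               # only -a and -c are ever returned
```

Because every machine has several twins, repair can be lazy: a dead box just stops being picked until somebody swaps the hardware.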

Allergic

2:25 pm on Oct 11, 2003 (gmt 0)

10+ Year Member



"Because the system is built this way, if a machine goes down, it doesn't have to be repaired right away," he said. "We can save money by doing this in a lazy fashion."

Slightly different talk by Eric Schmidt in Red Herring [redherring.com] in february 2003:

"We aren't interested in getting maximum power for a high price," he says. "What we're looking for is maximum functionality and that's a whole different thing." Each of Google's thousands of motherboards (a computer's main circuit board) are designed for the quick switching of components. Even the power supply is held on with Velcro straps: if it burns out, it can be replaced quickly. Recently, when the expensive top-end disk drives used by the motherboards proved inadequate, Google tossed out thousands and replaced them with cheaper, better models."

amznVibe

2:29 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The question is: is it an average of 10,000 per datacenter, for 54,000 total (some more, some less)? Or is it 10,000 worldwide?

Dat's a whole-lotta-pc's. I bet that room gets warm.

amznVibe

2:42 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Guess what, Craig helped with adwords and google glossary too:
Nevill-Manning is also a vital contributor to the technological infrastructure used to support AdWords, Google's self-service advertising program, and is developer of the Google Glossary, a tool for finding definitions to words, phrases and acronyms available through Google Labs.
http://www.cas.ibm.com/cascon/speakers/index.shtml
This page [research.ibm.com] also has him listed as "leads the development team for Froogle, a product search engine"

Hey did we know that the Google database is 20 terabytes?

claus

3:53 pm on Oct 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Now, that's some one liner [webmasterworld.com]:

"when asked how the 64-bit Itanium, the new megaprocessor from Intel and Hewlett-Packard, would affect Google, Mr. Schmidt replied that it wouldn't." From Red Herring article, Allergic, post #8

More on Google, these are good quotes as well:
Both from "itbusiness.ca", amznVibe, post #7

"Nevill-Manning showed a pyramid diagram to illustrate how Google organizes its searches. At the bottom is "main," where there tends to be higher latency for pages that don't change very much. "Fresh," in the middle, includes portals that need more up-to-the-minute checks, like e-commerce sites during the December holiday shopping season. On top is "News," for CNN.com and other sites that change all the time."

"Google's main challenge right now is to handle the billing and syndication of its advertising, which often includes transactions that are only worth pennies each. On the R&D side, Nevill-Manning said the company is hoping to extend its portfolio with a number of new services, including a Google Glossary of hard-to-find terms, and Google Sets, which would bring up related searches."
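The pyramid reads like a recrawl-frequency policy: the higher the tier, the sooner a page is due for another fetch. A minimal sketch in Python; the concrete intervals below are invented purely for illustration, the article does not state them:

```python
from datetime import datetime, timedelta

# Illustrative recrawl intervals for the three tiers described above.
# The actual numbers are not in the article -- these are made up.
TIER_INTERVAL = {
    "main":  timedelta(days=30),    # slow-changing bulk of the web
    "fresh": timedelta(days=1),     # portals, e-commerce in season
    "news":  timedelta(minutes=15), # CNN.com and the like
}

def due_for_recrawl(tier, last_crawled, now):
    """True if a page in the given tier should be fetched again."""
    return now - last_crawled >= TIER_INTERVAL[tier]

now = datetime(2003, 10, 11, 12, 0)
print(due_for_recrawl("news", now - timedelta(hours=1), now))  # True
print(due_for_recrawl("main", now - timedelta(days=2), now))   # False
```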

This one is also interesting:

"As a result, there are many identical data centres around the world, Nevill-Manning added"
From "and another", amznVibe, post #7

Exactly how identical is "identical", one might ask...

/claus

amznVibe

1:09 am on Oct 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google definitely has the right approach. I wonder if Microsoft will try to mimic them for their new search service or if they are going to try to use fewer, more powerful boxes. If they are going to run IIS they will definitely need more power for the same results.

wifi on the fly

3:56 am on Oct 12, 2003 (gmt 0)

10+ Year Member



I don't know if IIS could even come close to handling search like that.

claus

10:10 am on Oct 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Netcraft.com [uptime.netcraft.com] reports
search.msn.com
as running Microsoft-IIS/5.0 and being rebooted every two weeks or so.

/claus

amznVibe

10:20 am on Oct 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



claus, that is just too funny... you'd think they would at least use IIS 6.0 with HTTP compression and such (Google uses HTTP compression). But you can't just trust response headers, it might not be that at all.

MS will probably try to use the 2003 server product for the new search service and overpower the hardware to compensate. We know they have the money to burn.
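For what it's worth, anyone can do the same kind of peek Netcraft does with a few lines of Python. A minimal sketch using only the standard library; as said above, the Server header only tells you what the front end *claims* to run, and the parameter defaults here are just illustrative:

```python
import http.client

def probe(host, port=80, path="/"):
    """HEAD-request a site and report what its front end claims to be."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path, headers={
            "Accept-Encoding": "gzip",   # ask whether it offers compression
            "User-Agent": "header-probe/0.1",
        })
        resp = conn.getresponse()
        return {
            "status": resp.status,
            "server": resp.getheader("Server"),              # e.g. "Microsoft-IIS/5.0"
            "encoding": resp.getheader("Content-Encoding"),  # "gzip" if it compressed
        }
    finally:
        conn.close()

# probe("search.msn.com")  # would show whatever Server header the front end sends
```

Of course this only ever reaches the outermost layer, which is exactly claus's point below about load balancers.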

claus

10:52 am on Oct 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> you can't just trust response headers, it might not be that at all

That's a very valid argument. I recall that Hotmail initially ran FreeBSD - I don't even know if the migration to win2k/IIS5 is finished yet, even though the headers say so. Typically, when you request info on such large services, you only hit the front-end layer that does load balancing and such - you can't really know for sure what the (thousands of) servers in the back are running.

/claus

plasma

12:10 pm on Oct 12, 2003 (gmt 0)

10+ Year Member



They have around 10,000 commodity-level Linux computers set up in a parallel network

Isn't it ironic?
They build their business on Linux but the PR-Feature (Toolbar) is only available for Win****.

amznVibe

12:59 pm on Oct 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They build their business on Linux but the PR-Feature (Toolbar) is only available for Win****

Actually, to be more accurate, they only make it for IE on Windows. They are obviously going with the percentages (80%+ vs. everything else), but this topic has been beaten to death around here elsewhere. For my 2 cents, I'd love to see it cross-platform, on Mozilla at least, so they don't appear to be playing favorites.

vitaplease

8:39 am on Oct 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a bit older but related: WEB SEARCH FOR A PLANET:THE GOOGLE CLUSTER ARCHITECTURE

[computer.org...]

Has some nice cost comparisons in it on different server outfits.

As posted by Markus here: [webmasterworld.com...]

amznVibe

3:48 am on Oct 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



vitaplease, ah I forgot about that one... good one... HTML version [216.239.57.104] for those that prefer it
(their conversion still needs some tweaking, methinks)