Forum Moderators: open
By using commodity PC hardware, which is similar to that of home PCs, Google buys cheap and builds high levels of redundancy into its system in an effort to compensate for the fact that one full day of Google use on a server is the equivalent of 40 machine years, Nevill-Manning said.
[pcworld.com...]
I bet that went down well with IBM salespeople ;)
>more than 10,000 servers
I thought it was 40,000, not just "more than 10,000" servers?
>among 4 billion Web documents
I do wish they would keep their home page up-to-date;)
>The system is based on algorithms that are used to search for common links to Web sites
hmmmm....that one is a gem. More because of the lack of explanation/emphasis on other components;)
Nice find amznVibe, enjoyed :)
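Since that "common links" line is so thin on detail, here's a toy Python sketch of one literal reading of it - simply counting inbound links. Everything in it is made up for illustration; it's certainly not Google's actual ranking code:

# Toy illustration only: rank pages by how many other pages link to them.
# This is NOT Google's actual algorithm, just one literal reading of the
# "common links" line quoted above. All data here is made up.
from collections import defaultdict

def rank_by_inbound_links(link_graph):
    """link_graph maps a page to the list of pages it links out to."""
    inbound = defaultdict(int)
    for source, targets in link_graph.items():
        for target in set(targets):      # count each linking page once
            inbound[target] += 1
    return sorted(inbound.items(), key=lambda item: item[1], reverse=True)

example_graph = {
    "a.example": ["c.example", "b.example"],
    "b.example": ["c.example"],
    "d.example": ["c.example", "b.example"],
}
print(rank_by_inbound_links(example_graph))
# -> [('c.example', 3), ('b.example', 2)]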
What's strange is that I thought they used networked "Google Search Appliances [google.com]", but I guess not.
I also stumbled on that; the explanation seems to be: "Each server has many twins,"
>> The system is based on (...) common links to Web sites
hmm...common...
This quote is also interesting, although it's been said before:
>> "Search in five years will be even more accurate and more user-centered."
/claus
and more Google hardware background from another presentation by Nevill-Manning:
Google’s infrastructure: Google uses consumer-level hard disks and “really cheap, unreliable memory.” (“If something fails, it’s not you, it’s probably the memory.”) They have around 10,000 commodity-level Linux computers set up in a parallel network (“the largest Linux cluster in the world”), and anticipate the death of “a few machines every day.” Their network is set up to be able to route around a failed machine instantly.
"Because the system is built this way, if a machine goes down, it doesn't have to be repaired right away," he said. "We can save money by doing this in a lazy fashion."
Slightly different talk by Eric Schmidt in Red Herring [redherring.com] in February 2003:
"We aren't interested in getting maximum power for a high price," he says. "What we're looking for is maximum functionality and that's a whole different thing." Each of Google's thousands of motherboards (a computer's main circuit board) are designed for the quick switching of components. Even the power supply is held on with Velcro straps: if it burns out, it can be replaced quickly. Recently, when the expensive top-end disk drives used by the motherboards proved inadequate, Google tossed out thousands and replaced them with cheaper, better models."
Nevill-Manning is also a vital contributor to the technological infrastructure used to support AdWords, Google's self-service advertising program, and is the developer of the Google Glossary, a tool for finding definitions of words, phrases and acronyms, available through Google Labs:
http://www.cas.ibm.com/cascon/speakers/index.shtml
Hey did we know that the Google database is 20 terabytes?
"when asked how the 64-bit Itanium, the new megaprocessor from Intel and Hewlett-Packard, would affect Google, Mr. Schmidt replied that it wouldn't." From Red Herring article, Allergic, post #8
More on Google, these are good quotes as well:
Both from "itbusiness.ca", amznVibe, post #7
"Nevill-Maning showed a pyramid diagram to illustrate how Google organizes its searches. At the bottom is "main," where there tends to be higher latency for pages that don't change very much. "Fresh," in the middle, includes portals that need more up-to-the-minute checks, like e-commerce sites during the December holiday shopping season. On top is "News," for CNN.com and other sites that change all the time."
"Google's main challenge right now is to handle the billing and syndication of its advertising, which often includes transactions that are only worth pennies each. On the R&D side, Nevill-Manning said the company is hoping to extend its portfolio with a number of new services, including a Google Glossary of hard-to-find terms, and Google Sets, which would bring up related searches."
This one is also interesting:
"As a result, there are many identical data centres around the world, Nevill-Manning added"
From "and another", amznVibe, post #7
Exactly how identical is "identical", one might ask...
/claus
search.msn.com shows up as running Microsoft-IIS/5.0, and it looks like it gets rebooted every two weeks or so. /claus
MS will probably try to use the 2003 server product for the new search service and compensate by throwing more powerful hardware at it. We know they have the money to burn.
That's a very valid argument. I recall that Hotmail initially ran FreeBSD - I don't even know if the migration to win2k/iis5 is finished yet, even though the headers say so. Typically, when you request info on such large services, you only hit the front-end layer that does load balancing and such - you can't really know for sure what the (thousands of) servers in the back are running.
/claus
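On how one "sees" what search.msn.com is running: all you get is the Server response header, and as said above, that only describes whatever front end answers you. A quick Python sketch of the check (the value in the comment is just what it reported back then):

# Fetch the Server response header from a site's front end. As noted above,
# this only identifies the front layer that answers the request, not the
# (thousands of) back-end boxes behind it.
import http.client

def server_header(host):
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    conn.request("HEAD", "/")
    return conn.getresponse().getheader("Server")

print(server_header("search.msn.com"))  # was reporting "Microsoft-IIS/5.0" at the time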
They build their business on Linux, but the PR feature (the Toolbar) is only available for Windows.
Actually, to be more accurate, they only make it for IE on Windows. They are obviously going with the percentages (80%+ vs. everything else), but this topic has been beaten to death around here elsewhere. For my 2 cents, I'd love to see it cross-platform, on Mozilla at least, so they don't appear to be playing favorites.
[computer.org...]
It has some nice cost comparisons of different server setups in it.
As posted by Markus here: [webmasterworld.com...]