Google Hardware

Article: "The magic that makes Google tick" provides insight for SEO

apfinlaw

4:11 pm on Dec 2, 2004 (gmt 0)

Multi-page article on Google's hardware. It gives some insight into the Google algo and is an interesting read on the task G has at hand every second.

Over four billion Web pages, each an average of 10KB, all fully indexed.
Up to 2,000 PCs in a cluster.
Over 30 clusters.
104 interface languages including Klingon and Tagalog.
One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
Sustained transfer rates of 2Gbps in a cluster.
An expectation that two machines will fail every day in each of the larger clusters.
No complete system failure since February 2000.
It is one of the largest computing projects on the planet, arguably employing more computers than any other single, fully managed system (we're not counting distributed computing projects here), some 200 computer science PhDs, and 600 other computer scientists.
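A rough back-of-the-envelope check on two of those figures. This is only a sketch: it assumes the quoted disk error rate means one unrecoverable error per 10^15 bits read (the article does not give the unit) and decimal units (1 PB = 10^15 bytes, 10KB = 10,000 bytes). The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumptions (not from the article): the error rate is per bit read,
# and sizes use decimal units.

BITS_PER_BYTE = 8
PETABYTE_BYTES = 10**15
ERROR_RATE_PER_BIT = 1e-15


def expected_read_errors(num_bytes: int, rate_per_bit: float) -> float:
    """Expected number of bad bits for one full read of num_bytes."""
    return num_bytes * BITS_PER_BYTE * rate_per_bit


def raw_page_bytes(pages: int, avg_page_bytes: int) -> int:
    """Raw size of the crawled pages, before any index overhead."""
    return pages * avg_page_bytes


errors_per_pass = expected_read_errors(PETABYTE_BYTES, ERROR_RATE_PER_BIT)
raw_terabytes = raw_page_bytes(4 * 10**9, 10_000) / 10**12

print(f"expected bad bits per full petabyte read: {errors_per_pass:.1f}")
print(f"raw size of 4 billion pages: {raw_terabytes:.0f} TB")
```

Under those assumptions, one full pass over a petabyte would be expected to hit around eight bad bits, and the raw pages alone come to about 40 TB, so most of the petabyte would be index data, copies and other overhead.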

zdnet article [zdnet.com.au]

[edited by: Brett_Tabke at 4:51 pm (utc) on Dec. 2, 2004]
[edit reason] fixed link [/edit]

nutsandbolts

5:13 pm on Dec 2, 2004 (gmt 0)

"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

Nah, it's called the sandbox! :)

webhound

5:41 pm on Dec 2, 2004 (gmt 0)

"some 200 computer science PhDs, and 600 other computer scientists."

And they still can't make it work? lol

adfree

5:52 pm on Dec 2, 2004 (gmt 0)

"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

They answered the call two years too late.

donpps

6:16 pm on Dec 2, 2004 (gmt 0)

I thought it was 8 billion web pages >> My take is, GG guys.. looks like it's time to double and possibly quadruple all system and human resources..

That might help..;)

Good find!

dmedia

6:17 pm on Dec 2, 2004 (gmt 0)

Really interesting article. VERY interesting. And on a side note, how fun it would be to see some of those "broken Google servers" show up on Ebay .. (gotta be a back door at the Plex for some entrepreneurial Googlers to, um, take advantage of. Ha.)

dvduval

6:29 pm on Dec 2, 2004 (gmt 0)

I would like to know the CPU usage when calculating PageRank for 8 billion pages. I'm starting to wonder if we are now in an era of "PageRank Lite", where only a partial calculation is made, because there is just not enough power/time to make the full calculation.
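For a sense of what that calculation involves, here is a minimal power-iteration sketch of PageRank on a toy graph. The damping factor of 0.85 is the value suggested in the original PageRank paper; the graph, the function name and the iteration count are made up for illustration, and none of this says anything about how Google actually runs it at scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping page -> list of outlinked pages.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" collects the most inbound links.
toy_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Each iteration is one pass over every link, so the cost grows with the number of links times the number of iterations. A "lite" version in the sense above would presumably mean fewer iterations or a partial graph, but that is speculation.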

grelmar

6:33 pm on Dec 2, 2004 (gmt 0)

Very interesting article.

I must say, I do wonder about the cheap hardware philosophy, though. It just strikes me as something that would add complexity and maintenance costs that would outweigh the cost benefit of getting cheap hardware in the first place.

The Farming analogy to cheap equipment:

I know a manager for a large agribusiness outfit, with a couple dozen farms of over 10,000 acres each. They have a simple way of measuring the cost of competing equipment. Because they have so many farms, they can set four of them aside to run equipment from four major vendors. They've been doing this on a running basis since the mid 70s, and they've discovered that, over time, cheap equipment is too expensive to run. The highest priced equipment on the market (you've all seen it, with that green paint job) is actually much, MUCH cheaper to run in the long term, because it breaks down less and requires less ongoing maintenance. This not only reduces basic maintenance costs, but also prevents lost man-hours and gives better "on-time" delivery of results (in farming, you have to do certain things within very specific and narrow time frames - if you're a little bit off, sometimes by as little as a day, with seeding, harvesting, spraying, etc., your yield drops, and that costs you money).

Google is easily big enough to run real-time comparisons on this sort of thing. From the sounds of that article, a lot of time is needlessly wasted just dealing with issues relating to cheap equipment.

Mind you, I'm sure greater minds than mine have pondered the issue. But from my own experience in a few different fields, I've learned the hard way that cheap equipment just doesn't pay.

fashezee

7:10 pm on Dec 2, 2004 (gmt 0)

I thought our new external 160 GB Seagate HD was a neat buy!

rogerd

7:11 pm on Dec 2, 2004 (gmt 0)

I agree, grelmar. In the early days of a company, buying cheap can work because the resources just aren't there. Being able to build a supercomputer-equivalent out of a bunch of cheap boxes is a big competitive advantage under those conditions.

Now, with its strong income stream and cash hoard, it might behoove Google to revisit its philosophy.

Kirby

7:19 pm on Dec 2, 2004 (gmt 0)

It's not like they can't cut a killer deal based on volume. The bragging rights alone would have the computer makers cutting each other off at the knees.

Chndru

2:23 pm on Dec 1, 2004 (gmt 0)

* Over four billion Web pages, each an average of 10KB, all fully indexed
* Up to 2,000 PCs in a cluster
* Over 30 clusters
* 104 interface languages including Klingon and Tagalog
* One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue
* Sustained transfer rates of 2Gbps in a cluster
* An expectation that two machines will fail every day in each of the larger clusters
* No complete system failure since February 2000

[zdnet.co.uk...]

ciml

3:21 pm on Dec 1, 2004 (gmt 0)

Some interesting information on the GFS.

"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

I like the idea of an option.

Hanu

3:48 pm on Dec 1, 2004 (gmt 0)

Hölzle believes the PageRank algorithm is 'relatively' spam resistant, and those interested in exactly how it works can find more information here.

Regardless of whether this is actually true, it tells us one thing: PR isn't as dead as many claim.

eyezshine

9:46 pm on Dec 1, 2004 (gmt 0)

It just sounds like google is more simple than most webmasters think.

BReflection

10:54 pm on Dec 1, 2004 (gmt 0)

It just sounds like google is more simple than most webmasters think.

It sounds like that's exactly what they want the webmasters to think.

sun818

11:03 pm on Dec 1, 2004 (gmt 0)

A petabyte is equal to 1,024 terabytes.

I wonder what brand of hard drive they use on those clusters.

claus

11:23 pm on Dec 1, 2004 (gmt 0)

>> "more than" 30 clusters @ "up to" 2,000 PCs

If we take the raw figures, that's "around or more than" 60,000 PCs, so i assume that the old debate about the figure being "around" 10,000 will not continue much further.

It's funny to contrast the inherent lack of precision of these statements with the precision on google.com - i mean, the company states publicly that it has indexed exactly one (1) page more than 8.058.044.650.

That page must have been a very important one.

sun818

1:05 am on Dec 2, 2004 (gmt 0)

Was the figure of 10,000 referring to one data centre, or the count for all data centres? One could assume that since the cost of IDE drives keeps dropping while the upper storage limit increases, the storage need for a specific cluster could be met with fewer computers. My thought is that the PC count will change as the cost of a server rack, motherboard, and hard drive(s) changes.

I was surprised to read Google runs on Intel CPUs. I would have thought if their goal is cheap hardware, AMD would be the better choice since it has a better price / performance ratio than Intel.

eyezshine

7:51 am on Dec 2, 2004 (gmt 0)

Maybe we think we got banned or blocked, but maybe the PC our site was on failed? Wait till they replace the PC and update the index, and 6 months later you're good to go!

conroy

1:18 pm on Dec 2, 2004 (gmt 0)

>maybe the PC our site was on failed?

In reality, he said, Google probably has "50 copies of every server".

sonny

2:03 pm on Dec 2, 2004 (gmt 0)

>"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

That would be a tough one.

Brett_Tabke

7:33 pm on Dec 2, 2004 (gmt 0)

There were several threads on this one that we combined - times may be out of order...

superpower

7:37 pm on Dec 2, 2004 (gmt 0)

Note that it says they have 2 failures per day per 2,000 servers. That's 0.1%. When a node fails, it is automatically replicated like a folder in a file system. It grows back.

This is why what they are doing is smart: using a huge number of servers in replicated clusters.

Consolidate that server power or storage and you get what they said happened when somebody unplugged an 80-server rack... a slower, longer failover which could cause additional problems.

Also, with more expensive and/or more exotic or proprietary components you can have other problems related to supply, servicing etc. Keeping it vanilla ensures that the hardware is always available and there are no unusual servicing needs.

Sure, buy high quality components, but it sounds like their distributed file system is working just fine.
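The 0.1% figure, and why replication absorbs it, can be sketched in a few lines. The replica count of 3 and the idea that machines fail independently of each other are assumptions made for illustration, not numbers from the article, and the function names are made up.

```python
def daily_failure_fraction(failures_per_day: float, machines: int) -> float:
    """Fraction of a cluster expected to fail on a given day."""
    return failures_per_day / machines


def all_replicas_fail_same_day(p_fail: float, replicas: int) -> float:
    """Chance that every copy of one piece of data is lost on the same day,
    assuming independent machine failures (a simplifying assumption)."""
    return p_fail ** replicas


p = daily_failure_fraction(2, 2000)
loss = all_replicas_fail_same_day(p, replicas=3)

print(f"daily failure fraction: {p:.1%}")
print(f"chance all 3 copies fail on the same day: {loss:.1e}")
```

With those assumptions, losing all three copies of something on the same day is roughly a one-in-a-billion event, provided the failed copies are rebuilt quickly. The unplugged-rack story is the case where the independence assumption breaks, since 80 machines go down together.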

Rugles

7:48 pm on Dec 2, 2004 (gmt 0)

Grelmar

Our company used to be a 100% agricultural company. It is approaching only 10% of our business because very few farmers put into practice the analogy that you described. They want everything on the cheap, very short-sighted. For every one Ag customer we have that is progressive, we have 50 that are the opposite.

However, this does not seem to be hindering Google at all. In fact, the low cost hardware combined with the low cost OS is what allowed them to grow so quickly and remain a private company. They spent the start-up cash on brainpower and not hardware and software.

gopi

8:04 pm on Dec 2, 2004 (gmt 0)

Grelmar, what you have said is true if you look at the cheap servers on an individual basis.

But what Google does is run redundant cheap equipment tied together by super smart software which expects the equipment to go down. The end result is a system with the power of multiple supercomputers but made up of cheap machines.

The best analogy would be an ant colony. Each individual ant is weak and can fail, but as a group they are powerful and efficient.
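A small sketch of the ant colony point: a group of cheap, less reliable machines can beat one expensive, more reliable machine, as long as any one of them can do the job. The availability numbers below are hypothetical, and failures are assumed to be independent.

```python
def group_availability(machine_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` machines is up,
    assuming the machines fail independently of each other."""
    return 1.0 - (1.0 - machine_availability) ** copies


# Hypothetical numbers: one premium box vs. three cheap ones.
expensive_single = group_availability(0.999, copies=1)
cheap_trio = group_availability(0.99, copies=3)

print(f"one expensive machine: {expensive_single:.6f}")
print(f"three cheap machines:  {cheap_trio:.6f}")
```

The cheap trio comes out ahead on these made-up numbers. The catch is that the software has to handle failover, which is the "super smart software" part.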

lajkonik86

8:10 pm on Dec 2, 2004 (gmt 0)

Also note that when using more expensive equipment, overall system reliability will drop dramatically.

Sure, a single PC will be more reliable, but the "hive" will not be as stable.

SlyOldDog

9:31 pm on Dec 2, 2004 (gmt 0)

I prefer the BORG analogy myself. After all, we will soon all be assimilated.

grelmar

11:22 pm on Dec 2, 2004 (gmt 0)

Cheap cluster proponents:

You may be right. But given Google's current cash situation, it might be worthwhile for them to set up a "sweet" cluster of high quality components to test against the others for long term costing/reliability.

Even the professional Beowulf community is moving towards higher quality components, for the simple reason that they're seeing better stability with more expensive equipment. The architecture remains the same, and they still get massive cost benefits over custom racks. But the individual PCs in the cluster, and their drives, are far superior to run-of-the-mill COTS equipment.

Mind, maybe Google has set up a sweet cluster, and just aren't talking about it. Google is, after all, the master of selective dissemination of information, when it comes to what they're doing in-house.

sun818

11:43 pm on Dec 2, 2004 (gmt 0)

True, you can buy higher quality components. But how long does a part really need to last? And is it worth the extra cost to have it last?

in a cluster of 1,000 PCs you would expect, on average, one to fail every day.

If one fails every day, a cluster would take three years to change. Hardware prices on server racks and drives drop so quickly, within six months to a year, you could easily double your storage with an improved processor for the same price you are paying now.
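The arithmetic behind that, as a quick sketch. It assumes every failure hits a machine that has not been replaced yet, which is a simplification, and it takes the doubling period from the "six months to a year" estimate above. The function names are made up.

```python
def turnover_years(cluster_size: int, failures_per_day: float) -> float:
    """Years until failures alone would have replaced every machine once,
    assuming each failure hits a not-yet-replaced machine."""
    return cluster_size / failures_per_day / 365.0


def capacity_multiplier(years: float, doubling_period_years: float) -> float:
    """How much more storage the same money buys after `years`,
    if capacity per dollar doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


years = turnover_years(1000, 1)
print(f"full turnover after about {years:.1f} years")
print(f"capacity per dollar by then (1-year doubling): "
      f"{capacity_multiplier(years, 1.0):.1f}x")
```

That gives about 2.7 years for a full turnover, close to the three years above. Even with the slower one-year doubling, the replacement parts bought at the end of that period would hold several times what the originals did.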
