Google Hardware
Article: "The magic that makes Google tick " provides insight for SEO
apfinlaw
msg:174182 - 4:11 pm on Dec 2, 2004 (gmt 0)

This multi-page article about Google's hardware provides insight into the Google algorithm, and makes for an interesting read on the task G handles every second:

* Over four billion Web pages, each an average of 10KB, all fully indexed
* Up to 2,000 PCs in a cluster
* Over 30 clusters
* 104 interface languages including Klingon and Tagalog
* One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue
* Sustained transfer rates of 2Gbps in a cluster
* An expectation that two machines will fail every day in each of the larger clusters
* No complete system failure since February 2000

It is one of the largest computing projects on the planet, arguably employing more computers than any other single, fully managed system (we're not counting distributed computing projects here), some 200 computer science PhDs, and 600 other computer scientists.
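To put that petabyte error-rate figure in perspective, here's a back-of-the-envelope check (a hedged sketch: the 10^-15 rate and the one-petabyte cluster size come from the article; reading the rate as errors per bit read, and using a decimal petabyte, are my assumptions):

```python
# Expected unrecoverable bit errors from reading one full petabyte,
# assuming 10^-15 is an error rate per bit read (a common way drive
# vendors quote it -- an assumption, not stated in the article).
PETABYTE_BYTES = 10**15          # decimal petabyte; use 2**50 for binary
BITS_READ = PETABYTE_BYTES * 8   # 8 x 10^15 bits
ERROR_RATE = 1e-15               # errors per bit read

expected_errors = BITS_READ * ERROR_RATE
print(f"Expected bit errors per full read: {expected_errors:.0f}")  # ~8
```

At roughly eight silent errors per full pass over a cluster's data, error handling has to live in the software layer -- presumably why the article calls the rate "a real issue".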



zdnet article [zdnet.com.au]

[edited by: Brett_Tabke at 4:51 pm (utc) on Dec. 2, 2004]
[edit reason] fixed link [/edit]

 

nutsandbolts
msg:174183 - 5:13 pm on Dec 2, 2004 (gmt 0)

"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

Nah, it's called the sandbox! :)

webhound
msg:174184 - 5:41 pm on Dec 2, 2004 (gmt 0)

"some 200 computer science PhDs, and 600 other computer scientists."

And they still can't make it work? lol

adfree
msg:174185 - 5:52 pm on Dec 2, 2004 (gmt 0)

"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

They answered the call two years too late.

donpps
msg:174186 - 6:16 pm on Dec 2, 2004 (gmt 0)

I thought it was 8 billion web pages. My take, GG guys: looks like it's time to double and possibly quadruple all system and human resources...

That might help..;)

Good find!

dmedia
msg:174187 - 6:17 pm on Dec 2, 2004 (gmt 0)

Really interesting article. VERY interesting. And on a side note, how fun it would be to see some of those "broken Google servers" show up on eBay... (gotta be a back door at the Plex for some entrepreneurial Googlers to, um, take advantage of. Ha.)

dvduval
msg:174188 - 6:29 pm on Dec 2, 2004 (gmt 0)

I would like to know the CPU usage when calculating PageRank for 8 billion pages. I'm starting to wonder if we are now in an era of "PageRank Lite", where only a partial calculation is made, because there is just not enough power/time to make the full calculation.
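For a sense of what that calculation involves, here's a minimal power-iteration sketch of PageRank as described in the original paper (purely illustrative: the three-page graph is invented, and Google's production implementation is not public):

```python
# Minimal PageRank power iteration (toy illustration, not Google's code).
# graph maps each page to the pages it links to.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
DAMPING = 0.85   # damping factor from the original PageRank paper
N = len(graph)
ranks = {page: 1.0 / N for page in graph}

for _ in range(50):  # iterate until the ranks settle
    new_ranks = {}
    for page in graph:
        # rank flowing in from every page that links here
        incoming = sum(
            ranks[src] / len(outs)
            for src, outs in graph.items()
            if page in outs
        )
        new_ranks[page] = (1 - DAMPING) / N + DAMPING * incoming
    ranks = new_ranks

print(ranks)
```

Every full pass touches every link in the graph, so at 8 billion pages even a few dozen iterations is an enormous amount of computation and I/O -- which is what makes a "PageRank Lite" shortcut plausible.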

grelmar
msg:174189 - 6:33 pm on Dec 2, 2004 (gmt 0)

Very interesting article.

I must say, I do wonder about the cheap hardware philosophy, though. It just strikes me as something that would add complexity and maintenance costs that outweigh the savings of buying cheap hardware in the first place.

The Farming analogy to cheap equipment:

I know a manager for a large agribusiness outfit with a couple dozen farms of over 10,000 acres each. They have a simple way of measuring the cost of competing equipment: because they have so many farms, they can set four of them aside to run equipment from four major vendors. They've been doing this on a running basis since the mid-70s, and they've discovered that, over time, cheap equipment is too expensive to run. The highest-priced equipment on the market (you've all seen it, with that green paint job) is actually much, MUCH cheaper to run in the long term, because it breaks down less and requires less ongoing maintenance. That not only reduces basic maintenance costs, it also prevents lost man-hours and gives better "on-time" delivery of results (in farming, you have to do certain things within very specific and narrow time frames -- if you're off, sometimes by as little as a day, with seeding, harvesting, or spraying, your yield drops and it costs you money).

Google is easily big enough to run real-time comparatives on this sort of thing. From the sounds of that article, a lot of time is needlessly wasted just dealing with issues related to cheap equipment.

Mind you, I'm sure greater minds than mine have pondered the issue. But from my own experience in a few different fields, I've learned the hard way that cheap equipment just doesn't pay.

fashezee
msg:174190 - 7:10 pm on Dec 2, 2004 (gmt 0)

I thought our new external 160 GB Seagate HD was a neat buy!

rogerd
msg:174191 - 7:11 pm on Dec 2, 2004 (gmt 0)

I agree, grelmar. In the early days of a company, buying cheap can work because the resources just aren't there. Being able to build a supercomputer-equivalent out of a bunch of cheap boxes is a big competitive advantage under those conditions.

Now, with its strong income stream and cash hoard, it might behoove Google to revisit its philosophy.

Kirby
msg:174192 - 7:19 pm on Dec 2, 2004 (gmt 0)

It's not like they can't cut a killer deal based on volume. The bragging rights alone would have the computer makers cutting each other off at the knees.

Chndru
msg:174193 - 2:23 pm on Dec 1, 2004 (gmt 0)

* Over four billion Web pages, each an average of 10KB, all fully indexed
* Up to 2,000 PCs in a cluster
* Over 30 clusters
* 104 interface languages including Klingon and Tagalog
* One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue
* Sustained transfer rates of 2Gbps in a cluster
* An expectation that two machines will fail every day in each of the larger clusters
* No complete system failure since February 2000

[zdnet.co.uk...]

ciml
msg:174194 - 3:21 pm on Dec 1, 2004 (gmt 0)

Some interesting information on the GFS.

"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

I like the idea of an option.

Hanu
msg:174195 - 3:48 pm on Dec 1, 2004 (gmt 0)

Hölzle believes the PageRank algorithm is 'relatively' spam resistant, and those interested in exactly how it works can find more information here.

Regardless of whether this is actually true, it tells us one thing: PR isn't as dead as many claim.

eyezshine
msg:174196 - 9:46 pm on Dec 1, 2004 (gmt 0)

It just sounds like Google is simpler than most webmasters think.

BReflection
msg:174197 - 10:54 pm on Dec 1, 2004 (gmt 0)

It just sounds like Google is simpler than most webmasters think.

It sounds like that's exactly what they want the webmasters to think.

:)

sun818
msg:174198 - 11:03 pm on Dec 1, 2004 (gmt 0)

A petabyte is equal to 1,024 terabytes.

I wonder what brand of hard drive they use on those clusters.
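For scale, here's the rough arithmetic in drives (a sketch only: the 1,024 TB figure is from the line above, and the 160 GB drive size is borrowed from fashezee's post earlier in the thread; what Google actually buys is unknown):

```python
# Rough scale check: how many 160 GB drives would hold one petabyte?
# (Illustrative only -- Google's actual drive sizes are not public.)
PETABYTE_GB = 1024 * 1024   # 1 PB = 1,024 TB = 1,048,576 GB
DRIVE_GB = 160              # consumer drive size mentioned in the thread

drives_needed = PETABYTE_GB / DRIVE_GB
print(f"About {drives_needed:,.0f} drives per cluster")  # ~6,554
```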

claus
msg:174199 - 11:23 pm on Dec 1, 2004 (gmt 0)

>> "more than" 30 clusters @ "up to" 2,000 PCs

If we take the raw figures, that's "around or more than" 60,000 PCs, so I assume the old debate about the figure being "around" 10,000 will not continue much further.

It's funny to contrast the inherent lack of precision in these statements with the precision on google.com -- I mean, the company states publicly that it has indexed exactly one (1) page more than 8,058,044,650.

That page must have been a very important one.

sun818
msg:174200 - 1:05 am on Dec 2, 2004 (gmt 0)

Was the figure of 10,000 referring to one data centre, or the count for all data centres? One could assume that since the cost of IDE drives keeps dropping while the upper storage limit increases, the storage need for a specific cluster could be met with fewer computers. My thought is that the PC count will change as the cost of a server rack, motherboard, and hard drive(s) changes.

I was surprised to read that Google runs on Intel CPUs. I would have thought that if their goal is cheap hardware, AMD would be the better choice, since it has a better price/performance ratio than Intel.

eyezshine
msg:174201 - 7:51 am on Dec 2, 2004 (gmt 0)

Maybe we think we got banned or blocked, but maybe the PC our site was on failed? Wait till they replace the PC and update the index -- 6 months later you're good to go!

conroy
msg:174202 - 1:18 pm on Dec 2, 2004 (gmt 0)

>maybe the PC our site was on failed?

In reality, he said, Google probably has "50 copies of every server".

sonny
msg:174203 - 2:03 pm on Dec 2, 2004 (gmt 0)

>"We have thought of having a button saying 'give me less commercial results'," but the company has shied away from implementing this yet.

That would be a tough one.

Brett_Tabke
msg:174204 - 7:33 pm on Dec 2, 2004 (gmt 0)

There were several threads on this one that we combined - times may be out of order...

superpower
msg:174205 - 7:37 pm on Dec 2, 2004 (gmt 0)

Note that it says they have 2 failures per day per 2,000 servers. That's 0.1%. When a node fails, its data is automatically replicated elsewhere, like a folder in a file system. It grows back.

This is why using a huge number of servers in replicated clusters is smart.

Consolidate that server power or storage and you get what they said happened when somebody unplugged an 80-server rack: a slower, longer failover, which could cause additional problems.

Also, with more expensive and/or more exotic or proprietary components, you can have other problems related to supply, servicing, etc. Keeping it vanilla ensures that the hardware is always available and there are no unusual servicing needs.

Sure, buy high-quality components, but it sounds like their distributed file system is working just fine.
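One way to see why that 0.1% daily failure rate is tolerable (a hedged sketch: the per-machine rate comes from the article; three-way replication and independent failures are my assumptions, though the published GFS paper does describe three replicas as its default):

```python
# Chance that every replica of one piece of data is down on the same
# day, assuming machine failures are independent.
p_machine_down = 2 / 2000   # ~0.1% per machine per day (from the article)
replicas = 3                # assumed; the GFS paper defaults to 3 copies

p_all_down = p_machine_down ** replicas
print(f"P(all {replicas} replicas down at once): {p_all_down:.0e}")
# ~1e-09 -- losing every copy of a chunk on a given day is vanishingly rare
```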

Rugles
msg:174206 - 7:48 pm on Dec 2, 2004 (gmt 0)

Grelmar

Our company used to be a 100% agricultural company. It is now approaching only 10% of our business, because very few farmers put into practice the approach you described. They want everything on the cheap -- very short-sighted. For every one ag customer we have that is progressive, we have 50 that are the opposite.

However, this does not seem to be hindering Google at all. In fact, the low-cost hardware combined with the low-cost OS is what allowed them to grow so quickly and remain a private company. They spent the start-up cash on brainpower, not on hardware and software.

gopi
msg:174207 - 8:04 pm on Dec 2, 2004 (gmt 0)

Grelmar, what you have said is true if you look at the cheap servers on an individual basis.

But what Google does is run redundant cheap equipment tied together by super-smart software that expects the equipment to go down. The end result is a system with the power of multiple supercomputers, but made up of cheap machines.

The best analogy would be an ant colony. Each individual ant is weak and can fail, but as a group they are powerful and efficient.

lajkonik86
msg:174208 - 8:10 pm on Dec 2, 2004 (gmt 0)

Also note that with more expensive equipment you buy fewer machines, and overall system reliability will drop dramatically.

Sure, a single PC will be more reliable, but the "hive" will not be as stable.

SlyOldDog
msg:174209 - 9:31 pm on Dec 2, 2004 (gmt 0)

I prefer the BORG analogy myself. After all, we will soon all be assimilated.

grelmar
msg:174210 - 11:22 pm on Dec 2, 2004 (gmt 0)

Cheap cluster proponents:

You may be right. But given Google's current cash situation, it might be worthwhile for them to set up a "sweet" cluster of high-quality components to test against the others for long-term costing/reliability.

Even the professional Beowulf community is moving towards higher-quality components, for the simple reason that they're seeing better stability with more expensive equipment. The architecture remains the same, and they still get massive cost benefits over custom racks. But the individual PCs in the cluster, and their drives, are far superior to run-of-the-mill COTS equipment.

Mind you, maybe Google has set up a sweet cluster and just isn't talking about it. Google is, after all, the master of selective dissemination of information when it comes to what they're doing in-house.

sun818
msg:174211 - 11:43 pm on Dec 2, 2004 (gmt 0)

True, you can buy higher quality components. But how long does a part really need to last? And is it worth the extra cost to have it last?

in a cluster of 1,000 PCs you would expect, on average, one to fail every day.

If one fails every day, it would take nearly three years (1,000 days) to cycle through the whole cluster. Hardware prices on server racks and drives drop so quickly that within six months to a year you could easily double your storage, with an improved processor, for the same price you are paying now.
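As a refinement of that three-year estimate: failures won't hit each machine exactly once, so the turnover is slower than 1,000 days. Here's a sketch assuming independent failures at the article's rate of one per day per 1,000 machines (replace-on-failure is my assumption):

```python
# How fast does a cluster refresh itself if every failed machine is
# replaced with a current model? Assumes one failure per day per
# 1,000 machines, striking machines independently.
machines = 1000
p_daily_fail = 1 / machines

for days in (365, 730, 1095):
    still_original = (1 - p_daily_fail) ** days
    print(f"after {days} days: {still_original:.0%} of original machines remain")
# prints 69%, 48%, 33% -- so even replacing only on failure, about a
# third of the cluster turns over in the first year, and two-thirds
# within three years, onto whatever hardware is cheapest at the time
```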
