Forum Library, Charter, Moderators: goodroi

Google Finance, Govt, Policy and Business Issues Forum

    
Google Publishes Hard Drive Study
Brett_Tabke




msg:3257038
 5:44 pm on Feb 19, 2007 (gmt 0)

[labs.google.com...]

Our analysis identifies several parameters from the drive's self monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.

 

AlexK




msg:3257211
 8:42 pm on Feb 19, 2007 (gmt 0)

Obviously, one of the interests of this paper is the sneak insights that it will give into the actual hardware that G uses.

I confess that I was surprised. Though I knew that G had already stated that it uses "cheap" hardware, I had still assumed that that would mean SCSI drives. Not at all:

More than one hundred thousand disk drives were used for all the results presented here. The disks are a combination of serial and parallel ATA consumer-grade hard disk drives, ranging in speed from 5400 to 7200 rpm, and in size from 80 to 400 GB. All units in this study were put into production in or after 2001. The population contains several models from many of the largest disk drive manufacturers and from at least nine different models ... They were deployed in rack-mounted servers

AlexK




msg:3257219
 8:53 pm on Feb 19, 2007 (gmt 0)

No point in looking at this paper to find the best HDD manufacturers:

Failure rates are known to be highly correlated with drive models, manufacturers and vintages. Our results do not contradict this fact ... in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data.

jtoddv




msg:3257264
 9:39 pm on Feb 19, 2007 (gmt 0)

I wish Google would release their burn-in test software so that the public could test their drives prior to full installation.

AlexK




msg:3257279
 10:06 pm on Feb 19, 2007 (gmt 0)

snapshots:
  • Young drives (less than 2 years) prefer it hot (above 35C), whilst older drives like it mild (below 40C).
  • Get rid after the first scan error (39 times more likely to fail within 60 days than if none).
  • Get rid after the first sector reallocation (14 times more likely to fail within 60 days than if none) (21 times for offline reallocations).
  • Get rid after the first sector Probational Count (16 times more likely to fail within 60 days than if none).
  • More than 56% of failed drives have none of the following SMART-reported signals: scan errors, sector reallocations, offline reallocations or sector Probational Counts.
  • More than 72% of all drives report seek errors.
  • 36% of failed drives have no error signals of any kind.
OK. I'm off to watch Prison Break now. Interesting article.
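
For anyone who wants to turn the "get rid after the first..." rules above into a check, here is a minimal sketch. The counter names are hypothetical labels, not real SMART attribute names, and the multipliers are simply the figures quoted in the list. Bear in mind the other half of the list: a drive with all-zero counters is not thereby proven healthy.

```python
# Relative-risk multipliers quoted in the list above (failure within 60 days
# compared with drives showing a zero count). Key names are hypothetical labels.
RISK_MULTIPLIER = {
    "scan_errors": 39,
    "reallocations": 14,
    "offline_reallocations": 21,
    "probational_count": 16,
}

def replacement_reasons(counters):
    """Return (signal, multiplier) pairs for every non-zero counter, worst first."""
    hits = [
        (name, RISK_MULTIPLIER[name])
        for name, count in counters.items()
        if name in RISK_MULTIPLIER and count > 0
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Example drive: made-up counter values, purely for illustration.
drive = {
    "scan_errors": 0,
    "reallocations": 2,
    "offline_reallocations": 1,
    "probational_count": 0,
}
reasons = replacement_reasons(drive)
print(reasons)  # [('offline_reallocations', 21), ('reallocations', 14)]
```

An empty result only means none of these four signals has fired, which was also true of more than half of the drives that failed.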
Brett_Tabke




msg:3257332
 11:37 pm on Feb 19, 2007 (gmt 0)

Ya, I thought it a touch chicken that they would not publish failure rates of specific drive manufacturers. Apparently afraid of ticking off suppliers? lol

> scsi

no - According to numerous and varied past reports, G uses the barest of barebones consumer grade commodity motherboards and components. You can buy a better pc at walmart than Google uses.

Content_ed




msg:3257341
 11:54 pm on Feb 19, 2007 (gmt 0)

Failure rates are known to be highly correlated with drive models, manufacturers and vintages

Amen to that. Was in the industry off and on for years. Only predictor of hard drive failure I recall was multiple engineering changes manifested in little wires tack soldered to the circuit board. A hot drive in and of itself isn't so much a problem as long as the case and the room are well ventilated. A hot drive in a badly ventilated case will probably kill the CPU before the drive.

justageek




msg:3257348
 12:06 am on Feb 20, 2007 (gmt 0)

According to reports, G uses the barest of barebones consumer grade boxes

Same here. I use disks to store the data for the long haul but for selects from the db the results are almost always in memory. The only time they are not is when it is a query that has never been run before. I'm guessing Google does the same because when you do a search that Google would not have cached in memory you can get the seek time to 0.5 seconds or so and then immediately do the search again and it will be 0.06 seconds.
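
The effect described above (slow first query, fast repeat) can be reproduced with a tiny in-memory cache. This is only a sketch: `slow_disk_lookup` is a hypothetical stand-in for a disk-bound query, the 50 ms delay is an arbitrary assumption, and nothing here is meant to describe how Google actually caches results.

```python
import time
from functools import lru_cache

def slow_disk_lookup(query):
    """Hypothetical stand-in for a query that has to go to disk."""
    time.sleep(0.05)  # assumed seek + read cost, chosen arbitrarily
    return f"results for {query!r}"

@lru_cache(maxsize=1024)
def search(query):
    """Serve repeat queries from memory; only a cache miss touches the 'disk'."""
    return slow_disk_lookup(query)

def timed(query):
    """Return how long one call to search() takes, in seconds."""
    start = time.perf_counter()
    search(query)
    return time.perf_counter() - start

first = timed("hard drive study")   # cache miss: pays the simulated disk cost
second = timed("hard drive study")  # cache hit: answered from memory
print(f"first: {first:.3f}s, repeat: {second:.6f}s")
```

The exact timings will vary by machine, but the repeat call should be far quicker than the first because it never reaches `slow_disk_lookup`.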

JAG

Gomvents




msg:3257356
 12:24 am on Feb 20, 2007 (gmt 0)

I wish they did let us know which brands worked best. I've had SATA drives from both Western Digital and Seagate and both have had problems sometimes... I'm debating now Raptor drives vs going to SAS. Anyone have any thoughts if reliability and cost are the only concerns (not speed or interface per se).

kaled




msg:3257369
 12:51 am on Feb 20, 2007 (gmt 0)

Google may use "consumer grade" components, but I think describing them as "consumer grade boxes" is misleading. I doubt that they have thousands of desktop PCs sitting on benches, I rather suspect they have hundreds of cabinets with dozens of boards/drives in each (per data center). Also, I imagine their boards each have four interfaces rather than two (supporting a maximum of eight drives).

SCSI hasn't offered any significant performance advantages (over ATA) for a very long time - at least ten years. Manufacturers have traditionally sold their newest/largest drives as SCSI only (at a price premium) simply to milk the gullible.

Gomvents, unless they sponsor the event, I don't think either of the manufacturers you mention is ever likely to win an award for reliability.

Kaled.

AlexK




msg:3257499
 5:03 am on Feb 20, 2007 (gmt 0)

kaled:
SCSI hasn't offered any significant performance advantages (over ATA) for a very long time

I've got a 3xSCSI RAID 5 array on my server, and same-speed ATA drives on my (home) desktop computer. I'm constantly rsync-ing between the two--mostly up, but sometimes down--the identical 43GB of data. The difference in speed of response in these totally disk-bound operations (ie at the start of the rsync) is astounding. It has been so significant that I've seriously considered moving my home computer to SCSI. Only the cost has, so far, kept me back from doing it.

Angelis




msg:3257590
 9:59 am on Feb 20, 2007 (gmt 0)

SCSI hasn't offered any significant performance advantages (over ATA) for a very long time

Are you nuts? The seek time and max data transfer rate in a U320 SCSI drive will outdo almost all ATA and SATA drives you can buy.

kaled




msg:3257645
 12:07 pm on Feb 20, 2007 (gmt 0)

The seek time and max data transfer rate in a U320 SCSI drive will outdo almost all ATA and SATA drives you can buy.

You've just associated a mechanical characteristic (seek time) with an electronic interface!

Consider this... there is no fundamental reason why SCSI interfaces should be more expensive than ATA, therefore, if SCSI were faster, motherboard manufacturers would have ditched ATA years ago. Also, most super-fast PCs submitted for magazine review/testing use ATA disks.

There has been much nonsense spoken about IDE/ATA/SCSI interfaces over the years. I remember a review of a SCSI CDROM drive that had a fast access time due to its SCSI interface (nonsense) and I remember many, many people who said that you shouldn't place a CDROM drive and a hard disk on the same cable - this was because the lambs just repeated what the sheep said and the sheep remembered something about master disks slowing down if a slow slave drive was fitted, however, disk synchronization ceased in the 1980s!

However, if you still don't believe me consider this, maximum transfer (burst) speed is only achieved when retrieving data from the cache, otherwise the maximum speed is determined by mechanical components (rotational velocity, data density and number of heads/platters). Arguably, SCSI did have an advantage years ago in multitasking operating systems due to DMA (direct memory access - CPU is not required to copy the data across) but ATA has supported DMA for at least ten years.

AlexK,
In your particular example, copying data from one disk to another, there might be a benefit but only if the SCSI controller itself has a large memory buffer and the operation is optimised so that motherboard memory/CPU is hardly used. If you are using specialist software, this is a possibility but I do not believe a bog standard Windows installation would do this (Linux might possibly).

Kaled.

nippon




msg:3257655
 12:19 pm on Feb 20, 2007 (gmt 0)

Perhaps the unfathomable "my site dropped out of google" questions can be answered...

Crummy drives!

relicx




msg:3257663
 12:32 pm on Feb 20, 2007 (gmt 0)

kaled, you're simply wrong. Take a look at the benchmark comparing Raptor's (the fastest SATA drive on the market) multi-user performance vs. SCSI drives: [storagereview.com...] . It's not even in the same class. Do you run any popular websites? If you did, you'd know why webmasters might need SCSI drives.

Angelis




msg:3257671
 12:51 pm on Feb 20, 2007 (gmt 0)

If more people bought SCSI drives the price would drop, supply and demand as always.

Most people go for ATA drives because of the price, nothing else...

kaled




msg:3257813
 3:30 pm on Feb 20, 2007 (gmt 0)

Webmasters are often sci-fi fans. Now who was it that said "I cannay change the laws of physics." If you accept that data transfer rates are limited by the mechanical design (i.e. the ATA interface can squirt out data as fast as it comes off the disk platter) then the argument is largely dead. So, the question is this...

What is the sustained transfer rate of your SCSI drive? SCSI may be faster at transferring data into or out of the drive's onboard cache, so, when writing data blocks up to the size of that cache, theoretically, there should be a performance gain (provided you don't want to access the disk again until the data has been written to the disk) but, in practice, this will not be noticeable. If you want to test this, in Windows, try disabling write-behind caching and run a few tests (not quite the same thing but near enough).

Clearly, Google don't feel that SCSI represents value for money. As for performance figures, do you also believe the speeds claimed by printer manufacturers?

Comparing SCSI and ATA is a bit like comparing Intel and AMD CPUs - Intel chips rarely achieve the figures claimed of them, but there are still people out there that believe they are faster.

Question
Why do manufacturers release their newest and fastest drives with SCSI interfaces (I assume they still do but I haven't checked)?
1) Because people believe SCSI means faster and will pay more.
2) To perpetuate the myth that SCSI really is faster (in order to achieve 1)).

Question
On the same motherboard, have any scsi-philes actually conducted realistic benchmark tests with identical ATA and SCSI drives? My guess is that Google have.

Kaled.

relicx




msg:3257837
 3:58 pm on Feb 20, 2007 (gmt 0)

Google needs to store huge amounts of data, so obviously IDE drives are better suited to their needs. Most webmasters don't have the same needs and are more concerned about speed, so bringing up Google is irrelevant. All you need to do is look at the most obvious parameter: RPMs. The SCSI drives spin at 15k compared to 7.2k for almost all IDE drives (except for Raptor at 10k RPM and it maxes out at 150GB). Whether they are worth the extra money is dependent on the situation, but to make claims that they are no different is clearly incorrect and misleading.

BillyS




msg:3257870
 4:25 pm on Feb 20, 2007 (gmt 0)

Based on my old non-scientific experience SCSI drives are much faster than IDE drives. I had an older machine with two newer IDE drives and an old SCSI drive. When backing up to both the spare IDE and SCSI drives at the same time, the SCSI always finished way ahead of the IDE.

Today I own a Raptor and I'm getting a second one too. But I'd bet if I threw in that old (8 years old now...) SCSI it would still beat the Raptor.

Anyway, I think it's unfair to say that Google uses cheap hardware. I'd rather we call it "inexpensive." For the cost of $800 on a SCSI drive, you could RAID a lot of inexpensive SATA drives. And at the scale Google has, there are probably a lot of ways to optimize the use of SATA drives.

jcmoon




msg:3257875
 4:30 pm on Feb 20, 2007 (gmt 0)

Google may use "consumer grade" components, but I think describing them as "consumer grade boxes" is misleading. I doubt that they have thousands of desktop PCs sitting on benches, I rather suspect they have hundreds of cabinets with dozens of boards/drives in each (per data center).

Actually, you're on to something here. Last March the Xooglers posted a few times about hardware, including a tour of Exodus, Google's host at one time. It seems most companies who were hosted there (Yahoo, Inktomi, etc) would have a large cage with a few dozen servers, and plenty of space to move about. Generally the servers inside were expensive, sleek machines. Google, on the other hand, had a 600-square-foot cage with 1500 servers inside it. And these "servers" weren't the boxes we think of ... they were naked motherboards & drives on corkboard, sitting on metal trays. No casing, open to the (controlled) elements within the host's datacenter. That and power cords & network cables everywhere, like spaghetti.

And this was 1999. Chances are their philosophies have evolved some (they now have many datacenters of their own), but these are their roots.

(a tangent thought here is how much power consumption G was responsible for, and how good G was at moving from one host to another as hosts would go bankrupt ... open question as to whether G had anything to do with many hosts going bankrupt)

kaled




msg:3257983
 6:13 pm on Feb 20, 2007 (gmt 0)

The SCSI drives spin at 15k compared to 7.2k for almost all IDE drives

Given that I have clearly stated that hard disk performance is limited primarily by mechanical design, it is hardly surprising that a SCSI drive that spins twice as fast as an IDE drive achieves faster transfer rates. However, can the SCSI drive sustain a transfer rate that is beyond the limits of the ATA interface?
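
The question above boils down to comparing one number (the drive's sustained media rate) against the nominal ceiling of each interface. A minimal sketch of that comparison follows. The ceilings are the nominal burst figures of the standards named; the sustained rates passed in are hypothetical values, not measurements of any drive mentioned in this thread.

```python
# Nominal interface ceilings in MB/s (burst figures from the standards' names).
INTERFACE_CEILING_MB_S = {
    "Ultra ATA/133": 133,
    "SATA 1.5 Gb/s": 150,
    "Ultra320 SCSI": 320,
}

def interface_limited(sustained_mb_s):
    """List the interfaces whose nominal ceiling is below the given sustained rate."""
    return [
        name
        for name, ceiling in INTERFACE_CEILING_MB_S.items()
        if ceiling < sustained_mb_s
    ]

# Hypothetical sustained media rates, for illustration only.
print(interface_limited(90))   # []
print(interface_limited(140))  # ['Ultra ATA/133']
```

If the list comes back empty for a real drive's measured sustained rate, the interface is not the bottleneck for sequential transfers; anyone wanting to settle the argument would need to plug in published or measured figures.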

If the answer to this question is no, then my point is proven (that manufacturers use SCSI interfaces on their newest/fastest/largest disks to achieve premium prices rather than because the ATA interface isn't quick enough).

Just to be clear, I don't doubt the experiences of people that say their SCSI drives are faster than their ATA drives, but that extra speed is more likely to be due to mechanical design than the SCSI interface.

Kaled.

relicx




msg:3258072
 7:44 pm on Feb 20, 2007 (gmt 0)

We're talking about SCSI drives, not the theoretical limits of the interface itself. SCSI itself may not be better than SATA, but that's a theoretical discussion that means nothing to webmasters if SCSI drives are faster than SATA drives. Maybe steam cars can be just as fast as gasoline cars, but if I'm looking for my next car I couldn't care less, since nobody makes them as fast as gasoline cars. There might be some unjustified price differences, but you're only guessing about them without any facts, because there are no SATA drives that spin at 15k RPM or have access times of SCSI drives or are marketed mostly to businesses.

AlexK




msg:3258354
 12:41 am on Feb 21, 2007 (gmt 0)

I certainly bought SCSI for the server (2003) because of the perception that SCSI was the only way to go for a professional setup so--to that extent--the hype/reputation sure worked on me.

Thing is, my experience has confirmed the hype/reputation. Work on my (Linux, SCSI) server is blisteringly fast on disk-bound operations, whilst the same operations on my (Windows, ATA) desktop stumble along. And of course I'm not comparing like-for-like:

Colo Server:

  • 3 x 74.3GB 10,000rpm 80-pin hot-plug SCSI
  • Adaptec zero-channel RAID
  • 2GB main-memory

Home Desktop:
  • 1 x 160GB 7,200rpm Maxtor 8MB ATA
  • 1 x 80GB 7,200rpm Seagate ATA
  • 512MB main-memory
...hardly any comparison.

It seems to me that for someone in my position, with a single machine in a remote Colo, I would go for SCSI because of its reputation for speed and reliability (which has, so far, been confirmed).

BTW: does anyone know whether SCSI drives report scan errors, sector reallocations, offline reallocations or sector Probational Counts and--even more important--how to pick up the info under Linux?
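
One possible route under Linux is the smartmontools package, whose `smartctl -A <device>` prints a drive's SMART attributes. The sketch below only parses the ATA-style attribute table from a sample string, so it runs without a drive or root access. The sample values are made up, and the mapping from the paper's terms to these three attribute names is my assumption. SCSI drives report their health through `smartctl` in a different layout, so this parser would not apply to them as-is.

```python
# Made-up sample in the layout `smartctl -A` prints for ATA drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

# Assumed stand-ins for reallocations, probational counts and offline errors.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def parse_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes found in text."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found

attrs = parse_attributes(SAMPLE)
print(attrs)

# Against a real drive (needs smartmontools and usually root; the device path
# is an assumption):
#   import subprocess
#   out = subprocess.run(["smartctl", "-A", "/dev/sda"],
#                        capture_output=True, text=True).stdout
#   print(parse_attributes(out))
```

Any non-zero raw value in the result would be the cue to apply the "get rid after the first..." rules discussed earlier in the thread.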

Comment: with hindsight, and in G's position, I would have gone for the cheapest discs also. They have simply taken the idea of RAID (remember: "Redundant Array of Inexpensive Discs") to its ultimate and run with it. Good on 'em.

kaled




msg:3258746
 12:13 pm on Feb 21, 2007 (gmt 0)

Here's something to think about...

Acme Drives Inc. produces two almost identical drives. One runs at 7,200 RPM and has 4 independent heads and one runs at 14,400 RPM and has 2 independent heads. Both achieve precisely the same sustained data transfer rates but the 7K2 drive uses less power and runs cooler. Both drives cost $100 but which drive will fly off the shelves and which will gather dust?
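
The arithmetic behind the two Acme drives can be sketched as follows. The throughput model (rate proportional to RPM times the number of heads reading in parallel) is an assumption taken from the thought experiment itself, not a claim about how real drives work. Average rotational latency as half a revolution is standard.

```python
def relative_throughput(rpm, heads):
    """Assumed model: sustained rate scales with rpm x independent heads."""
    return rpm * heads

def avg_rotational_latency_ms(rpm):
    """Average wait for a sector to come round: half a revolution, in ms."""
    return 0.5 * 60_000 / rpm

# (rpm, heads) for the two hypothetical Acme drives.
drives = {"7K2, 4 heads": (7200, 4), "14K4, 2 heads": (14400, 2)}

for name, (rpm, heads) in drives.items():
    print(
        f"{name}: throughput units = {relative_throughput(rpm, heads)}, "
        f"rotational latency = {avg_rotational_latency_ms(rpm):.2f} ms"
    )
```

Under that assumed model both drives deliver the same sustained rate, as the post says. The model also shows one quantity that still differs between them: average rotational latency, which halves when the spindle speed doubles.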

Kaled.

disgust




msg:3260285
 2:27 pm on Feb 22, 2007 (gmt 0)

saying the SCSI interface isn't faster than S/ATA is not the same as saying SCSI drives aren't faster than S/ATA drives.

you don't pay a premium price for the interface anyway. the price premium you're paying is because you're buying higher quality drives, interface aside.

kaled




msg:3260661
 7:33 pm on Feb 22, 2007 (gmt 0)

My original comment was
SCSI hasn't offered any significant performance advantages (over ATA) for a very long time

I was then accused of being "nuts". Is anyone saying that SCSI hard disks can sustain transfer rates beyond the capacity of ATA? If not, then I think my original comment is valid.

Having said that, I could actually be wrong, I haven't bothered to check the specs of the fastest drives (or run tests) for several years. Nevertheless, no one has provided figures that contradict what I've said.

Kaled.

disgust




msg:3260753
 8:49 pm on Feb 22, 2007 (gmt 0)

the discussion was talking about SCSI drives, not the SCSI interface. it certainly seemed like you were implying "there's no need to go SCSI, S/ATA is just as fast."

the standard may be as fast but the drives aren't.

SCSI drives are faster, more reliable, etc. not because of the interface-- you're right. but if you want the fastest, enterprise-class drives, S/ATA is simply not going to be an option for heavy IO random reads & writes; SCSI drives blow them away.

again, you don't pay the premium for the interface. you pay the premium because the enterprise level drives just happen to only come in SCSI formats.

kaled




msg:3260970
 1:20 am on Feb 23, 2007 (gmt 0)

And the specification of an "enterprise level drive" is precisely what?

I've never seen figures that support your claim that SCSI drives are more reliable (nor have I ever read such a claim in any reputable publication).

I don't believe that many drives perform mostly random IO (unless horribly fragmented) but, again, there would be no advantage to a SCSI interface.

the enterprise level drives just happen to only come in SCSI formats.

I've explained why that is (several posts ago) but if you don't believe me, that's fine.

Kaled.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved