|Building a SCSI Hard Drive RAID Array|
Are they worth it?
I'm toying with the idea of building a very nice computer for myself. As part of the setup, I wondered about two 15K RPM SCSI drives in a striped RAID array. As I started looking at the drives, I realized I don't really know anything about what I'm looking for.
The most basic question is, are SCSI drives really any faster than, say, a 7200 or 10000 RPM SATA drive?
Assuming the answer is yes, then the following questions also become relevant...
- What are the differences between 80-pin and 68-pin drives?
- What are the differences between SCSI and SAS?
- Are there certain characteristics I should look for in a SCSI controller card?
- Are SCSI (or SAS) drives the same physical size as IDE and SATA drives, e.g., will they fit into the same drive bays in my case?
I think that will do it for now...I tried to research some of this on my own, but the more I read, the more I realized I needed someone to point me in the right direction before any of the research would make sense! ;)
|1. What are the differences between 80-pin and 68-pin drives |
80-pin drives use 80-pin connector cables, 68-pin drives use 68-pin connector cables. But you probably already figured that out, right? :)
|2. What are the differences between SCSI and SAS? |
SAS is Serial Attached SCSI. My understanding is that it's the next-generation SCSI. Faster? Standardized connector cables (no more 68-/80-pin mess).
|4. Are SCSI (or SAS) drives the same physical size as IDE and SATA drives, e.g., will they fit into the same drive bays in my case? |
Yes and no... Some newer SAS drives are 2.5" instead of 3.5" (planning to get the 2.5" ones for my new DB server).
From my understanding and somewhat limited experience, if you're not building a server, you probably don't need SCSI. SCSI tends to be better for multiple users accessing the drive(s) simultaneously.
>>The most basic question is, are SCSI drives really any faster than, say, a 7200 or 10000 RPM SATA drive?
If you're building a server that involves a lot of I/O operations then SCSI has an advantage. Personally, I'd rather raid a couple of 10,000 rpm SATA drives and spend the money saved on SCSI somewhere else.
If you're trying to figure out what to build, look at the hardware that VoodooPC, Falcon or Alienware are putting in their machines.
There was a time when SCSI was better for multi-tasking and multi-user applications, but that was a long time ago. The advantage SCSI had was that it supported DMA (Direct Memory Access), so the CPU could do something else while the requested data was supplied by the hard disk, whereas without DMA the CPU had to copy the data in chunks out of the disk buffer.
However, IDE/ATA drives have supported DMA for well over a decade. Both IDE and SCSI interfaces can keep up with fast drives so there is usually little or no advantage to be gained from a SCSI drive. (However, plenty of people at WW will disagree with me.)
A fast hard disk helps with performance, but in most cases, any potential advantages will be swamped by other factors such as memory, OS version, choice of anti-virus software, etc.
68-pin vs. 80-pin:
68 pin is older, used for "internal" drives attached to a cable. (There is an external extension for the 68-pin bus, using different connectors). 68-pin uses a separate 4-pin power connector.
80-pin is designed for "hot plugging": power is fed through the 80-pin connector, so there is no separate power connector. The connector is intended to plug directly into a backplane, rather than using a cable. (Although the use of a cable is also possible.) For 3.5" drives, form factor is standardized. Hot-plug "ears" are proprietary to each enclosure/cage manufacturer, but the ears screw on to the standardized mount points of any 80-pin drive.
Both 68-pin and 80-pin are basically obsolete, though. Parallel SCSI is now a dead-end technology - there is no plan for updated standards, AFAIK.
SAS = Serial Attached SCSI. Uses same connectors as SATA. A SAS host controller can connect to both SAS and SATA drives. A SATA host controller cannot connect to SAS drives.
SCSI drives are MUCH more expensive than SATA drives. But they are designed specifically for the server market, and are very high reliability drives. That's not a SCSI feature per se, but a side effect of the market specialization. They are also typically high performance, and the only 15K drives remain SCSI.
The main difference between SCSI/SAS and IDE/SATA is not DMA (most ten year old IDE controllers had already DMA, most software drivers just didn't use it and did everything in PIO mode), but that the first two have better provisions for sorting and deliberately delaying disk actions. SCSI/SAS devices can calculate the optimal sequence of disk actions with minimal head movement and maximum overall data throughput. Newer versions of SATA have some provisions for it, but it is often not well supported by drivers because the SATA drivers are backwards compatible with older less intelligent drives and therefore often don't use the newest command sets.
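To make the reordering idea a bit more concrete, here is a minimal Python sketch of an elevator-style scheduler. It is only an illustration: the function names, the track numbers and the simple "head travel" cost model are my own assumptions, not how any particular SCSI or SATA firmware actually works.

```python
def head_travel(start, order):
    """Total number of tracks the head moves when servicing requests in this order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel


def elevator_order(start, requests):
    """Service everything at or above the head position going up, then sweep back down."""
    upward = sorted(t for t in requests if t >= start)
    downward = sorted((t for t in requests if t < start), reverse=True)
    return upward + downward


# Hypothetical queue of pending track numbers; the head currently sits at track 50.
queue = [95, 10, 60, 12, 90, 55]
fifo_cost = head_travel(50, queue)                        # serviced in arrival order
sorted_cost = head_travel(50, elevator_order(50, queue))  # serviced in reordered form
print(fifo_cost, sorted_cost)  # 341 130
```

With only one or two requests outstanding there is nothing to reorder, which fits the point that a single-user system probably won't see much difference.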
If you have a multi-user/multi-tasking system with many small reads and writes to the disks, SCSI/SAS is certainly an advantage. For a single user system with just a few parallel tasks running you probably won't see any difference.
|As part of the setup, I wondered about two 15K RPM SCSI drives in a striped RAID array. |
Are you sure you want striped and not mirrored? In a stripe set the chance of a disk failure increases with the number of disks in the array, and one disk failure makes the data on all disks unusable. Striping should only be used for data which is easily replaceable, or for which an up-to-date off-line copy is available. With current large disk sizes, making regular backups on off-line media like DVD or tape is really time consuming and expensive. If you go for RAID, choose at least some sort of failure-resistant setup like RAID1 or RAID5.
I know what I am talking about because I set up a new local server at home with hardware RAID1 (mirroring) just two weeks ago. After one week one of the four hard disks started generating S.M.A.R.T. errors and died a day later. I only had to replace the disk and choose the rebuild option on the RAID controller to repair the configuration. While the computer was performing its normal tasks, the RAID controller copied the original contents from the existing disk to the new empty one. Had it been a RAID0 (striped) volume, about one week of installation, configuration, filling and tuning of the computer would have been lost.
|most ten year old IDE controllers had already DMA, most software drivers just didn't use it and did everything in PIO mode |
From memory, Windows 95 protected-mode drivers for hard disks used DMA.
I presume that "striped" arrays potentially offer twice the data transfer speed - perhaps that's what MatthewHSE is looking for.
I've never been a great fan of RAID technology myself - I think it breeds over-confidence. About ten years ago, a friend told me of a near disaster at his company when a power supply caught fire and nearly destroyed all the company data stored on a RAID system. There is simply no substitute for offsite backups.
Definitely worth it, not a doubt. They make an amazing difference both in server and workstation applications. If the price were lower I'd insist on them exclusively.
Windows NT 4.0 used PIO mode only. In the time I was system administrator of a 70+ nodes NT cluster the main performance boosting action was to install motherboard specific IDE drivers that supported DMA. But that is history now fortunately.
But ATA drivers not supporting the newest command sets of SATA disks with native command queuing is still common. SCSI has always been for the high-end market where performance is an issue, and therefore the drivers always support the latest hardware features. IDE/SATA is for widespread use on all kinds of systems, and drivers therefore tend to use only the commands available on all/most devices.
RAID itself is not a guarantee, it is a first line defence. But I don't want to count the number of smaller web developers who will have severe problems if their current development environment suffers severe technical problems. And then RAID might be better than nothing. With current x00 GB disks, making regular backups on DVD or tape media is a really expensive task.
When building my SATA-based server a few weeks ago, I discovered that disk manufacturers often have a desktop and a server version of their SATA disks which look identical from the outside but differ in price. As my server was shipped with Barracudas, I checked the manufacturer's site to see what the difference is between the ES and 7200.10 series. Both series have the same speed and capacity specifications, and look the same from the outside. Interestingly, there is not much of a mechanical difference, but the server version of the disk has slightly different firmware and better sensors to detect and compensate for vibrations. It seems that when you put multiple disks of the same brand/capacity/speed in one computer frame (which is the common case in server/RAID environments), vibrations of one disk may induce resonance in other disks. The sensors detect and compensate for this resonance, which causes fewer soft seek errors on the server version. Both the desktop and server versions of the disks have these sensors, but the sensors in the server version are more accurate. An interesting fact, and it was an eye opener for me. It seems that the main benefit of the server versions of these disks comes when they are really used in server/RAID environments, and that it doesn't matter so much in single-disk desktop installations. SCSI/SAS versions have the better sensors by default.
Thanks for the feedback everyone, I really appreciate it. At least now I know where to head with further research before making a final decision - and please feel free to keep the information flowing in this thread too! ;)
Regarding my plan for a striped RAID system, my understanding is that reads and writes are approximately twice as fast on a setup like that. Therefore, my plan was to keep the OS and program installations on the striped array, and have a mirrored array of SATA's for my data. If my understanding is correct, this will allow the operating system and programs to work very fast, while the data, although accessed slower initially, will be stored in RAM while I'm actually working on it and therefore the speed of the drive it's stored on isn't so vital.
Is that correct thinking, or do I have something wrong?
>>Regarding my plan for a striped RAID system, my understanding is that reads and writes are approximately twice as fast on a setup like that. Therefore, my plan was to keep the OS and program installations on the striped array, and have a mirrored array of SATA's for my data.
It sounds like you're setting up two sets of RAID 0 drives (two for programs / O/S and the other for data).
Keep in mind that you've got a setup now where if any one of the four drives goes you're losing data.
If you have the need for four drives, you might want to look into RAID 0+1 or RAID 1+0. This way you've got some fault tolerance. With the setup you're suggesting, you're 4 times as likely to lose data as with a single hard drive. So if you have a problem with a hard drive once every four years, then you can expect to have a problem once per year.
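The "four times as likely" arithmetic above can be sketched in a few lines of Python. This is a rough model under loud assumptions: drives fail independently, each with the same yearly failure probability, and losing any one drive loses the data. The function names and the 0.25 figure (one problem every four years) are just for illustration.

```python
def expected_failures(p_single, n_drives):
    """Expected number of drive failures per year across n independent drives."""
    return p_single * n_drives


def p_any_failure(p_single, n_drives):
    """Probability that at least one of n independent drives fails within the year."""
    return 1 - (1 - p_single) ** n_drives


p = 0.25  # assumed: one problem every four years for a single drive

print(expected_failures(p, 4))        # 1.0, i.e. one problem per year on average
print(round(p_any_failure(p, 4), 3))  # 0.684, chance of at least one failure in a given year
print(round(p_any_failure(p, 1), 3))  # 0.25 for a single drive
```

So "once per year" holds as an average; the chance that a particular year has at least one failure is a bit lower, but still far higher than for one drive on its own.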
Striping indeed increases transfer speed proportionate to the number of drives - 2 drives = 2x speed, 4 drives = 4x speed, etc.
But, as pointed out, there is a corresponding reduction in reliability.
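For anyone wondering where the speed-up comes from, here is a small Python sketch of how a striped set might spread logical blocks over its member disks. The `locate` function and the `stripe_blocks` parameter are hypothetical names for illustration; real controllers differ in stripe size and details.

```python
def locate(logical_block, n_disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe_index, within = divmod(logical_block, stripe_blocks)
    disk = stripe_index % n_disks                                # stripes rotate over the disks
    offset = (stripe_index // n_disks) * stripe_blocks + within  # position on that disk
    return disk, offset


# Six consecutive logical blocks on a two-disk stripe set, one block per stripe.
layout = [locate(block, n_disks=2) for block in range(6)]
print(layout)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Consecutive blocks alternate between the disks, so a long sequential transfer keeps both drives busy at once. It also shows the reliability problem: every file of any size has pieces on every disk.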
Striping is appropriate for swap, temporary storage, or OS if you are willing to re-load the OS in case of trouble. But beware of data files hiding in places where they shouldn't be - not every app follows the OS's recommendations on use of the file system hierarchy!
But, really, with the speed of hard drives today, it is overkill unless you have a very specific need. Years ago, I used a striped array of 5 drives for capturing uncompressed video at 60fps (a feat at the time - it was a special camera for the 60fps, BTW). This was "throw-away" data - analysis of golf swings - which would be overwritten on the next session.
Mirroring can be useful if you can only afford to use two drives. I wouldn't use it with more than two, as RAID5 can be done with 3 or more drives, and is much more space-efficient.
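The space-efficiency point is easy to put into numbers. A minimal Python sketch, assuming equal-sized disks, RAID 1 as a plain n-way mirror and RAID 5 with one disk's worth of parity; `usable_fraction` is a made-up helper name.

```python
def usable_fraction(level, n_disks):
    """Fraction of raw capacity left for data, assuming equal-sized disks."""
    if level == "0":
        return 1.0                      # striping: no redundancy, nothing lost
    if level == "1":
        return 1.0 / n_disks            # every disk holds the same copy
    if level == "5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (n_disks - 1) / n_disks  # one disk's worth of parity
    raise ValueError("unsupported level: " + level)


for level, disks in [("1", 2), ("5", 3), ("5", 4), ("5", 8)]:
    print(level, disks, round(usable_fraction(level, disks), 2))
```

A two-disk mirror gives you half the raw space, while RAID 5 gives two thirds on three disks and more as you add disks.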
There is no read penalty for any kind of RAID. There is a write penalty for RAID5. No write penalty for RAID 0 (striping) or RAID 1 (mirroring).
Keep in mind that without a real RAID controller, the write penalty might be very significant.
Frankly, I wouldn't bother with any kind of RAID today.
It's a cost argument.
The SCSI striped arrangement - say with 2 * 15k rpm U320 80-pin Seagate / Hitachi on either a standard PC/peripheral SCSI card (SATA/SCSI both supported), or a workstation/server with SCSI/XEON only - will be very fast, but the cost implications are not to be sneered at.
If it's a gaming machine then, as one of the other posters stated, mimic Alienware or overclockers' configs. If it's server level, then really you need to start looking at IBM, HP, Compaq, because of the cost of building around the XEON chips / motherboards that support the latest SCSI RAID specs.
For speed in the past we went with HP DL servers - Xeon - 2 * 15K U320 144Gb mirrored SCSI (per server). Though we want speed, we also need backup redundancy (hence, mirrored). We have each server backed up by standard SATA PCs, just in case the processor or the motherboard fails, to get us by for a day or two until HP come and sort it.
However, I am going to be building a simple network in a month, in which I will go for SATA RAID (mirrored) on standard AMD/Intel processors (whichever is offering the best value CPU/MB at purchase time) because of compatibility and cost issues.
This thread should be moved to the hardware forum I'd imagine.
Heh what would really be nice...
Two 32GB (4x8GB) DDR2-800 RAM-drives in raid 0. ;)
Seriously though if you want to spend the extra money for performance look for a DDR2 ram drive that supports 4GB sticks for at least 16GB total capacity. The windows directory is usually over bloated with junk you can delete to keep it around 3GB or less.
Right now I'm running two Raid 1s on a dedicated raid card. I'm not sure if we're allowed to mention brands/models so I'll skimp on the details but...
C:\ = Two 120GB/8MB/7200RPM in Raid 1 for Windows XP
D:\ = Two 320GB/16MB/7200RPM in Raid 1 for personal files.
Granted, with raid if you have a virus you have two copies, though my main concern was losing data by 1.) hardware failure and 2.) Windows somehow locking me out on the same drive. I can format my C:\ without a second thought if I want to, though it's a pain to reinstall windows and about 40 browsers to test with considering the configuration.
Anyway, the performance is about what I expect, and altogether it was less than $400 for peace of mind, and my drives come with a five year warranty.
I've had nothing but trouble trying to get onboard raid to work on nForce4 motherboards (I've had about five or six now). So a dedicated raid controller is a must in my view; less load on the CPU too, I imagine.
If I had the money, and there was no ram-drive with enough capacity and speed to be worth building, I'd probably construct a raid 0+1 (or 1+0?). Anyway, you take a raid 0 (for performance), pair it with another raid 0, and mirror them as a raid 1. Always refer to it with a + (0+1 or 1+0, never raid "10" or raid "01" as there is currently no raid "10" or "01").
Happy raiding. :)
I had quite a time coming up with a "most secure" data server architecture. I had originally opted for RAID 10 (losing 75% of my utilisable disk space), but instead opted for a mirrored bank of RAID 5 disks (thus losing only 60% of my usable disk space). What are the chances of the same disk at the same spot (and serving the same function for the same data) in two different arrays failing at the same time?
RAID 1+0 (to bow to the above comment) is significantly more secure than raid 0+1 - have a look at wikipedia for confirmation on that one, no need to get into the details here.
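For those who do want the details, here is a small Python enumeration of every two-disk failure in a four-disk setup. It rests on one loud assumption: in RAID 0+1 the controller treats a whole stripe set as dead once any member fails. The disk labels and function names are made up for illustration.

```python
from itertools import combinations

DISKS = ["A1", "A2", "B1", "B2"]


def survives_1_plus_0(failed):
    """RAID 1+0: mirrored pairs (A1, A2) and (B1, B2), striped together."""
    pairs = [("A1", "A2"), ("B1", "B2")]
    # Each mirrored pair needs at least one working disk.
    return all(any(d not in failed for d in pair) for pair in pairs)


def survives_0_plus_1(failed):
    """RAID 0+1: stripe sets (A1, B1) and (A2, B2), mirrored against each other."""
    stripes = [("A1", "B1"), ("A2", "B2")]
    # Assumed: a stripe set is only usable if every one of its disks works.
    return any(all(d not in failed for d in stripe) for stripe in stripes)


two_disk_failures = [set(combo) for combo in combinations(DISKS, 2)]
ok_10 = sum(survives_1_plus_0(f) for f in two_disk_failures)
ok_01 = sum(survives_0_plus_1(f) for f in two_disk_failures)
print(ok_10, ok_01, len(two_disk_failures))  # 4 2 6
```

Both layouts survive any single failure. Under that assumption, 1+0 survives four of the six possible double failures and 0+1 only two; a controller that can still read the surviving half of a broken stripe set would narrow the gap.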
|RAID 1+0 (to bow to the above comment) is significantly more secure than raid 0+1 - have a look at wikipedia for confirmation on that one, no need to get into the details here. |
WebmasterWorld is a place of sharing knowledge, not a place of redirecting people to other places for their knowledge.
Given the original poster's question about using RAID in a personal desktop system mainly to increase performance, we are drifting away if we start to look at the minor differences between stripe-mirror vs. mirror-stripe sets which will only be visible in configurations with at least six disks. In the "low-cost" 4 disk RAID 0+1 and RAID 1+0 configurations one disk failure is tolerated and a second may be tolerated if it happens on the disk that carries the data not present on the first failed disk. With four disks looking from the point of fault tolerance both RAID configurations are equal.
If using six disk RAID 1+0 or mirrored RAID5 or RAID6 becomes an issue for an application, there are more things to consider, including separate RAID controllers, redundant power supplies, hot-swappable fans and a fault tolerant operating system. Not to mention the bomb proof shelter and the procedures that must prevent spilling of coffee on the keyboard. But I think that wasn't what MatthewHSE was thinking about when he talked about building a nice computer for himself.
If you'd like to build a REALLY fast machine, then go with RAIDed SSD (Solid State Disks) rather than SCSI. The latest Flash disks from Memoright and Mtron have pulverised the performance that cheaper, mainstream Flash parts from the likes of Sandisk, Samsung etc. can manage.
I moved to a single SSD drive (a very fast one) for my main OS drive and it's night and day faster than my old setup (10,000 RPM SATA). Windows boots maybe twice as fast, and "suspends" and "resumes" in seconds. So 2+ such drives in RAID formation would be insane!
[edited by: Edwin at 6:01 pm (utc) on Mar. 29, 2008]
What OS are you planning on using? If Linux, I recommend having a look at "Managing RAID on Linux".
<aside>I'm so happy to see such a geeky discussion homepaged</aside>
You are FAR better off going with 10k SATA drives in a workstation environment. Going with SCSI is simply just wrong. Put the money you'll save towards a GOOD RAID card with a battery backup unit and a good amount of cache and you will blow away a SCSI setup with some average card. The place to spend your money is on the RAID card, and the place to spend your time is on figuring out the best drive setups that work for the work you do. PCI-e is a must for the RAID card.
I assume you're doing video encoding or gaming, in which case you'll want to mirror your OS drive and put your applications on it. Create either a single drive or a RAID0 (if budget allows) for your temp files, and create an array of some sort to host your video files. Typically RAID5 or RAID6 works best here, as it gives the best cost-to-loss ratio while still maintaining solid throughput. Also, multiple RAID cards are ideal with this setup, so that you can dedicate PCI-e lanes to each array.
And RAID is as important today as it ever was. It is cheap insurance against downtime on the single component that is most likely to fail. Do you really want to have to reload your OS and your applications because a $150 part broke? How much time will it take you to do all that, 3-10 hours? Unless you bill out at minimum wage, it's definitely in your best interest to keep with RAID.
SCSI: older technology, mostly replaced by SAS in new systems.
The drives are *much* more expensive, more intelligent, yield better performance if you have drivers and an OS capable of using the features, and they are built to last, not built to be cheap (which is what SATA and IDE are mainly built for).
0: striping: used to increase performance as you have multiple heads that each can seek independently, or can read/write in parallel. This reduces the reliability as any failure of any drive makes you lose all content.
1: mirror: makes a copy to increase reliability. Might give you a performance hit (if done in software, much less if done in hardware). Note it does not protect against software errors or stupidity.
4: a system (not often used) where a number of drives are used to store data and where an additional disk is used to store a checksum. A loss of any disk can be repaired by using the redundancy of the checksum. This reads as fast as a stripe, and has only one disk per N others as overhead to gain the ability to survive a single disk failure (no double failures). Writing is a big problem, as every write must also calculate and write the checksum (and that disk quickly becomes a bottleneck).
5: same as raid4, but the choice of which disk is used to hold the checksum is switched for each sector, removing the write bottleneck somewhat. Still read throughput would remain much better than write throughput, even when using hardware controllers.
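The "checksum" in raid 4 and 5 is normally a simple XOR parity over the data blocks. A minimal Python sketch of the idea follows; the block contents and helper names are invented for illustration, and a real controller works on whole sectors and rotates the parity position.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def updated_parity(old_parity, old_block, new_block):
    """New parity after overwriting one data block: needs the old data and old parity."""
    return xor_blocks([old_parity, old_block, new_block])


# Three data "disks" holding one block each, plus the parity block.
data = [b"disk", b"fail", b"safe"]
parity = xor_blocks(data)

# Disk 1 dies: its block is rebuilt from the surviving blocks and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'fail'

# Overwriting disk 1 means reading old data and old parity before writing both back.
new_parity = updated_parity(parity, data[1], b"okay")
```

The rebuild works for any single lost block, but not for two. The update step is the write penalty: one logical write turns into two reads and two writes.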
You can combine some of the above.
Well managed raid arrays should have a "hot spare" disk: a disk that is immediately used upon detection of a failed drive in the array and used to rebuild the redundancy.
Monitoring of raid arrays is critical: once the rebuild is done (or a drive fails and no rebuild succeeds), the redundancy needs to be restored: replace the failed drive with one of the same size (which can be very hard, esp. if they give you a slightly smaller drive!).
Raid controllers in hardware cost a lot of money, but if you seek performance, they are the way to go. If you have an OS that understands/needs the concept of waiting for critical writes to be performed before moving on (commit to disk), there might be added benefit in letting the hardware controller confirm the guaranteed write to the OS without having performed it, and giving the controller a battery backup so it can still perform the write even if power is lost (instead of losing the sector that was still to be written). These batteries wear out, so take care if you buy a used controller.
In mid level storage SAS is the choice for performance and reliability in heavy-use arrays.
SATA is either for low end requirements, or for backup/archival. The reason there being that SATA is so cheap that it is cheaper/faster/more reliable to make backups on SATA than on (high end) tape.
Old SCSI comes in a number of flavors:
- single ended / differential (even more expensive)
- narrow (50 pin)/wide (68 pin)
- optionally low voltage (e.g.: LVD = Low Voltage Differential)
Make _sure_ you match your controller and your drives, and learn about termination of the bus (you need to actively do that properly or it will not work reliably).
The "80pin" connectors are "SCA" connectors: they also carry power; it's just to make it easier to wire up or build removable drives. Last I checked they were more expensive, but it's been a while since I checked.
While a SCSI bus will support up to 7 or 15 connected devices, I'd not recommend using them all for intensively used drives, the bus itself will saturate after more than 4 actively used drives (add controllers or channels)
Some will argue that raid in software makes no sense at all, and from experience I tend a bit in that direction myself.
While I've used SCSI extensively in the past (I've placed orders where the factory asked me what I was going to do with all those special-order-only drives; it seems they wonder about that if you order 1000 of their very top models in one go, with serial number demands to go along with it),
I've never used SCSI on a wintel box. It doesn't make much sense there anyway AFAIK.
At my own home I use a hardware controller based array of 8 SATA disks, I'm not the most intensive user of my storage. My choice for SATA was price based, and I needed space and security, not so much performance.
|I've never used SCSI on a wintel box. It doesn't make much sense there anyway AFAIK. |
I think every other sysadmin would disagree with that statement, plus it is contradictory in that if it's something you've never used, tested or experimented with, how can you blanket disregard SCSI as an option on a Windows machine?
The RAID card is going to be the thing that does all the I/O communication and Windows has no problem saturating a hard drive's bandwidth.
Technical fine points aside, the safety and convenience factor should be a primary consideration.
I am writing this on a box with a battery backed scsi raid card and *mirrored* drives. Both are brand name server grade.
This has been mentioned here before, but let me recap.
The system was first installed in 1999. It has *never* been reinstalled or reformatted. The only thing I have ever done is reapply service packs because some fool of a junior release engineer could not figure out the installshield manual or the windows certified rules for software installation.
But, it has had multiple drive failures, and has been migrated to three different systems.
If a drive fails, it is replaced and the system rebuilds the mirror automatically.
For an upgrade, I set the video to 640x480, shutdown, rip the drive subsystem out of the old box, install it in the new box, boot, set video to normal resolution. Done.
So, the system has carried my working environment over three machines for 9 years with basically no maintenance other than changing a failed drive. No data restore needed, unless *I* overwrote a file and need to retrieve it from tape.
As for the post claiming that other factors outweigh drive performance: um, no. The drive subsystem is the slowest link in the chain. Everything else waits for data I/O. So, anything you can do to improve it will make your computing experience less frustrating.
The OS? NT4 server sp6a. Enterprise Edition.
It consumes 18MB when first loaded and is idle after login.
No automatic updates, and it passes the "Windows Genuine Advantage" test when I need to download something for W2K3.