# RAID5 vs. RAID10?

## eno2001

I've been weighing the pros and cons of the different RAID levels for a mass storage server I'm building for home.  This server is going to be used to store virtual machine images for the Xen virtual servers I have running various services for the house (one of which will be the home file server).  In setting up my system, I've read that RAID5 tends to incur a performance hit depending on the stripe size vs. the data that you're working with on the RAID.  My plan currently is this:

1. 8 x 400 gig drives (total raw storage = 3200 gigs)

2. Configure them as RAID10 for 1600 gigs of mirrored+striped storage for maximum performance and redundancy

3. Assuming that this will be presented as /dev/md0, I will use the entire /dev/md0 as a physical volume in an LVM volume group

4. I will then carve out space from the volume group as needed for the various VMs and export them to the network either as network block devices or possibly iSCSI (the jury is still out as I like NBD since it's more mature)
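If it helps to see the four steps above concretely, here is a sketch using Linux software RAID (mdadm) and LVM2.  This is only an illustration, assuming the eight drives show up as /dev/sda through /dev/sdh; the volume group name (vg_storage) and logical volume name (vm_mail) are made up:

```shell
# Step 1-2: build an 8-drive RAID10 array from the first partition of each disk
# (device names are assumptions; adjust for your hardware)
mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/sd[a-h]1

# Step 3: use the whole array as an LVM physical volume in one volume group
pvcreate /dev/md0
vgcreate vg_storage /dev/md0

# Step 4: carve out a logical volume per VM as needed, e.g. 20G for a mail server,
# then export it via NBD or iSCSI
lvcreate -L 20G -n vm_mail vg_storage
```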

However, something occurred to me...  Would there be a benefit to using RAID5 instead of RAID10?  As far as I can tell, if one drive fails in RAID10, I can replace that drive and the array stays intact.  If more than one drive fails, I'm hosed.  In RAID5 with one spare (I don't know if I can have more than one spare), if one drive fails, I'm OK.  But if more than one drive fails, I'm hosed.  So it would seem that with that arrangement, they're both comparable.  I keep reading that RAID5 is slower than RAID10 if your data is smaller than the stripe size.  But seeing that I'll be hosting LVM logical volumes which are then exported to the network and remotely partitioned/mounted, does the size of the files on the remote file systems have the same restriction?  Will RAID5 still have a performance hit if I'm running a mail server that has tons of small files being read and written?  Am I wrong in my thinking about RAID10?  And... can RAID10 have spares?  If so, which would be the better choice?

My goal is to essentially build a poor man's "SAN" that works similarly (at a low level) to the HP VA7400 series SAN here at work.  Yeah, yeah.... I know it's not the same.  But I said, "essentially" the same:  RAID1+0 sliced into whatever size logical devices I need and exported over a dedicated gigabit network.  (Heheh... I have four cat6 drops per room in my house)

----------

## alex.blackbit

wow, long post.

you write you want to export storage with nbd or iscsi. how many servers, and which do you have for running the actual services?

on a hardware raid controller you can define a spare (or hot spare) disk in raid 10 that automatically jumps in if another drive fails.

if raid5 is fast enough (judge for yourself) i would use that option, because it's a bit cheaper.

just my $.02

----------

## NeddySeagoon

eno2001,

You need to do some data flow analysis and some reliability analysis.

The data flow analysis may well show that performance is not limited by the disk data rate.  It may actually be limited by the useful data rate on the network.

Reliability analysis may show you have several single points of failure that can take out your entire raid set, regardless of its raid level.  Raid offers some protection against drive failure.  What about RAM, CPU, PSU, or the network causing rubbish to be written to a perfectly good raid set?

Raid is not a substitute for backups.  

```
rm -rf <very-important-data>
```

removes the data with no hope of recovery.

Raid 10 will tolerate one drive failing, and some combinations of two drives failing.

Raid 5 will tolerate a single failure and gives you more usable space.

Raid 10 gives n/2 drives of useful space.  Raid 5 gives n-1 drives of useful space.

Think about raid 6, which tolerates any two drives failing and provides n-2 drives of useful space.

They can all have hot spares.
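As a quick sanity check, those usable-space formulas, applied to the 8 x 400 gig drives in this thread, work out to:

```shell
# Usable space for 8 x 400 gig drives under each RAID level:
# n/2 drives for raid10, n-1 for raid5, n-2 for raid6
n=8; size=400
echo "RAID10: $(( n / 2 * size )) gigs"      # 1600 gigs
echo "RAID5:  $(( (n - 1) * size )) gigs"    # 2800 gigs
echo "RAID6:  $(( (n - 2) * size )) gigs"    # 2400 gigs
```

(Assign a hot spare and each figure drops by another 400 gigs.)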

With that sort of space, maybe you want a pair of redundant servers, so everything is duplicated, except the primary power supply. That approach takes care of most (not all) of the other single points of failure.

You still need validated backups - backups and raid solve different problems.

----------

## pilla

http://en.wikipedia.org/wiki/RAID

With RAID5 + a spare disk, "The array will have data loss in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive." However, RAID5 has the advantage of not requiring twice as many disks as the data space you want. You'll need just the equivalent of one extra disk for the parity bits. In your scenario, you could have 6 disks for data, 1 to account for the parity (which will be distributed across the other disks too), and an extra disk as a hot spare. That would make for 2400 gigs of data, instead of 1600 for RAID10 without the spare drive. 
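The parity RAID5 stores is just the XOR of the data blocks in each stripe, which is why one drive's worth of space is enough to survive any single failure.  A toy illustration with made-up byte values:

```shell
# Toy RAID5 stripe: three data blocks plus their XOR parity (values made up)
d1=0xA5; d2=0x3C; d3=0x71
parity=$(( d1 ^ d2 ^ d3 ))

# If the drive holding d2 dies, its block is recoverable by XORing
# the parity with the surviving data blocks
recovered=$(( parity ^ d1 ^ d3 ))
echo "d2=$(( d2 )) recovered=$recovered"   # → d2=60 recovered=60
```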

However, RAID10 can be faster, as you can read from the mirrored disks in parallel.

----------

## NeddySeagoon

 *pilla wrote:*   

> However, RAID10 can be faster, as you can read from the mirrored disks in parallel.

 

but the kernel doesn't seem to.  Raid 1 appears to be the same speed as a single drive; raid0, however, is almost twice as fast as a single drive.

With raid10 you should see a read speedup due to the raid0 part and a write slowdown due to writing everything twice.

----------

## eno2001

As always, thank you very much for the helpful suggestions.  My main reason for trying to set this up isn't to replace backups, but to support Xen's live migration of virtual machines between the physical machines hosting them.  Currently I have two AMD64 machines acting as Xen servers.  I'm hoping that with centralized storage, and the ability to have the VMs running completely off of that storage, I can keep my virtual machines up and running even in the face of impending hardware failure.  Some of it has real utility, but some of it is just to gain experience working with these technologies in a real environment that has some real uses.

Do you have any specific recommendations for data flow analysis?  I think you are quite correct in your assumption that my storage speed is limited by the network speed.  The SATA controllers are rated for 3Gb/s, well above the 1Gb copper network, and I suspect that since this is low-end equipment, they probably run a bit slower than that anyway.  So it may be that RAID5 or RAID6 would be just fine for this application.  One thing that influenced my choice of RAID10 is that the VA7400 I use at work is RAID1+0.  That seems to be an ideal arrangement for a larger set of disks, and my only exposure to RAID5 has been on individual servers.

My expectation is that I will be using the virtual machines (with their storage living on this machine) to run the following services for my home:

1. External DNS

2. Internal DNS

3. Zimbra mail server (Jetty, MySQL, Postfix, and a few others)

4. DHCP

5. Apache to host two or three web sites (probably a typical Apache, MySQL, PHP/Perl configuration)

6. NX based application server (Gnome Desktop) and NFS for the entire house

7. Possibly a separate NFS server just for the Gentoo based media center

8. OpenVPN for virtual private networking

9. Asterisk for private VoIP (SIP only over the VPN.  No POTS)

Currently that's about all I have planned.  I've already been running almost all of these services on Xen using the local storage in each Xen server.  The only limitation is that if I have to down a physical box I have to take the VMs down too.  I am hoping to overcome that with centralized storage since Xen allows for migrating a live virtual machine ("domain" in Xen parlance) from one physical host to another.  I've used it at work through the VirtualIron product (to host Zimbra for about 2000 users) and it really is something to behold.  No hiccups on the user end at all...

So that's more where I'm headed with this.  Backup is still an issue, and for now I'm thinking that the old file server with 1 TB of space could be pressed into service as a remote rsync target.  I currently have about 400 gigs of data, of which perhaps only 100 gigs is what I would term critical (family photos, videos, etc.).  That information is currently backed up to DVD periodically.  As far as really good backups to more reliable media, I'm not sure what to do.  I've considered a SATA DLT drive, but in the event of my demise, who would restore it?  I'm the only *nix user in the family.  But that's for a different thread.  Thanks again.

----------

## NeddySeagoon

eno2001,

Consider the following typical numbers for each element.

Hard drive head/platter data rate - 50MB/s

32-bit 33MHz PCI bus - 100MB/s (the theoretical max is 133MB/s, but you can't get close in practice)

1Gbit network - 100MB/s 

Hard drive to RAM - 50MB/s.  The disk bus can be ignored apart from bursts; it's well above the 50MB/s.

RAM to network card (over the PCI bus) - 100MB/s

Now, are the drive-to-RAM and RAM-to-network transfers over the same bus?

If so, your PCI bus is maxed out carrying the data twice. It can just cope with the data from a single drive.

With separate PCI buses you have some headroom, as each will carry the 50MB/s once.

With raid0 (two drives) you max the PCI buses out again.

PCI-X is faster of course.
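Plugging the figures above into a rough budget check (just arithmetic, using the numbers from this post):

```shell
# Rough PCI bandwidth budget, in MB/s, using the figures above
drive=50   # one drive's sustained head/platter rate
pci=100    # usable 32-bit 33MHz PCI bandwidth

# On a single shared bus every byte crosses twice: disk -> RAM, then RAM -> NIC
echo "one drive, shared bus: $(( 2 * drive )) of $pci MB/s"            # 100 of 100: maxed out
echo "two-drive raid0, shared bus: $(( 2 * 2 * drive )) of $pci MB/s"  # 200 of 100: over budget
```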

What will the CPU in the server be doing?

Just setting up DMA transfers to move data around?

If that's true, the CPU load will be close to zero in normal operation.

Neither raid0 nor raid1 (nor raid10) has any redundant data to calculate.  Raid0 is just the physical organisation of data across a number of drives, and raid1 makes an exact copy.

You may as well make the CPU work for its living doing raid5.

I don't know what data rates you expect to see over the network, so I can't provide more details.

----------

## HeissFuss

Just adding a couple of comments to this.

Since RAID10 is mirrored pairs of disks with striping across the pairs, it can tolerate multiple failures as long as two paired disks don't fail together.
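For the eight-drive layout discussed here (four mirrored pairs), a quick count of the two-drive failure cases, purely as arithmetic:

```shell
# 8 drives arranged as 4 mirrored pairs (RAID10)
n=8
pairs=$(( n * (n - 1) / 2 ))   # distinct two-drive failure combinations: 28
fatal=$(( n / 2 ))             # only the 4 mirror pairs lose data
echo "survivable two-drive failures: $(( pairs - fatal )) of $pairs"   # 24 of 28
```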

Reading is faster from RAID10 than RAID5 for the same volume due to increased disk count for the RAID10.  However, if you have the same amount of disks for both it's likely that the read speeds will be almost the same.

Writing to RAID5 is slower due to parity, and in a lot of situations it can affect read request response times/throughput.  If you're going to have a lot of read requests while writes are occurring, RAID10 may be the better option.  For home use, though, you probably won't notice many (if any) issues unless you're doing a lot of disk hits (small reads/writes).  For large files/sequential operation, I doubt there would be a noticeable difference with the same spindle count.  RAID5 is the more economical option for home use.
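The parity cost on small writes comes from the read-modify-write cycle: updating one block means reading the old data and old parity, then writing the new data and new parity (four I/Os, versus two mirrored writes on RAID1/10).  A toy sketch with made-up byte values:

```shell
# Toy RAID5 read-modify-write for one small block update (values made up).
# The new parity is computed as: old parity XOR old data XOR new data,
# so only two reads and two writes are needed, not a whole-stripe rewrite.
old_data=0x12; old_parity=0x5F; new_data=0x34
new_parity=$(( old_parity ^ old_data ^ new_data ))
echo "RAID5 small write: 4 I/Os (2 reads + 2 writes); RAID1/10: 2 writes"
echo "new parity: $new_parity"
```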

----------

