# RAID 5: how reliable is it? [VERY!]

## Guinpen

Hello,

I have a Nvidia 680i motherboard and a WD500YS 500GB hard drive. I'm thinking of buying 2 more of these and setting up a hardware-assisted software RAID 5. I need my data to be safe against hardware failure.

I have some questions about the reliability of such a scheme:

1) When a drive fails, how will I be notified, and how difficult / reliable would it be to rebuild the RAID on a replacement drive without losing data (downtime is not an issue)?

2) What happens when one drive relocates sectors or the RAID array becomes inconsistent in some other way? Will my data be destroyed? Is this an issue at all? In other words, how likely is a RAID 5 array to self-destroy?

3) How stable is Linux support for software RAID 5? I have read some very bad things about combining RAID 5, ext3, and cryptsetup (LUKS), or some subset of that stack that includes RAID 5. I would love to hear about your experience.

4) Is RAID 5 a better scheme than just getting a monster drive and using rsync to maintain a daily backup? It should be in theory, but what about in practice?

Thanks!

*Last edited by Guinpen on Wed Apr 25, 2007 10:02 pm; edited 1 time in total*

----------

## liber!

If you store critical data, I would advise using a hardware RAID card; I can recommend Areca. They have much better error correction.

Also, RAID is NOT a backup method, so you should still rsync or copy your data somewhere else daily or weekly (preferably to another building).

----------

## fangorn

To 4) With RAID 5, if you do an accidental rm, all data is lost. With the daily backup you lose at most one day's work. So the basic rule is

RAID 5 IS NOT A BACKUP! RAID 5 just saves you from a hard disk failure.

To 1) I would go for pure software kernel RAID 5 for maximum portability (hardware independence), or for pure hardware RAID 5 for the best performance, fewest problems, and best compatibility with other features.

Edit: too late, too busy  :Rolling Eyes: 

----------

## pdr

I have software RAID 5 running on 4 x 160 GB drives on my server. On top of that I run LVM2, and on top of a few of the logical volumes I run LUKS. All partitions use the ext3 filesystem. One of the LUKS partitions I have resized (grown) twice. I have had absolutely no problems. Except for an occasional reboot (new kernels, etc.) it has been running 24/7 since last August.

----------

## Guinpen

Thanks everyone!

Hardware RAID is not an option for me, because it is much too expensive. I looked at Areca; they make some good hardware, and price it accordingly  :Sad: .

pdr, I'm very interested in your kernel version(s) and architecture. Please describe your environment. Also, what motherboard and hard drives do you have?

Questions 1 and 2 still stand unanswered  :Smile: 

----------

## Cyker

This is assuming you are using Software RAID.

'Accelerated' RAID, which uses the motherboard hardware, is slower and less reliable and generally crapper than Linux software RAID. The ONLY reason to use it is if you have a Windows partition which you want to share the data with.

1) It depends; you get a basic kernel notification, but unless you have something checking for it, you might not even know. I personally use mdadm and smartd to monitor the drives - if drive hardware fails or something weird happens to the array, I have configured smartd and/or mdadm to complain loudly by spamming all consoles in utmp until it's fixed  :Smile:
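For reference, a sketch of how that kind of monitoring can be wired up; the mail addresses and device names here are placeholders, so treat this as an illustration rather than a tested config:

```
# /etc/mdadm.conf - have mdadm's monitor mode mail someone on array events
MAILADDR root@localhost

# /etc/smartd.conf - watch each array member, mail on SMART trouble
/dev/sda -a -m root@localhost
/dev/sdb -a -m root@localhost
/dev/sdc -a -m root@localhost

# then run the monitor daemon (often started by an init script):
# mdadm --monitor --scan --daemonise
```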

I already had a crappy Seagate 7200.10 unit (what happened to their reliability?!) fail 1 month after I put it in my array. I only noticed when I got home 8 hours or so later. I went through the horrid RMA procedure on Seagate's website (the system was still working; I didn't really notice any performance loss, which was cool  :Smile:  admittedly I wasn't doing anything heavy...), then shut down the system, bought a WD to replace it (I couldn't wait 2+ weeks for Seagate's rubbish RMA process), and slotted it in.

System booted up in degraded mode, so I cfdisk'd the replacement drive, gave it the partition type FD (Linux RAID autodetect), and then ran 'mdadm /dev/md0 --add /dev/sdc'. It started rebuilding the array in the background and I went off to trawl the forums  :Razz:

2) If a disk doesn't service a request for too long, it may get dropped from the array. This includes bad-sector relocations, re-calibrations, self-checks, etc. There is a command to re-add a disk, and mdadm will automatically fix any consistency problems (it can take a while to run 'tho).

Most hard disks don't do that kinda stuff these days. One of the drives in my array is a WD SE16 and it says it should not be used in a RAID5 for this reason - It's working well so far, and is a damned lot quieter and cooler than the Seagates.

Re-reading your question: if a hard disk relocates a bad sector, the actual data on the disk should still be the same. RAID doesn't care how or where the disk puts its data, just that it's all the same when read back.

If the disk screws up the data when it relocates it 'tho, the parity will no longer match, and the array will probably kick the disk out on account of it corrupting the data.
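The parity being talked about here is RAID 5's XOR parity. A tiny sketch of the idea, using plain shell arithmetic on single bytes rather than real disks, shows how a lost block is rebuilt from the survivors:

```
# two data "blocks" (one byte each) and their XOR parity
d1=$(( 0xA5 ))
d2=$(( 0x3C ))
parity=$(( d1 ^ d2 ))

# "lose" d1, then rebuild it from the surviving block and the parity
rebuilt=$(( parity ^ d2 ))
echo "$rebuilt"   # prints 165, i.e. the original d1 (0xA5)
```

The same XOR trick generalises to any number of data disks, which is why exactly one failed drive can always be reconstructed.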

3) I also heard lots of horror stories about RAID5 and journalling file-systems.

It seems to be stable now 'tho - This array uses ext3 in 1.5TB and has been through about 4 months of use and a powercut, and is still working. I ran a huge diff on a large chunk of files that I still had on another box about a month ago, and they still read out correctly. Haven't had any crashes or other obvious corruptions, so I reckon it's pretty smooth.

I'm not using LVM 'tho, and it seems more a mixture of LVM, MDADM and Journalling fs that seems to be at odds... you'd need to get more feedback than just me on that  :Wink: 

4) Depends on your goal - if it's protection of data against hardware failure, RAID 5 is 'better' in the sense that it doesn't force you to do a restore when a disk dies - you can carry on using it in degraded mode, so there's less down-time, and you won't lose any work at all, whereas with a backup system you'd lose everything up to the last backup in a worst-case scenario.

However, RAID5 won't protect against file corruption, e.g. due to a bad kernel release or an accidental rm -rf (!!).

Also, unlike with a single-disk failure, where you can often attempt some sort of disk recovery, e.g. using dd, a full array failure (i.e. with RAID 5, 2+ disks kicking the bucket at the same time) means all that data is GONE. There is no software tool on earth that can recover that array.

The only hope of recovery at all is a data recovery company, but they charge significantly more to recover a RAID 5 array than a single drive.

Luckily, the chances of such a thing happening are a lot lower than a single-disk fail, but the consequences are a lot worse if it does  :Sad: .

So, depending on your paranoia level, it might be a good idea to at least have a backup drive as well for critical stuff stored on the array!  :Shocked: 

Suggestion:

If you don't actually need 1TB of storage (which is what 3x500GB HDs in RAID 5 will get you), you could buy those 2 hard disks, RAID1 one of them with the existing drive, and keep the third as a daily/weekly/monthly backup drive.

That would give the maximum protection of data and minimum down-time.

It may feel like a waste (3 drives for the same amount of data!), but that is the nature of redundancy and backup  :Wink: 

It won't feel like such a waste if the worst happens!
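As a footnote on the capacity sums above: with n equal drives, RAID 5 gives you (n-1) drives' worth of usable space, while a RAID 1 pair gives you one drive's worth. A quick back-of-the-envelope check in shell:

```
drive_gb=500
n=3
raid5_usable=$(( (n - 1) * drive_gb ))   # 3x500GB in RAID 5
raid1_usable=$drive_gb                   # 2x500GB mirrored in RAID 1
echo "RAID5 of ${n}: ${raid5_usable}GB usable; RAID1 pair: ${raid1_usable}GB usable"
```

So the three-drive RAID 5 really does give 1TB, and the RAID1-plus-backup-drive alternative gives 500GB.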

----------

## Guinpen

Wow! Thanks for the EXTENSIVE reply. Based on what you said, I've decided to go for the RAID 5.

 *Quote:*   

> This is assuming you are using Software RAID. 
> 
> 'Accelerated' RAID, which uses the motherboard hardware, is slower and less reliable and generally crapper than Linux software RAID. The ONLY reason to use it is if you have a Windows partition which you want to share the data with. 

 

Hm, I didn't know that. I do want to dual-boot with Windows, but it's not critical that Windows can access the data in the RAID. So I suppose I will have the Windows partition actually contain Windows on one drive, with some extra space for Windows on the other drives, while a different partition will hold my Linux RAIDed data. This should be possible, if I read the RAID HOWTOs correctly.

 *Quote:*   

> System booted up in degraded mode, so I cfdisk'd the replacement drive and gave it the filesystem type FD (Linux RAID autodetect) and then ran 'mdadm /dev/md0 --add /dev/sdc'. It started re-building the array in the background and I went off to trawl the forums

 

That sounds good enough!  :Smile: 

 *Quote:*   

> Most hard disks don't do that kinda stuff these days. One of the drives in my array is a WD SE16 and it says it should not be used in a RAID5 for this reason - It's working well so far, and is a damned lot quieter and cooler than the Seagates.

 

I'm glad I did some forward-thinking and got the WD RE2, which is mechanically the same drive, except that it was designed for RAID (terminates operations quickly) and has a better warranty. The WD drives are amazing! In my Antec P180 case (which has special anti-vibration mounts) the drive is absolutely silent. I highly recommend WD to everyone.

 *Quote:*   

> I also heard lots of horror stories about RAID5 and journalling file-systems... 

 

I don't care about LVM so I should be fine. I guess I'll do some tests before I put my data on it. My idea is to have RAID 5, with LUKS on top, with ext3 in data=journal mode on top. We'll see how that goes.
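For reference, a rough sketch of how such a RAID 5 + LUKS + ext3 stack might be assembled; the device names are placeholders and the options are from memory, so treat it as an outline rather than a tested recipe:

```
# Sketch only - /dev/sdX1, /dev/sdY1, /dev/sdZ1 are placeholder
# RAID partitions of type FD (Linux RAID autodetect)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdX1 /dev/sdY1 /dev/sdZ1

# LUKS on top of the md device
cryptsetup luksFormat /dev/md0
cryptsetup luksOpen /dev/md0 cryptraid

# ext3 on top of LUKS; data=journal is selected at mount time
mke2fs -j /dev/mapper/cryptraid
mount -o data=journal /dev/mapper/cryptraid /mnt/raid
```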

 *Quote:*   

> However, RAID5 won't protect against file corruption, e.g. due to a bad kernel release or an accidental rm -rf (!!). 

 

That's not an issue. I have a small (<20 GB) amount of very critical data, which is replicated over several computers in several buildings. The rest is large and important, but not as critical, and I'm ready to take the risk of only protecting it against single-drive failure.

 *Quote:*   

> If you don't actually need 1TB of storage (Which is what 3x500GB HDs in RAID5 will get you), you could buy those 2 hard disks, RAID1 one of them with the existing and keep the third as a day/week/monthly backup drive. 

 

That would be a cool setup  :Smile: . But no, I will need the storage. Thanks again, Cyker, I owe you one.

----------

## Cyker

One last thing (Assuming you haven't started!  :Wink: )

Even if you're following a guide, it's a good idea to keep notes of your own on what you're doing and why.

It makes replication easier later if you do it again, or retracing your steps if you add/change something or do something that you're not sure about.

It has already saved me once (I messed it up on my first go  :Wink: ), and it's also allowing me to annoy everyone on here with my overly verbose posts on RAID 5  :Mr. Green:

This is good practice when doing anything really, but IMHO it's especially important with big things like this!

----------

## Guinpen

Of course. I am currently waiting for the new drives to arrive. As soon as I'm done with some pressing university work, I will play with the RAID setup and then document everything I've done in a nice article/HOWTO. I will post the link here, once that happens.

----------

## Guinpen

OK, I have a nicely running reliable RAID5 with cryptsetup-LUKS on top with ext3 on top. However, now one of my 3 drives is dying - it's new, but probably defective.

Question: should I fail the drive and run in degraded mode until I get a replacement, or should I keep running like this until I have a replacement sitting in front of me? The drive shows read errors, SMART is not happy with it, and it will probably fail soon.

----------

## fangorn

Fail it now.

If you have read errors and the bad data gets folded into the parity on another disk, the data is gone.

Minimal read access and no write access at all, if possible.

And the less time spent with a degraded array the better, so don't wait for the RMA replacement. Buy another drive now and put the replacement in storage for future problems.

----------

## Guinpen

Thanks, I will fail it tonight. Unfortunately, it will be hard to buy a new drive right away because I'm on a student budget. Besides, what use would I have for a 4th drive?

To the point: What is the right command to fail the drive as safely as possible? (Different google results say different things.) What is the right command to execute when a new blank drive is in?

----------

## Cyker

 *Godji wrote:*   

> Thanks, I will fail it tonight. Unfortunately, it will be hard to buy a new drive right away because I'm on a student budget. Besides, what use would I have for a 4th drive?
> 
> To the point: What is the right command to fail the drive as safely as possible? (Different google results say different things.) What is the right command to execute when a new blank drive is in?

 

The man pages (man mdadm) are very informative; I highly recommend  :Smile: 

From the man page, the fail command would be something like:

```
mdadm /dev/md0 --fail <device>
```

To fail it and remove it from the array, you could also do:

```
mdadm /dev/md0 --fail <device> --remove <same device!>
```

NOTE: If the drive has already been marked as FAILED by the mdadm watchdog/kernel, you don't need to mark it with --fail again, because it already is  :Wink:

You can just --remove it and then unplug it.

Alternatively, ignore it until you get a replacement - It won't be used while in FAILED mode anyway. This is what I did, but my RAID array is in my server which I almost never turn off.

If you are a 'normal' user and turn your machine on and off every day, this is a waste of power and the array will probably try to rebuild itself needlessly every time you boot up - In this case you're better off --remove'ing it and taking it out.

When you get the new drive, whack it in (Make sure you plug it IN THE RIGHT SOCKET!  :Shocked: ) then cfdisk it to give it the right partition type like you did when you built the array.

Then to poke mdadm to tell it about the new drive, it's just:

```
mdadm /dev/md0 --add <device>
```

...and you're away with the sound of a bunch of disks going mental  :Smile: 

Then you can 'watch cat /proc/mdstat' or 'watch mdadm -D /dev/md0' and then wait a few hours for the rebuild to finish  :Smile: 

----------

## Guinpen

OK, so I failed, removed, and then physically removed the drive, and the RAID5 was running fine in degraded mode.

Then, as I was copying a file on it, the whole system hung (it has happened before, so it's likely unrelated). Upon reboot, mdadm refused to assemble a degraded AND dirty RAID5 until I used the --force option. I did so, and it appears normal. I haven't run fsck.

So, having ordered yet another drive to replace the RMA'd one (because that one will take a while), I'm wondering how to fix the dirtiness issue. Should I do anything now (and if so, what?), or should I wait for the replacement drive and add it to the dirty array?

Thanks again for all your help.

----------

## Cyker

In all honesty, unless you have a backup of the entire array, I would recommend you try to use it as little as possible until you get a new drive in!

If one of the other drives goes, everything on the array will be lost and will be impossible to recover unless you have a few hundred thousand of your local currency lying about...

As it stands, you are in a very dangerous situation - Dirty means Linux thinks the parity calculations for some bits of the array may be wrong/incorrect.

This is normal after something like sudden power-loss, and under normal circumstances Linux will automatically run a re-sync pass on the array to make sure everything is in order (Or, if you turned that off, you'd have to run something like 

```
echo check >> /sys/block/mdX/md/sync_action
```

to make it re-sync the whole array.)

Since yours is degraded, I'm not sure what that will do so I'm not going to recommend trying it until the new disk arrives. There is a good chance some of your data is corrupted as it is...

BTW, from what I read as I was checking the man pages, I think you are supposed to run mdadm with --run instead of --force to tell it to bring a degraded array on-line...
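One more note on the re-sync pass: once a 'check' completes, the kernel reports the result via sysfs (mdX being a placeholder for your array), which is a cheap way to see whether the parity actually matches:

```
# after 'echo check > /sys/block/mdX/md/sync_action' has finished:
cat /sys/block/mdX/md/mismatch_cnt   # 0 means the parity matched everywhere

# progress of a running check/rebuild:
cat /proc/mdstat
```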

----------

## Guinpen

 *Quote:*   

> In all honesty, unless you have a backup of the entire array, I would recommend you try to use it as little as possible until you get a new drive in! 
> 
> If one of the other drives goes, everything on the array will be lost and will be impossible to recover unless you have a few hundred thousand of your local currency lying about... 

 

Point taken. Given that I have cryptsetup (LUKS) sitting between the RAID and the filesystem, I think that will be more like a few million.

 *Quote:*   

> Since yours is degraded, I'm not sure what that will do so I'm not going to recommend trying it until the new disk arrives. There is a good chance some of your data is corrupted as it is... 
> 
> BTW, from what I read as I was checking the man pages, I think you are supposed to run mdadm with --run instead of --force to tell it to bring a degraded array on-line...

 

But if the array is dirty and I try to add a drive (thus rebuilding the parity), will it cope? BTW, I think the only corruption would be in the files I was writing (which I deleted after the forced assembly), so the files that were already there should be fine.

The message that refused to assemble the array explicitly said to use "--force". Maybe they are equivalent in this case?

----------

## Guinpen

I have replaced the defective drive and rebuilt the array (using the instructions above) successfully. Despite running in degraded mode for over a week and surviving at least 2 dirty reboots, the array reports no mismatches, the filesystem is fine, and no data appears corrupted in any way.

Needless to say, I'm extremely happy with the Linux software RAID subsystem. Once again, big thanks to everyone who helped me out.

----------

## Cyker

Yay!  :Mr. Green: 

Glad it all went okay  :Smile: 

----------

## Guinpen

I'm glad too! Believe me, I am.  :Razz: 

----------

