# RAID & backup questions

## Beetle B.

OK, so here's the background.

I'm planning on building a new computer soon. It will primarily run Gentoo (easily over 95% of the time). I'll definitely use LVM2, except for the root partition, /boot and /etc. Stuff like /usr will be on LVM2.

I was also thinking of going RAID-1.

I was considering software RAID. I take it that the performance hit is quite minimal?

I intend to put everything under RAID-1 - except the swap partition. Is that OK? Someone in another forum is insisting that RAID-1 should only be for "data" and the "OS" should not be under RAID-1. He tends to know his stuff, but I just don't understand why.

Let's say I go the LVM2 + software RAID-1 route. Will it be easy for me to disable RAID-1 at any point in the future (overall or for a certain partition) without headaches?

Now, the reason I want RAID-1 is simply peace of mind. I know it won't protect me from my own stupidity (e.g. if I delete a file accidentally, or do rm -rf /). I do have scripts that back up important files and upload them to my server, but I really want to protect against hardware failure (i.e. an HD crashing).

His suggestion is to forget RAID-1, and simply do a daily backup/clone to another drive. I'd like your opinions on the pros and cons of both approaches. One obvious pro is that it will save my ass if I delete something I shouldn't (and realize it before the day is over). However, I suspect it has other pains. 

What software would you recommend for cloning partitions? Will rsync be sufficient?

Here are some overall concerns:

1. Ensuring that the backup system doesn't try to back up, say, /mnt/dvd. Note that it will need to back up other partitions under /mnt.

2. Ensuring it backs up partitions that are not currently mounted (e.g. /boot, etc.).

3. When I create new partitions later on, remembering to add them to my script/system. 

4. Ensuring it doesn't consume too many resources when it's active. I don't want it to affect my compile times, for example. 

5. Ensuring it handles hard + soft links, etc. 

6. If I'm using (complex) software to do the clones, being confident that it will continue to work well for, oh, the next 5 years.

To be honest, if the performance hit of software RAID-1 is not much, and there are no other cons to RAID-1, I'd rather go that route. Still, if there is an elegant cloning solution that's both simple and takes care of all the bullets above, I may opt for that just to have a "real" backup. 

Thoughts?

----------

## avendesora

Hi,

First thing first: RAID is not backup. Backup is not RAID. Don't discuss the two things at once, they are not linked at all.

RAID can provide some protection against hardware failure (depending on actual setup).

backup can allow you to go back to a point in time in the past and restore file "versions" from that time (depending on if it's done right - a backup "solution" that you don't test by doing restores is usually worthless).

i.e. backup = go back in time, raid = keep going forward if something "mechanical" goes crash.

You seem to be implying that RAID1 will hurt performance. That's not generally the case. RAID1 can be very fast for reads (theoretically up to the combined speed of the underlying devices). Writes should not be slower than the slowest disk in the group. (i.e. worst case RAID1 is just as slow as the slowest disk).

Now if you do things like take a pair of disks, map some parts of them in raid1, others with raid0, others still used 'directly' - yes there you could end up with something that doesn't work very well. Keep it as simple as possible.

Concerning your point 4, you have to make choices. Either your data is valuable or it's not. If it's valuable, then you should realize that the backup process is important. Reading and saving all your data takes resources (mostly disk io though, not much cpu unless you're compressing). So it will affect your machine's performance while it's running. So just schedule your backups when it affects you less.
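For point 4, one way to keep a scheduled backup out of the way of compiles is to run it at the lowest CPU and disk I/O priority. Below is a minimal sketch. It assumes `nice` (coreutils) and `ionice` (util-linux) are installed. The `low_prio` name and the `DRY_RUN` switch are made up for illustration. Also note that the "idle" I/O class is only honoured by I/O schedulers that support priorities, such as CFQ or BFQ.

```shell
#!/bin/bash
# Hypothetical helper: run a command at the lowest CPU priority (nice 19)
# and in the "idle" I/O class (ionice -c 3), so it only gets disk time
# when nothing else is asking for it.
low_prio() {
    local cmd=(nice -n 19 ionice -c 3 "$@")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Dry run: only show what would be executed.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Usage might look like `low_prio rsync -aH /home/ /mnt/clone/home/` (paths hypothetical). To run it when it bothers you least, add a crontab line such as `30 3 * * * /usr/local/sbin/nightly-backup`, which runs a (hypothetical) backup script at 03:30 every night.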

----------

## Beetle B.

 *Quote:*   

> First thing first: RAID is not backup. Backup is not RAID. Don't discuss the two things at once, they are not linked at all. 

 

Never said I planned to use RAID as backup. However, I did say that cloning HDs daily would be a better option, as it has the benefits of RAID-1 and serves as a simple backup.

To be frank, it's getting irritating that wherever I go, people keep telling me that RAID-1 is not backup, even though I pretty much said so in my original post!

So since there may have been confusion, let me emphasize: The ideal goal is that if one HD fails, I just want to be able to boot straight from the other HD with virtually no work. If there's a good cloning solution that allows for this easily, I'd prefer it over RAID-1.

 *Quote:*   

> backup can allow you to go back to a point in time in the past and restore file "versions" from that time (depending on if it's done right - a backup "solution" that you don't test by doing restores is usually worthless). 

 

OK - I see what you mean. Well, in my case, cloning is a fairly "trivial" backup. It will only let you go back in time by one period (day, week, whatever). ;)

 *Quote:*   

> Writes should not be slower than the slowest disk in the group. (i.e. worst case RAID1 is just as slow as the slowest disk). 

 

All the benchmarks I've seen for software RAID-1 show a small degradation in write performance (i.e. longer write times on the same HDs compared to no RAID).

 *Quote:*   

> Concerning your point 4, you have to make choices. Either your data is valuable or it's not. If it's valuable, then you should realize that the backup process is important. Reading and saving all your data takes resources (mostly disk io though, not much cpu unless you're compressing). So it will affect your machine's performance while it's running. So just schedule your backups when it affects you less.

 

OK - fair enough. Let's forget point 4. Do you have any easy + good cloning solutions that aren't software RAID-1? Keep in mind that the ultimate goal is that if one HD fails, I can simply boot off the other HD. I don't want a backup in the sense of having multiple copies of some data (going back a day, week, month, year, or whatever). It's only a backup in the sense that if I stupidly delete something, and realize it before the next scheduled sync, then I can recover the file.

One headache I see is manually having to maintain the same partition structure on the other HD - coupled with LVM2. 

One simple solution would be to periodically reboot into a DVD that can simply clone byte-for-byte. Somehow, that doesn't appeal to me. I'll consider it, though. It would be much better if I could do the syncing/cloning without needing to reboot.

At the moment, software RAID-1 seems the simplest solution. I did just read a horror story from a guy who lost everything because his RAID controller malfunctioned. In principle, it could happen with software RAID as well.

----------

## Beetle B.

You know, I'm sure most people will get confused. So let me ask a simpler question:

Is it possible to use rsync to mirror a HD easily? It should handle hard + soft links, and it should be easy to omit stuff like /mnt/dvd and /sys. Let's forget about the MBR. Assume that I'll have to manually manage the creation of partitions in the new HD, and that the script will handle mounting/unmounting.

Goal is that, after "fixing" the MBR, I can simply boot into the other HD and have everything like it was in the original HD. How much of a pain is it to automate this with rsync?

----------

## energyman76b

I would go Raid5 not 1. 

The reason: after an unclean shutdown you can end up with blocks that contain different data on the two disks. This is called a mismatch. When there is a mismatch, the kernel has to guess which copy holds the correct data. Statistics say it guesses correctly in 50% of mismatches. Murphy dictates it is never right....
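Whatever one thinks of those odds, Linux md lets you check whether a mirror actually contains mismatches. The md driver exposes two files under `/sys/block/<array>/md/`:

- `sync_action`: writing `check` starts a read-only comparison of the mirror halves.
- `mismatch_cnt`: a counter of the mismatches that check found.

The sketch below assumes an array named like `md2` and root privileges. The `SYSFS` override and the function names are hypothetical; they exist only so the functions can be tried against a fake directory tree.

```shell
#!/bin/bash
# Sketch: inspect an md RAID array through sysfs.
# SYSFS is a hypothetical override (default /sys) so the functions can be
# tried against a fake directory tree; on a real system this needs root.
SYSFS=${SYSFS:-/sys}

# Start a read-only "check" scrub of an array such as md2;
# progress is shown in /proc/mdstat.
start_check() {
    echo check > "$SYSFS/block/$1/md/sync_action"
}

# Print the mismatch counter from the last check/repair: the number of
# sectors that were (or, for "check", would have been) rewritten.
mismatch_count() {
    cat "$SYSFS/block/$1/md/mismatch_cnt"
}
```

Once the check has finished, a non-zero `mismatch_count md2` means the two halves disagree somewhere.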

----------

## Beetle B.

 *Quote:*   

> I would go Raid5 not 1.

 

RAID 5 requires at least 3 drives. I'll have only 2.

After looking into rsync, it seems that just doing a daily or weekly rsync should take care of all my needs. The minor headache will be maintaining an identical partition table on the other drive (identical volume groups, etc.). But since those rarely get modified, I don't think it's a big deal.

I guess at this point I don't see the need for RAID-1.

----------

## frostschutz

 *Beetle B. wrote:*   

> Is it possible to use rsync to mirror a HD easily?

 

rsync can't mirror HDs. It can copy files though. It's actually the best copy tool around.

 *Beetle B. wrote:*   

> should be easy to omit stuff like /mnt/dvd and /sys

 

The best way to avoid such problems is to make a clean mount for copying. For example, the root partition can be mounted at / and at /mnt/root/ at the same time. With no additional mounts under /mnt/root/*, you can copy it without worrying about descending into filesystems you don't want. This way you will also be able to access files that are otherwise hidden by mounts.
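The clean-mount idea can be sketched as below. It uses a bind mount (`mount --bind`) instead of mounting the root device a second time. A plain bind mount is non-recursive, so it likewise shows only the root filesystem, without anything mounted below it. `/mnt/root`, the target `/mnt/clone` and the `run`/`clone_root` helpers are hypothetical. Unless `DRY_RUN=0` is set, the script only prints the commands. Other filesystems, such as /boot or an LVM /usr, would each need their own pass.

```shell
#!/bin/bash
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

# Copy the root filesystem alone via a second, non-recursive view of it:
# /proc, /sys, /mnt/dvd etc. are not visible under the bind mount.
clone_root() {
    local src=/mnt/root dst=${1:-/mnt/clone}
    run mkdir -p "$src"
    run mount --bind / "$src"
    # -a keeps symlinks, permissions, owners, times; -H hard links;
    # -A ACLs; -X xattrs. --delete makes the target an exact mirror.
    run rsync -aHAX --delete "$src/" "$dst/"
    run umount "$src"
}
```

rsync's `-x` (`--one-file-system`) gives a similar effect on the live tree, but unlike the bind mount it cannot reach files hidden under mount points.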

 *Beetle B. wrote:*   

> Assume that I'll have to manually manage the creation of partitions in the new HD

 

A traditional MS-DOS partition layout can easily be cloned with sfdisk.

For everything else, parted should do the job.
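For the MS-DOS case, the usual idiom is to pipe sfdisk's dump of one disk into sfdisk on the other. The sketch below assumes `/dev/sda` is the source and `/dev/sdb` the (empty) target. `clone_ptable` is a hypothetical name. The real pipeline overwrites the target's partition table, so the function only prints it unless `DRY_RUN=0`.

```shell
#!/bin/bash
# Hypothetical helper: copy the partition table of $1 onto $2.
# DESTRUCTIVE for $2; prints the pipeline unless DRY_RUN=0.
clone_ptable() {
    local src=$1 dst=$2
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "sfdisk -d $src | sfdisk $dst"
        return 0
    fi
    # -d dumps the table in a format sfdisk accepts back as input.
    sfdisk -d "$src" | sfdisk "$dst"
}
```

Newer sfdisk versions can dump GPT tables too. In that case, though, the dump carries the disk and partition identifiers, which would then be duplicated on the second disk.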

 *Quote:*   

> How much of a pain is it to automate this with rsync?

 

The main problem with rsync is that the copy does not happen all at once. So if you make a copy while there is some disk activity, the result on the other side may not be consistent.

You can work around that problem using LVM snapshots; however, if you go there, you could just as well use LVM for replication altogether.
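The snapshot workaround can be sketched for a single logical volume in three steps:

1. Freeze a point-in-time view with `lvcreate --snapshot`.
2. Copy from the read-only snapshot.
3. Drop the snapshot.

The volume group `vg0`, the LV `usr`, the 2G snapshot size and the mount/target paths are all hypothetical. The script assumes an ext2/3/4 filesystem; XFS would also need `-o nouuid` to mount a snapshot. It only prints commands unless `DRY_RUN=0`.

```shell
#!/bin/bash
# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = 0 ] || return 0
    "$@"
}

# Copy LV $2 of VG $1 to directory $3 from a consistent snapshot.
snapshot_copy() {
    local vg=$1 lv=$2 dst=$3
    local snap="${lv}_snap" mnt=/mnt/snap
    # The snapshot only needs room for blocks changed while the copy runs.
    run lvcreate --snapshot --size 2G --name "$snap" "/dev/$vg/$lv"
    run mkdir -p "$mnt"
    run mount -o ro "/dev/$vg/$snap" "$mnt"
    run rsync -aHAX --delete "$mnt/" "$dst/"
    run umount "$mnt"
    # Remove the snapshot so it doesn't keep slowing down writes to the LV.
    run lvremove -f "/dev/$vg/$snap"
}
```

Example: `snapshot_copy vg0 usr /mnt/clone/usr`.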

If you simply want to be able to boot from the other disk, RAID 1 is still the best choice. Neither rsync nor RAID really counts as a backup in this case, though. Personally, I prefer to have several more copies for that, with the backup itself incremental, so I can go back to older states if need be.

----------

## gentoo_ram

I used a combination of RAID-1 and LVM on my latest home server install. I have two 1 TB drives that I partitioned identically:

- md1 = sd[ab]1, 150M, RAID-1: /boot
- md2 = sd[ab]2, 4G, RAID-1: /
- sd[ab]3, 4G: two swap partitions (non-RAID-1)
- sd[ab]4: extended partition
- md5 = sd[ab]5 (the rest), RAID-1: LVM

Then, inside that LVM, I set up logical volumes for /usr, /opt, /home, /tmp, /var/tmp and /var/log.

I separated those because I have different settings on the partitions.  I kept the root partition off of LVM so I wouldn't need an initramfs to boot.
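Roughly, that layout could be created with mdadm and the LVM tools as below. This is a sketch, not the poster's actual commands. It assumes the disks are already partitioned as listed, and the volume group name `vg0`, the LV names and the 10G sizes are made up. md1 and md2 use 0.90 metadata for two reasons:

- In-kernel RAID autodetection (together with partition type fd) only understands that format. Autodetection is one way to assemble a root array without an initramfs.
- 0.90 keeps the superblock at the end of the partition, where a bootloader without md support won't see it.

Nothing is executed unless `DRY_RUN=0`.

```shell
#!/bin/bash
# Print each command; execute it only when DRY_RUN=0. DESTRUCTIVE if run.
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = 0 ] || return 0; "$@"; }

build_layout() {
    # /boot and / : 0.90 metadata for bootloader + kernel autodetection.
    run mdadm --create /dev/md1 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda1 /dev/sdb1
    run mdadm --create /dev/md2 --level=1 --raid-devices=2 --metadata=0.90 /dev/sda2 /dev/sdb2
    # The large mirror that will hold LVM (default metadata is fine here).
    run mdadm --create /dev/md5 --level=1 --raid-devices=2 /dev/sda5 /dev/sdb5
    # Swap stays outside RAID: one swap area per disk.
    run mkswap /dev/sda3
    run mkswap /dev/sdb3
    # Turn md5 into an LVM physical volume and carve out the LVs.
    run pvcreate /dev/md5
    run vgcreate vg0 /dev/md5
    local lv
    for lv in usr opt home tmp vartmp varlog; do
        run lvcreate --size 10G --name "$lv" vg0
    done
}
```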

Then every night I back up to an external drive using rsync. The key to my rsync backups is the "--link-dest=" parameter. It makes rsync create hard links on the backup drive for files that haven't changed since the previous backup, emulating Apple's Time Machine. The script boils down to this:

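```
#!/bin/bash
# Nightly snapshot: unchanged files become hard links to the previous backup.
curr_date=$(date +%Y-%m-%d-%H:%M:%S)
back_root="/mnt/backup"

mkdir -p "$back_root/$curr_date"

# Only use --link-dest once a previous backup ("current") exists.
link_opt=()
[ -d "$back_root/current" ] && link_opt=(--link-dest="$back_root/current")

# -a: archive mode (keeps symlinks, perms, times); -H: also preserve hard links.
# Exclude the backup drive itself so rsync doesn't copy it into itself.
rsync -avH --exclude-from=/root/backup-excludes --exclude="$back_root" \
    "${link_opt[@]}" / "$back_root/$curr_date"
rc=$?
# Exit code 24 (files vanished during transfer) is normal on a live system.
if [ "$rc" -ne 0 ] && [ "$rc" -ne 24 ]; then
    echo "rsync failed with code $rc; leaving 'current' unchanged" >&2
    exit "$rc"
fi

# Repoint "current" at the new snapshot (-n: replace the link itself).
ln -sfn "$curr_date" "$back_root/current"
```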

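The /root/backup-excludes file contains patterns like "/sys/*" and "/proc/*". The leading "/" anchors a pattern to the root of the transfer; without it, "sys/*" would also match directories such as /usr/include/sys. Exclude whatever you like.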

This creates a separate directory for each backup that contains your entire directory tree.  The files that don't change are hard-linked back to the previous versions and don't take up more space.  If you want to restore, you just go to $back_root/current and pull out any/all files you want.

The more backups you keep, the more space it takes. But this design lets you restore from previous days. This will be useful if you discover a couple of days later that you deleted a file by accident. You have a history.
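Because unchanged files are hard links into the previous snapshot, that history is cheap to query. Consecutive snapshots of an unchanged file share one inode number, and a new inode means the file (or its attributes) changed. Below is a small sketch. It assumes GNU `stat`, plus the date-named directories and the `current` symlink that the nightly script creates. `file_versions` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: list the snapshots under backup root $1 in which
# file $2 (a path relative to /) first appears with a new inode, i.e.
# a different version of the file. Snapshot names sort chronologically.
file_versions() {
    local root=$1 path=$2 snap prev="" ino
    for snap in "$root"/*/; do
        snap=${snap%/}
        [ -L "$snap" ] && continue        # skip the "current" symlink
        [ -e "$snap/$path" ] || continue  # file absent in this snapshot
        ino=$(stat -c %i "$snap/$path")
        if [ "$ino" != "$prev" ]; then
            basename "$snap"
            prev=$ino
        fi
    done
}
```

For example, `file_versions /mnt/backup home/me/notes.txt` (path hypothetical) prints one snapshot name per distinct version, oldest first.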

----------

