# alternative (daily) backup solution to rsync?

## saphear

Hey all,

quick question:

Are there any other, faster ways to do a daily backup of 200GB-250GB of data?

I have a hd with 222GB of data right now, a nightly cronjob running a "rsync -avc --delete" to another hd.

it takes about 3-5 hours every night... doesn't really bother me, I just wanted to ask if there is a better way to make the hard drives work less.

I don't know how safe it is to turn off -c, since I really need to be sure that everything edited is copied. The files are usually edited from Windows boxes via Samba access...

any ideas?

thanks in advance

----------

## bunder

tar?

----------

## saphear

how do you mean? 

tarring everything every night? I don't know how long that takes with 222GB, and it's not the best if I actually need the backup, especially if I need it fast.

----------

## jlh

 *saphear wrote:*   

> Are there any other, faster ways to do a daily backup of 200GB-250GB of data?

 

Aren't fast and 250GB oxymorons?   :Smile:   Apart from buying faster HDs, maybe some RAID solution might also do what you want to keep the data redundant.  But you didn't mention what the backups are for exactly.

 *saphear wrote:*   

> I don't know how safe it is to turn off -c

 

I'd say it's safe, provided the modification times get updated whenever a file changes.  By disabling -c you can probably save a lot of time on the backups.

Also, the type of files you have and the type of changes that happen to them may be relevant for choosing a backup solution.  (As in 'few large files with small changes to all of them' vs. 'lots of small files with heavy changes to few of them')

----------

## saphear

well... 

lots, looooooots of small files

oxymoron? more or less yes I guess  :Wink: 

both hds are normal SATA II hds. didn't want to go for a RAID solution, since there are several people working with the data/files. Redundant RAID mirroring means that if a stupid user deletes a file on sda1, it's gone on the mirror as well... with the nightly backup (including --delete) I've got at least a few hours to do something about it. Usually when users delete something by accident they realize it pretty fast... (and there is a piece of software on one of the Windows boxes that sometimes causes trouble by deleting some stuff, but you see it the moment you start the software, and it is started every day.)

Besides, a fast RAID solution for 222GB isn't the cheapest. But that's not really the deal. As I said before, my problem is not the 3-5h every night, since no one is working on our server at night, but trying to reduce disk wear (abrasion? attrition? don't know the right word). The disks were changed just a few days ago, 2x 250 out and 2x 750 in... so they are pretty new right now... just to be safe, you know  :Smile:

the data contains mostly picture stuff, since the server is running in a graphic/design company..means small jpg things, big eps files, photoshop, indesign etc

----------

## HeissFuss

 *Quote:*   

> 
> 
> Aren't fast and 250GB oxymorons?  Apart from buying faster HDs, maybe some RAID solution might also do what you want to keep the data redundant. But you didn't mention what the backups are for exactly. 
> 
> 

 

RAID is not backup...snapshots are close though.

Turn off -c.  You're doing a checksum on ALL of the data, which means reading every file completely to run the checksum (hence the time it takes and the hard-drive work).

----------

## nobspangle

If this data is important to you and you are worried about losing it, you should also have a RAID solution, at least on your server system. Software RAID 1 would be fine and protects you against drive failure.

----------

## nephros

While I'd agree a RAID-1 is always advisable for important user data, remember RAID != backup.

RAID only protects against drive failure.

NOT against data corruption.

NOT against user error.

That's what backups are for.

So it would be advisable to have the backup device on RAID, but even then we're back at the original problem.

As you mention lots of small files, maybe you would gain some speed if you tarred them up[1] in nice little chunks[2] and then rsynced THOSE tarballs to your backup drive?

Snapshotting sure is nice (gotta love those netapps at work), but has anyone here played with it on linux?

[1] oh, and remember:  thou shalt not compress backups, for the single corrupt byte destroyeth everything!

[2] e.g. into a tmpfs in 300-600 MB chunks
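The tar-into-chunks idea could be sketched like this: pack the small files into uncompressed fixed-size pieces, then rsync only the pieces. The directory names and the 500M chunk size are made-up examples.

```shell
#!/bin/sh
# Pack the source tree into an uncompressed tar stream and split it
# into fixed-size chunks, then rsync the chunks to the backup drive.
SRC="/srv/data"
STAGE="/tmp/backup-stage"    # could be a tmpfs mount

mkdir -p "$STAGE"
tar -C "$SRC" -cf - . | split -b 500M - "$STAGE/data.tar."

rsync -av --delete "$STAGE/" /mnt/backup/tarballs/
```

Restoring is just `cat data.tar.* | tar -xf -`; per footnote [1], the chunks stay uncompressed so one corrupt byte doesn't take out the whole archive.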

----------

## bunder

 *saphear wrote:*   

> how do you mean? 
> 
> taring everything every night? I don't know how long that takes with 222GB and it's not the best if I actually need the backup, especially if I need it fast.

 

you can do incremental backups with tar... not sure what kind of backup rotation you wanted...
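With GNU tar, incremental backups work via a snapshot file; a rough sketch (paths are illustrative):

```shell
#!/bin/sh
# GNU tar incremental backup using a snapshot file: the first run
# (no snapshot file yet) writes a full archive; later runs archive
# only files changed since the last run.
SRC="/srv/data"
SNAP="/var/backups/data.snar"

tar --listed-incremental="$SNAP" \
    -cf "/mnt/backup/data-$(date +%F).tar" -C "$SRC" .
```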

----------

## pdr

I use rsnapshot. It does an rsync for the first (full) backup, and subsequent backups only rsync files that changed; if a file did not change, it makes a hard link to the copy in the previous backup. This way you can completely delete a previous backup and all the files are still available in the latest backup; the downside is that deleting a previous backup only actually frees the files that existed in that backup but are not in the latest one.

I don't back up my multimedia (can re-rip them), so my full backup for my server was 18GB. The next backup was about 400M (mainly inodes).

----------

## kimmie

If you're using xfs you can use xfsdump, it's very fast. You can run it directly from cron, you can even set the +d attribute on files/directories to indicate they should be excluded from backups. I've never had a problem and I've used my backups to clone machines. It's great.

To give you an idea of speed, /home on one of my machines is 43G (mainly software & .mp3). A full backup (to a file on a dedicated disk) took 21 minutes, that's an average of 34M/sec. A daily incremental backup of 650M took 21 seconds; one of 11G took 5 minutes. The machine is dual pentium III 1.2GHz with ATA-100 disks. The disks themselves get about 50M/sec using hdparm -t, so I'm getting more than 2/3 of the hdparm speed of my disks running a backup with the machine loaded   :Very Happy: 

The downside is that it's not all that user friendly. Takes a while to get your head around. Restoring is a little clunky, although you can restore single files if you need to. And, of course, you have to be using XFS (I consider that a plus, really) and you can only back up from, and restore to, XFS partitions, although you can store the backup files however you like....
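For anyone wanting to try it, a dump cycle driven from cron might look roughly like this (the schedule, labels, and destination paths are invented; xfsdump needs root and an XFS source filesystem):

```
# hypothetical /etc/crontab fragment
# full (level 0) dump every Sunday night:
30 2 * * 0    root  xfsdump -l 0 -L home-full -M backupdisk -f /mnt/backup/home.0 /home
# incrementals against the last level 0 the other nights:
30 2 * * 1-6  root  xfsdump -l 1 -L home-incr -M backupdisk -f /mnt/backup/home.1 /home
```

Restoring is the reverse: `xfsrestore -f /mnt/backup/home.0 /home`, then apply the incremental on top.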

BTW I once had a dodgy ATA cable - it took out my RAID 1 array, both disks. You could get the same thing with a controller, software, or operator failure. At the time, I thought RAID 1 was my backup, so I didn't have a backup....

----------

## HeissFuss

He's looking for an incremental solution, not a full nightly backup.  rsync should work well for that.  Personally I've used rdiff-backup for incremental rsync backups, and it works fairly well.

----------

## kimmie

 *Quote:*   

> He's looking for an incremental solution, not a full nightly backup. rsync should work well for that.

 

As I said, xfsdump provides an incremental (actually, it's differential) backup. And he's already using rsync, it seems to be too slow.

Hey saphear, have you considered doing what you're doing now, but removing the -c option from rsync? Is there a reason you need to check files by checksum? Because that will be slowing down your rsync a lot.

----------

## alligator421

I don't know if it will suit your needs but I am using dar for backups.

----------

## nobspangle

I never said RAID was a substitute for backups; that's why I said  *I wrote:*   

> you should also have a RAID solution

  By that I meant you should have RAID as well as a backup.

 *kimmie wrote:*   

> BTW I once had a dodgy ATA cable - it took out my RAID 1 array, both disks. You could get the same thing with a controller, software, or operator failure. At the time, I thought RAID 1 was my backup, so I didn't have a backup....

 

Were both drives on the same ATA cable? That's just asking for trouble.

----------

## Akkara

 *Quote:*   

> "rsync -avc --delete" to another hd

 

Might try giving the --whole-file option as well so it doesn't build each destination file out of deltas.

Perhaps cp -au might be faster to get the changed files over, and then follow that with an rsync --delete to remove whatever should be removed.

I'm not sure why it is so slow, however.  Even copying all 222GB to an empty drive at a relatively leisurely rate of 30 MB/sec should only take (222*1024 MB) / (30 MB/sec) / (3600 sec/hr) ~= 2 hours.

----------

## kimmie

nobspangle, I wasn't querying what you said, just emphasising that RAID isn't a backup.

And no, the drives were on separate cables.

----------

## jlh

 *Akkara wrote:*   

> Might try giving the --whole-file

 That's the default already if source and destination are local paths.

-c is the main source of slowness, remove it!  If you're really really worried or paranoid, you could re-enable -c once per week or so.

----------

