# How to compress offsite backup?

## tholin

I have an offsite backup of my private files at rsync.net. I pay for it so it is in my interest to reduce the size of the data as much as possible. I'm trying to come up with a way to compress my data but I can't think of any useful way.

The backup is taken by rsyncing files into an encrypted loopback filesystem; that loopback file is then rsynced offsite. Compressing the encrypted file is futile, and so is compressing the block device, since compression and write support can't be combined at the block level. That only leaves the filesystem and the files on it.

There don't seem to be any stable, writable filesystems with compression support for Linux. By stable I mean actively maintained, with a large userbase, and used in enterprise systems for several years. Windows, BSD and Solaris have this but not Linux  :Mad:  The last hope is compressing the files themselves, either in a tarball or individually. Tarballs use solid compression so that won't work, and I can't think of any practical way to compress the files individually. FUSE filesystems are out of the question.

Can anyone come up with some other solution?

----------

## John R. Graham

 *tholin wrote:*   

> Tarballs use solid compression so that won't work, and I can't think of any practical way I could compress the files individually.

Can you elaborate on that a little bit? Why won't it work?

- John

----------

## tholin

I should probably also have mentioned that only the file data that has changed since the last sync should be uploaded offsite. It's not possible to add or modify files in a solid archive without recreating the entire thing and modifying almost every disk block in the process.

If I'm going to compress the files myself I'll basically have to reimplement the functionality of rsync: uncompress the old file, check that the file size and stat info haven't changed, and if they have, delete the old file and compress and add the new one. I could implement it, but I would probably never trust that it actually does what I think it does. This is backup we are talking about.
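For what it's worth, the scheme described above can be sketched in a few lines of shell: keep a `.gz` mirror of a source tree and recompress only files that are newer than their mirror copy. The directories and file names here are made up for the demo, and deletion handling is deliberately omitted.

```shell
# Demo tree (hypothetical paths, just for illustration).
SRC=$(mktemp -d); DST=$(mktemp -d)
printf 'hello\n' > "$SRC/a.txt"
mkdir -p "$SRC/sub"; printf 'world\n' > "$SRC/sub/b.txt"

# Mirror $SRC into $DST as individually compressed files.
find "$SRC" -type f | while read -r f; do
    rel=${f#"$SRC"/}
    out="$DST/$rel.gz"
    mkdir -p "$(dirname "$out")"
    # Recompress only when the mirror copy is missing or stale.
    if [ ! -e "$out" ] || [ "$f" -nt "$out" ]; then
        gzip -c "$f" > "$out"
    fi
done
# Caveats: files deleted from $SRC still need pruning from $DST, and
# rsync only delta-transfers the .gz files well with gzip's
# --rsyncable option (where available).
```

It's only a sketch, but it shows why this amounts to reimplementing rsync's change detection by hand.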

----------

## John R. Graham

I think the real issue is that you're already trying to reimplement the functionality of a good backup program, namely the full/incremental paradigm. Many backup programs will produce a compressed incremental, which is perfect to rsync to your offsite host. For backup programs that don't also support encryption, run gpg, openssl, or your encryption method of choice on the resulting file before you rsync it offsite. An encrypted filesystem is overkill.
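As a rough sketch of that flow (assuming GNU tar; the gpg recipient shown is hypothetical):

```shell
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/data"; printf 'v1\n' > "$WORK/data/file.txt"

# Level 0 (full) backup; tar records file metadata in the snapshot file.
tar --create --gzip --listed-incremental="$WORK/snap" \
    --file="$WORK/full.tar.gz" -C "$WORK" data

# After some changes, a level 1 archive holds only what changed.
printf 'v2\n' > "$WORK/data/new.txt"
tar --create --gzip --listed-incremental="$WORK/snap" \
    --file="$WORK/incr1.tar.gz" -C "$WORK" data

# Optional encryption pass before rsyncing offsite, e.g.:
# gpg --encrypt --recipient you@example.com "$WORK/incr1.tar.gz"
```

The incremental archive stays small because it only contains files changed since the snapshot was last updated.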

I use flexbackup and DLT tape, but the full/incremental files that I write to tape could easily be encrypted and rsynced offsite.

- John

----------

## tholin

Incremental backups in separate files are not practical for me. Each incremental backup would add more data to the offsite storage until I run out of space; then I would have to clear the offsite storage and do a new full backup. I estimate that each incremental file would be at least 2G, and I'm not willing to pay for the extra space and effort.

----------

## John R. Graham

Then, my friend, differential backups are your friend.   :Wink: 

- John

----------

## BitJam

I agree that compressed differential (incremental) backups are the answer.  I use the dar program for this.  It is solid and stable, and I'm very happy with it.  I do on-site backups to an external drive.  If I wanted off-site backups I would "back up my backups".  I already do this by making copies of my backups, along with md5 checksums, to a 2nd external drive.

I make a backup (full or differential), make an md5sum, and verify the backup.  Then I copy the backup and md5sum to the 2nd backup device and verify the md5sum there.  You could make an offsite backup of the dar backup files and md5sums (via rsync) rather than copying them to a 2nd device.
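That backup-then-verify workflow might look roughly like this, with a plain tarball standing in for a dar archive (the checksum and verify steps are the same idea; all paths are invented for the demo):

```shell
set -e
WORK=$(mktemp -d)
DEV1="$WORK/dev1"; DEV2="$WORK/dev2"          # stand-ins for the two drives
mkdir -p "$WORK/data" "$DEV1" "$DEV2"
printf 'payload\n' > "$WORK/data/f.txt"

# 1. Make the backup and record its checksum next to it.
tar --create --gzip --file="$DEV1/backup.tar.gz" -C "$WORK" data
( cd "$DEV1" && md5sum backup.tar.gz > backup.md5 )

# 2. Verify the archive is readable before trusting it.
tar --list --gzip --file="$DEV1/backup.tar.gz" > /dev/null

# 3. Copy archive + checksum to the second device, then re-verify there.
cp "$DEV1"/backup.tar.gz "$DEV1"/backup.md5 "$DEV2"/
( cd "$DEV2" && md5sum -c backup.md5 )
```

Rsyncing the archive and its `.md5` file offsite instead of copying to a second drive is the same pattern with a different destination.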

It seems like you want the full flexibility of rsync combined with compression of the archive.  AFAIK this does not exist.  I'm involved in making LiveCDs and LiveUSBs.  They (almost) all use a read-only compressed filesystem together with a union filesystem (typically aufs) that combines the read-only compressed fs with a read-write filesystem, giving the appearance that the entire filesystem is read-write.  Only the *changes* are recorded on the read-write fs, so it is small enough to fit in RAM.

There do exist read-write compressed filesystems such as fusecompress and compFUSEd.  I don't know if they are under active development, and I don't think they would solve your problem anyway (unless you got your offsite host to use them).

IMO, you have two good solutions available: (1) rsync, like you are doing now, and (2) compressed differential backups.  One way to go would be to use rsync locally and then make compressed differential backups of the rsync backup, storing those off-site.

----------

## tholin

 *John R. Graham wrote:*   

> Then, my friend, differential backups are your friend.   

 

If you mean doing diffs of diffs of diffs, then no. If any single diff in the chain gets corrupted, all later diffs are corrupted too. The most important thing is to guarantee the data integrity of the most recent backup, and this method is an unacceptable risk IMO.

Any backup system that continuously increases the size of the backup is unusable for me unless it can also intelligently keep track of the backup size and remove old backups when the quota starts to run out.

----------

## John R. Graham

No, that's not what differential backup means. You might want to do a little Googling, or actually look at the feature set of a well-established backup program. One way to use the differential paradigm is to throw away all but the latest differential, since a complete restore needs only the full backup plus the latest differential. As you might imagine, these problems aren't new. Neither are the solutions.
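A sketch of the "full + latest differential" scheme with GNU tar (paths invented for the demo): restoring the pristine level-0 snapshot before each run makes every differential relative to the full backup, so older differentials can simply be deleted.

```shell
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/data"; printf 'one\n' > "$WORK/data/a"

# Full backup; save an untouched copy of the level-0 snapshot file.
tar --create --gzip --listed-incremental="$WORK/snap" \
    --file="$WORK/full.tar.gz" -C "$WORK" data
cp "$WORK/snap" "$WORK/snap.level0"

printf 'two\n' > "$WORK/data/b"   # changes accumulate over time...

# Differential: always diffed against the full, never the previous diff,
# because we reset the snapshot to its level-0 state first. Overwriting
# diff.tar.gz each time keeps offsite usage at full + one differential.
cp "$WORK/snap.level0" "$WORK/snap"
tar --create --gzip --listed-incremental="$WORK/snap" \
    --file="$WORK/diff.tar.gz" -C "$WORK" data
```

Since no differential depends on an earlier differential, corruption of one never propagates, and the offsite footprint stays bounded.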

- John

----------

