# HOWTO: The poor man's differential backup

## VinzC

Hi.

Just wanted to share this. As I searched the Internet for backup solutions, especially differential backups, I found lots of scripts and tools all over the place.

**The context**

I have a big drive that I must back up every day. No complicated solution is involved, just the good old tar command; it does wonders. Until a few days ago I was doing only full backups. Each archive file is named after the date of the backup, allowing me to go back in time a certain number of days. But the storage space on the backup drive is fixed, and as the full backups grew in size I had to shorten the history. I ended up keeping only a few days and I wanted more. Another issue is that the backup process takes so much time that it still runs during work hours.

Logically, I had to create full backups less frequently, outside work hours or during weekends. On the other days, just do either differential or incremental backups.

**A little history**

Differential backups are easier to manage than incremental ones: restore the latest full backup, then the latest differential backup. With incrementals, you need to restore *all* of them in sequence after the last full.

Typically, differentials and incrementals rely on the fact that a file was changed after the last backup. That's the purpose of the archive bit on Windows. Only full and incremental backups reset this bit. The catch is (someone prove me wrong here) that there's no such bit in GNU/Linux filesystems. But that's no big deal, really.

So the question is how do I create a differential backup?

There are scripts. There are tools. I want neither  :Very Happy:  .

**The [easy] solution**

Of course, tar cannot create differential backups. But it can restore while skipping newer files (tar -xp --keep-newer-files)! That's all we want, isn't it? Say we want a full backup every week. All we need is to find every file that has changed since the last full backup; simply put, the files that changed in the last 7 days. The restore process will take care not to overwrite newer files, i.e. files already on disk (from the full backup) that are newer than their copies in the differential. Better to have too many files in the archive than not enough, right?

So, in this scenario, here's the principle:

```
# weekly full backup; /path/to/data is a placeholder for whatever you archive
tar -cjvf full.tar.bz2 /path/to/data
```

```
# daily differential: every file changed in the last 7 days
find /path/to/data -ctime -7 -type f | tar -czvf differential.tar.gz -T -
```

Use find to select only files that were changed during the last 7 days, pipe the list through tar -czv -T - and you're done.
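
To make the restore side concrete, here's a self-contained sketch you can paste into a shell; the directory and file names are made up, and it all runs in a throw-away temp directory:

```shell
# make a tiny tree, take a full backup, change a file, take a
# "differential", then restore: full first, differential on top
work=$(mktemp -d); cd "$work"
mkdir data
echo v1 > data/a; echo v1 > data/b
tar -cjf full.tar.bz2 data                 # the weekly full
sleep 1
echo v2 > data/a                           # a changes after the full
find data -ctime -7 -type f | tar -czf differential.tar.gz -T -
mkdir restore && cd restore
tar -xpjf ../full.tar.bz2                  # 1. last full backup
tar -xpzf ../differential.tar.gz --keep-newer-files   # 2. last differential
cat data/a                                 # back to the v2 version
```

The `--keep-newer-files` on the second extraction is what lets the differential over-capture safely: anything already on disk that is newer than its archive copy is left alone.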

On the server machine I run this script. It takes several hours for a full but only minutes for a differential, and the result weighs gigabytes for a full but only a few megabytes for a differential. A simple cron job will do: when date +%w equals 6 it's Saturday, time for a full backup; otherwise it's a differential.

```
if [ "$(date +%w)" -eq 6 ]; then
   # Full backup
   tar -cjvf "$(date +%F).tar.bz2" /path/to/data
else
   # Differential backup, 7 days behind
   find /path/to/data -ctime -7 -type f | tar -czvf "$(date +%F).d.tar.gz" -T -
fi
```

If I wanted full backups once a month, say on the first of each month, I'd test

```
if [ "$(date +%d)" -eq 1 ]; then
   # Full backup
   tar -cjvf "$(date +%F).tar.bz2" /path/to/data
else
   # Differential backup, 31 days behind
   find /path/to/data -ctime -31 -type f | tar -czvf "$(date +%F).d.tar.gz" -T -
fi
```

which would allow for an even longer period back in time for the same storage space.

**Tips & tricks**

I named all my backups after the date the script runs: full backups are named $(date +%F).tar.bz2 and differentials $(date +%F).d.tar.gz. A full is bigger (noooo?) so I used bzip2. Diffs are much smaller, so gzip is enough.

```
...
-rw-r--r-- 1 root root 43596099590 mar 2 12:31 2011-03-02.tar.bz2
-rw-r--r-- 1 root root 43636684415 mar 3 12:33 2011-03-03.tar.bz2
-rw-r--r-- 1 root root 43636684415 mar 4 12:21 2011-03-04.tar.bz2
-rw-r--r-- 1 root root     4599518 mar 5  3:12 2011-03-05.d.tar.gz
-rw-r--r-- 1 root root 43636684415 mar 5 12:21 2011-03-05.tar.bz2
```

Files sort in natural order, so you can immediately spot which of these is a full and which is not.

Simple. Neat. Standard.

Enjoy!

----------

## x22

 *VinzC wrote:*   

> Of course, tar cannot create differential backups. 

It can, using the -g option (see sections 5.2 "Using tar to Perform Incremental Dumps" and 5.3 "Levels of Backups" of the GNU tar manual).

----------

## VinzC

 *VinzC wrote:*   

> Of course, tar cannot create differential backups. 

 

 *x22 wrote:*   

> It can, using the -g option (5.2 Using tar to Perform Incremental Dumps  and 5.3 Levels of Backups )

 

Indeed tar handles incremental backups, differential not. The difference is more practical than technical in that you need to restore *every* incremental in sequence since the last full backup, like 32 restores at most with a monthly full backup. With a differential, you need only two restores at most.

Here's an example:

1. Full backup
2. Change file A
3. Backup 1
4. Change file B
5. Backup 2

With an incremental scheme, backup 1 would copy file A and backup 2 only file B. With a differential scheme, backup 1 would still copy file A but backup 2 would save both files A and B. So if you want to restore, you need to restore the full, plus *all* of the subsequent incremental archives. You only need the *last* differential though.

----------

## Sven Vermeulen

I use rsync with the --link-dest option. It allows you to do rsyncs, but when files are already available at the link-dest location, rsync creates a hardlink rather than a copy, saving you the space of the files that weren't modified (although even hardlinks have a file system impact).

----------

## x22

 *VinzC wrote:*   

> Indeed tar handles incremental backups, differential not. 

It can be used for differential backups too. It just requires careful handling of the extra snapshot file which tar uses with the -g option:

 *GNU tar manual wrote:*   

> Notice that ‘/var/log/usr.snar’ will be updated with the new data, so if you plan to create more ‘level 1’ backups, it is necessary to create a working copy of the snapshot file before running tar.
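
The snapshot-copy dance the manual describes can be sketched like this (self-contained, runs in a scratch directory; file names are made up):

```shell
work=$(mktemp -d); cd "$work"
mkdir data; echo v1 > data/a
# level 0 (full): this run creates the snapshot file
tar -cz -g full.snar -f full.tar.gz data
sleep 1
echo v2 > data/a                      # something changes later
# level 1 (differential): run against a COPY of the level-0 snapshot,
# so every level 1 keeps comparing against the full, not the previous diff
cp full.snar diff.snar
tar -cz -g diff.snar -f diff1.tar.gz data
tar -tzf diff1.tar.gz                 # lists data/ and the changed data/a
```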

 

5.3 Levels of Backups describes the same strategy as in your original post: 

 *GNU tar manual wrote:*   

> A typical dump strategy would be to perform a full dump once a week, and a level one dump once a day. This means some versions of files will in fact be archived more than once, but this dump strategy makes it possible to restore a file system to within one day of accuracy by only extracting two archives—the last weekly (full) dump and the last daily (level one) dump. The only information lost would be in files changed or created since the last daily backup. (Doing dumps more than once a day is usually not worth the trouble.) 

 

----------

## John R. Graham

There's a neat Perl script that drives the traditional *nix archiving tools (find, tar, cpio, and their brethren) called flexbackup. It provides a management layer on top of those tools that creates full, incremental, or differential backups and supports a strong regular expression based exclusion mechanism.  *Flexbackup Home Page wrote:*   

> flexbackup is for you if you have a single or small number of machines, amanda is "too much", and tarring things up by hand isn't nearly enough... 

 It's in Portage: app-backup/flexbackup. Looks like it does exactly what you're doing plus handles a lot of the administrative tasks. Recommended.   :Smile: 

- John

----------

## depontius

 *VinzC wrote:*   

> ```
> find -ctime -7 -type f | tar -czvf differential.gz -T -  ...
> ```
> ...

 

I'd take another look at this, and think about whether you want "-ctime" or "-mtime". Most likely "-ctime" works pretty well because most applications don't generally update files in place: they manipulate the data in a newly named file, then swap that for the original file. That changes the file status, tripping the ctime for the file. Note the asymmetry: a plain write to a file updates both the mtime and the ctime, while metadata-only changes (rename, chmod, chown) update just the ctime. So I'm pretty sure any mtime update also brings a ctime update, which makes "-ctime" the more inclusive test.
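
A quick scratch-directory demonstration of the asymmetry (uses GNU stat):

```shell
work=$(mktemp -d); cd "$work"
echo data > f
m1=$(stat -c %Y f); c1=$(stat -c %Z f)    # mtime, ctime after the write
sleep 1
chmod 600 f                               # metadata-only change
m2=$(stat -c %Y f); c2=$(stat -c %Z f)
echo "mtime moved by $((m2 - m1))s, ctime moved by $((c2 - c1))s"
```

The chmod bumps the ctime but leaves the mtime untouched, so "find -ctime" catches it while "find -mtime" would not.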

----------

## XQYZ

 *Sven Vermeulen wrote:*   

> I use rsync with the --link-dest option. It allows you to do rsync's, but when files are already available at the link-dest location, it uses a hardlink rather than a copy, saving you the space of the files that aren't modified (although even hardlinks have file system impact).

 

Same. I've actually blown this out of proportion by making it into an Apple-Time-Machine-like backup solution last year: http://dump.domindthegap.co.uk/backup/ (backup is called via hourly cron, bjanitor is just a python script which cleans out old backups - manually so far). If only I ever found the time to finish it properly. Still missing a couple of features I'd like (not to mention the horrible python script - my first with more than 50 lines, back in the day).

And yeah, hardlinks have quite an impact: like 20 MB on my home directory   :Twisted Evil:  . But what's 20 MB nowadays, with 2 TB drives for way under 100 euro/dollar.

----------

## SlashBeast

you guys should check rsnapshot and rdiff-backup.

----------

## VinzC

 *John R. Graham wrote:*   

> There's a neat Perl script that drives the traditional *nix archiving tools (find, tar, cpio, and their brethren) called flexbackup. It provides a management layer on top of those tools that creates full, incremental, or differential backups and supports a strong regular expression based exclusion mechanism.  *Flexbackup Home Page wrote:*   flexbackup is for you if you have a single or small number of machines, amanda is "too much", and tarring things up by hand isn't nearly enough...  It's in Portage: app-backup/flexbackup. Looks like it does exactly what you're doing plus handles a lot of the administrative tasks. Recommended.  
> 
> - John

 

Thank you very much, John. Will look at that.

 *x22 wrote:*   

> 5.3 Levels of Backups describes the same strategy as in your original post: 

 

 *GNU tar manual wrote:*   

> A typical dump strategy would be to perform a full dump once a week, and a level one dump once a day. This means some versions of files will in fact be archived more than once, but this dump strategy makes it possible to restore a file system to within one day of accuracy by only extracting two archives—the last weekly (full) dump and the last daily (level one) dump. The only information lost would be in files changed or created since the last daily backup. (Doing dumps more than once a day is usually not worth the trouble.) 

 

Thanks for clarifying. I hadn't understood it that way. The one thing that bothers me with that solution is that you need to keep a snapshot file permanently, if I got it right. I preferred using no extra file, log or trace (well, except the backup log that gets sent by email). That's where find comes in handy.

 *SlashBeast wrote:*   

> you guys should check rsnapshot and rdiff-backup.

 

Of course. But the main thing is I wanted no script, no tool, just tar [and find]. And most of all, a tar archive has the advantage of being portable: you can copy it to any destination without losing ownership, permissions or anything else. You may of course tar a directory tree created by rdiff-backup, but that's just one more [time- and resource-consuming] step.

 *VinzC wrote:*   

> ```
> find -ctime -7 -type f | tar -czvf differential.gz -T -  ...
> ```
> ...

 

 *depontius wrote:*   

> I'd take another look at this, and think if you want to use "-ctime" or "-mtime".  Most likely "-ctime "works pretty well because most applications don't generally update in-place - they manipulate the data in a newly-named file, then swap that for the original file.  That changes the "file status", tripping the ctime for the file.  If some application were to change the data in-place the ctime would not be updated, only the mtime would.  I'm pretty sure that any ctime update also updates the mtime.

 

Thank you very much for the hint! Indeed I hadn't spotted that. The server on which the script runs is a Samba server. I have just tested the difference between the two and it looks like find -mtime returns fewer results than -ctime. I suppose I can combine both?
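
Something like this, I suppose (scratch-directory sketch; find's -o combines the two tests, changed data or changed status):

```shell
work=$(mktemp -d); cd "$work"
mkdir data; echo x > data/f
# a file qualifies if either its mtime OR its ctime moved in the last week;
# since a data write also bumps the ctime, -ctime alone is already a superset
find data \( -mtime -7 -o -ctime -7 \) -type f
```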

----------

