# Rsync: synchronizes the entire file

## kortex-

Hello,

I have a problem with rsync...

I have an AutoFS mountpoint (nfs protocol) which is created in /mnt/backup.

I also have a local directory "/home/backup".

I run rsync every day to sync the contents of "/mnt/backup" into "/home/backup".

The command is: /usr/bin/sudo -u nobody /usr/bin/rsync -av --stats /mnt/backup/* /home/backup/.

I have a lot of large binary files (1TB).

The problem is that rsync synchronizes all content and not the difference.

I tried several settings found on the internet, but every time I have the same problem.

I read that I should use "--no-whole-file" and/or "--inplace", but with those parameters rsync takes a long time to compute the checksums for large files...

Does anyone have a solution to transfer the binary difference quickly?

Thank you in advance.

----------

## eccerr0r

Don't rsync over nfs.

The problem is that when you do rsync over nfs, it has to grab the contents over the network in order to even tell what the differences are.  So ideally you have something like bittorrent that does block hashes and transfers individual blocks that changed...

... which is something that rsync can also do, but only when using an rsync server or over ssh/rsh.  What you want is a fast checksum computation, with as little data as possible going over the network.

----------

## kortex-

Thank you for your answer.

But there are two things I do not understand:

   - why, for smaller files, does rsync not sync the entire file every time?

   - what's the difference between NFS and SSH? In both cases it's over the network.

----------

## Atom2

 *kortex- wrote:*   

>    - what's the difference between NFS and SSH? In both cases it's over the network.

 

You are right that both go over a network, but the difference is that for an NFS-mounted filesystem all data from the NFS server needs to be transferred to the local rsync process running on the NFS client.

In contrast to this, the ssh solution starts a remote rsync process on the NFS server (which is thus running locally on the NFS server), which is then able to read data directly from its attached disks (i.e. on the NFS server) and not through a network connection. In this case only the two rsync processes (i.e. the one on the client and the one on the server) communicate with each other, and there is no need to transfer the complete file.

The ssh solution is conceptually identical to having an on-demand rsync server on the NFS server (i.e. one that does not constantly listen for incoming connections but is only started on demand through ssh's remote command execution on the target system).

I hope that helps.

Atom2

----------

## kortex-

Hello,

I tried rsync over SSH but I have the same problem: the whole file is transferred.

I tried two commands :

/usr/bin/sudo -u nobody /usr/bin/rsync -av --stats -e "ssh -o StrictHostKeyChecking=no -i /rsync/id_rsa" user@source:/home/backup/* /home/backup/.

/usr/bin/sudo -u nobody /usr/bin/rsync -av --stats --no-whole-file -e "ssh -o StrictHostKeyChecking=no -i /rsync/id_rsa" user@source:/home/backup/* /home/backup/.

I will try this command tonight:

/usr/bin/sudo -u nobody /usr/bin/rsync -av --stats --no-whole-file --inplace --checksum -e "ssh -o StrictHostKeyChecking=no -i /rsync/id_rsa" user@source:/home/backup/* /home/backup/.

----------

## eccerr0r

I think it's block based and not a true diff.  If you delete one byte from the beginning of the file, the checksum of all blocks will now fail and the whole file gets transferred.

Not sure what the nature of your changes is...

----------

## kortex-

The files are MongoDB database files (binary files).

We do no deletions and no updates; just inserts.
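A quick way to measure how much of such a file really changes between two runs is to count differing bytes with cmp (a sketch; the demo files here are made up, not real MongoDB files):

```shell
#!/bin/sh
# Count bytes that differ between two copies of a file.
# (cmp -l prints one line per differing byte position.)
count_diff_bytes() {
  cmp -l "$1" "$2" | wc -l
}
# Demo on two small files of equal length, differing at positions 4 and 8:
printf 'aaaaaaaaaa' > /tmp/old.bin
printf 'aaabaaabaa' > /tmp/new.bin
count_diff_bytes /tmp/old.bin /tmp/new.bin   # prints 2
```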

----------

## kortex-

Hello,

The problem is the same with these arguments: the whole file is transferred.

If someone has an idea, I'm interested.

----------

## kortex-

Does nobody have a solution?

Thank you in advance.

----------

## WWWW

A few things I don't understand.

Do you have large binaries of 1TB each, or large binaries whose total amounts to 1TB?

I think rsync falls short for incremental syncs. Some filesystems get around this problem though.

----------

## szatox

 *Quote:*   

> 
> 
> The files are MongoDB's database (binary files).
> 
> We make no suppression and no update; just inserts.

 

Do those inserts move the data that follows?

I mean, if you prepend some kind of header to your file (insert a header at position "0"), all the data inside is pushed forward by the size of the header you prepended, right?

So perhaps rsync doesn't recognize the file anymore because the data has moved inside the file and no longer lines up with the block boundaries?

Does the same issue occur when you append data to the file?

 *Quote:*   

>  something that rsync can also do, but only when using a rsync server or over ssh/rsh.

 

Actually you can also use rsync's own protocol. It requires you to set up an rsync daemon, though, so that the process you start is able to connect.

And one more thing: if you want to call something a backup, you usually want to keep several versions. You can run rsync from location A against reference location B (e.g. your last backup) to let it create a new target C. Rsync should then copy what is new in A to C and create hard links from C to B for files that haven't changed since the last run.

If you don't want to keep several versions, perhaps you just want to have a mirror?

----------

## EmaRsk

I don't know if this can be useful, but rsync's man page says:

 *man rsync wrote:*   

>  -B, --block-size=SIZE       force a fixed checksum block-size

 

----------

