# Filesystem replication. Which? [SOLVED]

## MageSlayer

Hi all

I'd like to ask the community for help.

My aim is to introduce a mechanism to keep a pile of large binary files (0.5-2 GB typical size) synchronized over the Internet.

Currently this job is done by an rsync daemon plus rsync clients that synchronize constantly.
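For reference, the setup looks roughly like this (the module name `files`, hostname and paths here are hypothetical examples, not the real config):

```shell
# Server side: rsync daemon exporting the file pile.
# /etc/rsyncd.conf (hypothetical):
#   [files]
#   path = /srv/files
#   read only = yes
#
# Started with:
#   rsync --daemon

# Client side: periodic pull over the rsync protocol.
rsync --archive --partial rsync://server/files/ /srv/files/
```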

The problems with current approach are:

- low sync speed. Rsync checks each and every file even if it hasn't been changed, renamed or moved.

- lacks proper metadata replication. I mean, if a file is moved/renamed, rsync starts copying it over again from scratch. Renames/moves are quite frequent.

- poor scaling. The current total file size is ~550 GB and growing.

Additional requirements:

- stable, open-source  :Smile: 

- be able to survive network disconnects, etc.

- support under Gentoo and Ubuntu; I guess with open source that's not a problem.

- ideally it should have two-way replication, but it's not a crucial requirement.

Thanks.

Last edited by MageSlayer on Fri Sep 24, 2010 6:09 am; edited 2 times in total

----------

## John R. Graham

Sounds like you could use a version control system.  Have you checked out subversion or git?

- John

----------

## MageSlayer

 *John R. Graham wrote:*   

> Sounds like you could use a version control system.  Have you checked out subversion or git?
> 
> - John

 

Actually, I considered that, but the problems with a VCS are:

- they are solutions for developers (they require special tools for moving/renaming, not just the POSIX file API)

- they store a history of changes (I don't need this, and it more than doubles the required space)

- they don't deal well with binary files (I don't need merging).

Last edited by MageSlayer on Mon Sep 20, 2010 8:03 pm; edited 1 time in total

----------

## John R. Graham

You just said you do need at least components of a history of changes: namely, if a file is merely renamed or copied, recognize that fact and don't store redundant copies.  Every VCS I know of handles binary files.  Git does this particularly elegantly.
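For instance, git doesn't record renames at all; it stores content-addressed blobs, so a renamed file reuses the same object and detection happens at display time. A sketch, with a made-up filename:

```shell
# The rename costs nothing in the object store: the blob for
# data.bin already exists, only the tree entry changes.
git mv data.bin archive-2010.bin
git commit -m "Rename data file"

# Rename detection is computed on demand by content similarity:
git log --follow --stat archive-2010.bin
```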

- John

----------

## MageSlayer

 *John R. Graham wrote:*   

> You just said you did need at least components of a  history of changes, namely, if a file is merely renamed or copied, recognize this fact and don't store redundant copies.  Every VCS I know of handles binary files.  Git does this particularly elegantly.
> 
> - John

 

Maybe I should have made my task clearer.

I've added one extra bullet: size.

I'm not sure a VCS can handle such volumes.

----------

## malern

I personally use rsync to sync large filesystem trees; I haven't found anything better yet. There are a few things you can do to get better performance out of it.

 *MageSlayer wrote:*   

> Rsync checks each and every file even though is not changed/renamed/moved

 

By default it only checks files when the modtime or size has changed, unless you've specified the --checksum option to force it to checksum every file each time (which will be very slow).
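To illustrate the difference (server name and paths are made-up placeholders):

```shell
# Default "quick check": a file is skipped when its size and
# modification time match the destination, so an unchanged tree
# costs only a directory scan.
rsync --archive rsync://server/files/ /srv/files/

# --checksum instead reads and checksums every file on both sides
# on every run -- thorough, but very slow on hundreds of GB.
rsync --archive --checksum rsync://server/files/ /srv/files/
```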

 *MageSlayer wrote:*   

> I mean if file is moved/renamed, it starts to copy it once more. Renames/moves are quite frequent.

 

rsync has a --fuzzy option which is quite good at detecting when files have been renamed.

Also, the --inplace option can result in better update speeds if you modify large files often, but check the man page for warnings.
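Putting those two suggestions together might look like this (again with placeholder host and paths). Note that --fuzzy can only use a basis file that still exists in the destination directory, which is why delaying deletions until the end of the run can help:

```shell
# --fuzzy: look for a similar-named file in the destination
#   directory to use as a delta basis, so a renamed source file
#   need not be re-sent in full.
# --delete-delay: postpone deletions until the end of the transfer,
#   giving --fuzzy a chance to find the old copy.
rsync --archive --fuzzy --delete-delay rsync://server/files/ /srv/files/

# --inplace: update large files in place rather than writing a
#   temporary copy first; check the man page for the caveats
#   (updates are no longer atomic).
rsync --archive --inplace rsync://server/files/ /srv/files/
```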

----------

## MageSlayer

 *malern wrote:*   

> I personally use rsync to sync large filesystem trees, I haven't found anything better yet. There's a few things you can do to get better performance out of it.
> 
>  *MageSlayer wrote:*   Rsync checks each and every file even though is not changed/renamed/moved 
> 
> By default it only checks files when the modtime or size has changed, unless you've specified the --checksum option to force it to checksum every file each time (which will be very slow).
> ...

 

Yes, I found the problem. Now, with clients using the --archive option, everything syncs very fast.

Unfortunately, fuzzy matching is still based on lexical similarity of file names, so I doubt it will work efficiently in my case.

----------

