# Event-driven rsync?

## jsosic

Hi!

I have a web server farm with a frontend and lots of nodes in the cluster. The problem I have is: when the developers and web admins upload new web content to the frontend, I need to sync it to all the nodes. Running rsync over ssh from cron is not an acceptable solution, because of either the potential lag (developers waiting half an hour to see their changes live) or the load (an rsync every 5 minutes).

So, the idea is to rsync after the end of each FTP session. Things get uploaded by FTP, so when the FTP connection terminates, rsync kicks in.

Another idea is FAM (File Alteration Monitor), but as far as I am aware there are no tools that watch for FAM notifications and react to them.

The problem sounds interesting, so I would appreciate any comments.

----------

## krinn

Load from an rsync? If there are no changes, there will be almost no load on the server...

And the FAM idea is a bad one: if someone uploads 100 files to the webserver, all the computers will try to rsync 100 times, as every uploaded file will trigger a change notification.

A 5-minute rsync cron will give you 10 minutes of lag at most.

----------

## jsosic

I would still prefer to rsync after an FTP session rather than from cron every 5 minutes. ...

There's another problem. For example: if a web admin uploads 10 files, and rsync starts when 5 files are up and the other 5 are still uploading, the nodes will have inconsistent web pages... The site is pretty complex, and this kind of update has to be performed as an atomic operation - i.e. all or nothing  :Sad: 

So, I guess I'll have to script something that monitors the FTP log and reacts when a transfer session ends......
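A minimal sketch of such a log watcher, assuming a vsftpd-style log line on disconnect and hypothetical paths and node names (adjust the pattern to whatever your FTP daemon actually writes):

```shell
#!/usr/bin/env bash
# Hypothetical FTP-log watcher: follow the log and rsync to all nodes
# whenever a session-close line appears. The log path, pattern, node
# names, and docroot are assumptions -- adapt them to your setup.
FTP_LOG=${FTP_LOG:-/var/log/vsftpd.log}
DOCROOT=${DOCROOT:-/var/www/html}
NODES=${NODES:-"node1 node2"}
RSYNC=${RSYNC:-rsync}            # overridable, e.g. RSYNC="echo rsync" for a dry run

# True when a log line marks the end of an FTP session (assumed pattern).
session_closed() {
    grep -q 'Connection closed' <<<"$1"
}

# Push the docroot to every node.
sync_nodes() {
    local node
    for node in $NODES; do
        $RSYNC -az --delete -e ssh "$DOCROOT/" "www@$node:$DOCROOT/"
    done
}

# Follow the log forever; sync after each closed session.
watch_log() {
    tail -F "$FTP_LOG" | while read -r line; do
        session_closed "$line" && sync_nodes
    done
}

# Only start watching when explicitly asked, so the functions are testable.
if [[ "${1:-}" == "--watch" ]]; then watch_log; fi
```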

----------

## nOw2

Without knowing your specifics, it sounds like you should be looking at shared storage.

Replicating a single image needs a control mechanism and a deployment plan, and atomicity can be a big problem, as you note. It sounds like you use ad-hoc changes, which means either changing the way your developers work or centralising where the code lives - both of which can be difficult  :Smile: 

Of course, while shared storage solves one problem, it introduces some interesting new problems regarding scalability and reliability.

----------

## krinn

 *jsosic wrote:*   

> 
> 
> There's another problem - for example... If web admin uploads 10 files, and rsync starts when 5 files are up, and another 5 are still uploading

 

Yep, that's why I said 10 minutes of lag.

Still, the FTP-session trigger by itself won't work: if I upload something to the webserver, on disconnect I'd be rsyncing a server that is already up to date, while the other computers that actually need the sync do nothing, since they didn't have any FTP session  :Very Happy: 

You can then try to push the sync from the webserver (sending the order via ssh to the other computers when the FTP session closes), but disconnected/offline machines will never get the sync order.

----------

## cgill27

Look into inotify; it's in the kernel and is good for monitoring file/directory changes.

There's an inotify tool I use called inotifywait, which I have set up in a script, so when a certain file changes the script automatically does its job.

http://inotify-tools.sourceforge.net/
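For what it's worth, a minimal sketch of the inotifywait approach, with a debounce so a burst of 100 uploads triggers one sync rather than 100 (the problem krinn raised). The watched directory, node names, and the 30-second quiet window are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of an inotifywait-driven sync with a debounce window.
# WATCH_DIR, NODES, and the 30s quiet window are assumptions.
WATCH_DIR=${WATCH_DIR:-/var/www/html}
NODES=${NODES:-"node1 node2"}
RSYNC=${RSYNC:-rsync}          # overridable, e.g. RSYNC="echo rsync" for a dry run

# Push the watched tree to every node once.
sync_once() {
    local node
    for node in $NODES; do
        $RSYNC -az --delete -e ssh "$WATCH_DIR/" "www@$node:$WATCH_DIR/"
    done
}

# Stream events; after the first one, absorb the rest until 30s of
# quiet, then do a single sync -- so 100 uploads cause 1 rsync, not 100.
watch_and_sync() {
    inotifywait -m -r -e close_write -e moved_to "$WATCH_DIR" |
    while read -r _; do
        while read -r -t 30 _; do :; done
        sync_once
    done
}

# Only start the watcher when explicitly asked, so sync_once is testable.
if [[ "${1:-}" == "--watch" ]]; then watch_and_sync; fi
```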

----------

## kiss-o-matic

 *Quote:*   

> I would still prefer to rsync after an FTP session and not in cron every 5 minutes. ...

 

How does your trigger know when a specific event is done?  What if someone wants to change or create 100 files - should it trigger after the first one?  A user uploads 1 file, rsync starts... when that rsync is done, it will check again, and it's going to have to do (at least) one more sync.  I don't think rsync is going to be your answer in such a case.

EDIT: Dang, didn't read Krinn's post.  Well, I'll leave it here to illustrate the issue.  :Smile: 

----------

## krinn

Yeah, in my case I would solve it like this:

- on FTP session close, the webserver sshes to XX to request an rsync (calling XX the slave computer that will handle the rsync)

- all computers rsync with XX from cron every minute

This way, when the webserver gets updates, it asks XX to rsync with it, and all the computers rsync with XX every minute, giving you a 2-minute lag to sync them all.

The webserver is only loaded on FTP disconnect.

XX is loaded every minute by all the computers (which is still a small load, especially as XX has nothing else to do).

... As you said you have plenty of computers available, you can certainly grab one to take on the XX task.
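Sketched concretely, with a hypothetical hub host called "xx" and assumed paths and users - the frontend pushes to the hub once per upload, and every node pulls from the hub on a one-minute cron:

```shell
#!/usr/bin/env bash
# Sketch of krinn's two-step scheme. The hub hostname "xx", the
# docroot, and the www user are assumptions.
DOCROOT=${DOCROOT:-/var/www/html}
HUB=${HUB:-xx}
RSYNC=${RSYNC:-rsync}          # overridable, e.g. RSYNC="echo rsync" for a dry run

# Step 1: on the frontend, run this when the FTP session closes.
push_to_hub() {
    $RSYNC -az --delete -e ssh "$DOCROOT/" "www@$HUB:$DOCROOT/"
}

# Step 2: on every node, pull from the hub each minute, e.g. crontab:
#   * * * * *  /usr/local/bin/node-sync.sh --pull
pull_from_hub() {
    $RSYNC -az --delete -e ssh "www@$HUB:$DOCROOT/" "$DOCROOT/"
}

case "${1:-}" in
    --push) push_to_hub ;;
    --pull) pull_from_hub ;;
esac
```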

----------

## jsosic

 *nOw2 wrote:*   

> Without knowing your specifics, it sounds like you should be looking at shared storage.
> 
> Replicating a single image needs a control mechanism and deployment plans, and atomicity can be a big problem as you note. It sounds like you use adhoc changes which means either changing the way your developers work or centralising where the code is - both of which can be difficult 
> 
> Of course, while solving one problem this produces some interesting problems regarding scalability and reliability.

 

Well, the problem is the demands of the application's developers... They expect too much, and the application's income can't justify it - because if it could, of course there would already be shared storage in place.

Another solution is GFS over GNBD, but I am not willing to deploy that on a production system with no margin for error - i.e. no affordable downtime... And maintaining such a complex solution is another problem....

So, I solved it by distributing the code from the load balancer to the nodes via rsync over ssh. The script runs from cron every 10 minutes, and if it sees an open FTP session, it aborts. We'll see how this works out over the next week.
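For reference, a sketch of that kind of cron job, assuming the FTP daemon listens on port 21 and using netstat for the session check; the node names and paths are hypothetical, and the check is the fragile part, so treat this as a starting point:

```shell
#!/usr/bin/env bash
# Sketch of a cron-driven sync that aborts while an FTP upload may be
# in progress. Port 21, node names, and paths are assumptions.
DOCROOT=${DOCROOT:-/var/www/html}
NODES=${NODES:-"node1 node2 node3"}
RSYNC=${RSYNC:-rsync}          # overridable, e.g. RSYNC="echo rsync" for a dry run

# True if any client currently holds an established connection to the
# FTP control port, i.e. an upload may still be running.
ftp_session_open() {
    netstat -tn 2>/dev/null | awk '$4 ~ /:21$/ && $6 == "ESTABLISHED"' | grep -q .
}

# Push the docroot to every node.
push_to_nodes() {
    local node
    for node in $NODES; do
        $RSYNC -az --delete -e ssh "$DOCROOT/" "www@$node:$DOCROOT/"
    done
}

if [[ "${1:-}" == "--run" ]]; then
    ftp_session_open && exit 0   # session open: abort, retry next cron run
    push_to_nodes
fi
```

Note this only narrows the race jsosic described: a session could still open between the check and the rsync, so the 10-minute cadence is what makes a botched run self-correct.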

----------

