# Unsynchronized NFS despite 'sync'

## jyoung

Hi Folks,

I'm having some trouble setting up synchronous read/writes on an NFS folder. The situation is as follows:

client1 reads a file in the NFS folder

client1 commits changes to the file

several seconds pass

client2 opens the file and discovers that the contents differ from what client1 wrote; in some cases, the contents are what client1 originally read

The server is exporting the folder as such:

/export/cluster *(insecure,rw,sync,no_subtree_check,root_squash,all_squash,anonuid=1001,anongid=1001,no_wdelay)

And from the clients' /etc/fstab:

<server's IP>:/export/cluster   /cluster nfs  rw,hard,_netdev 0 0

I've also got an NTP daemon running to keep the clocks synced, and I've manually checked that both the clients and the server have the same time.

I've also coded the program accessing the file to open and close the parent directory before and after opening the file --- I've read that this can flush an NFS cache.
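
Roughly, the relevant part of the code looks like this (simplified, with illustrative names):

```c
/* Sketch of the pattern described above; names and paths are illustrative. */
#include <dirent.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read a file on the NFS mount, opening and closing its parent
   directory first, which reportedly invalidates the client's cache. */
ssize_t read_shared_file(const char *dir, const char *path,
                         char *buf, size_t len)
{
    DIR *d = opendir(dir);
    if (d)
        closedir(d);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```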

I am at a loss as to why this would be an issue with this configuration; if anyone has any ideas, please let me know!

----------

## jyoung

Just an update: I've identified at least one case where the contents of the file that client2 receives match what client2 saw the last time it accessed the file, roughly 15 seconds before client1's access.

----------

## mike155

 *Quote:*   

> And from the clients' /etc/fstab:
> 
> <server's IP>:/export/cluster /cluster nfs rw,hard,_netdev 0 0

 

Why don't you specify the 'sync' option?

 *man nfs wrote:*   

> The NFS client treats the sync mount option differently than some other file systems (refer to mount(8) for a description of the generic sync and async mount options).  If neither sync nor async is specified (or if the async option is specified), the NFS client delays sending application writes to the server until any of these events occur:
> 
> Memory pressure forces reclamation of system memory resources.
> 
> An application flushes file data explicitly with sync(2), msync(2), or fsync(3).
> ...
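
Concretely, keeping everything else from the fstab line you posted, that would be something like (untested):

```
<server's IP>:/export/cluster   /cluster nfs  rw,hard,sync,_netdev 0 0
```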

 

----------

## jyoung

I'm trying it with the sync option on the clients now; I'll report back shortly. However, I was already opening and closing the file before and after the critical data was read and written to trigger a flush.

----------

## jyoung

In my earlier test, there were some cases where multiple clients would read the file and find results consistent with the first client's write, and then the second client would find results that were consistent with its own last write, but not the most recent write. That seems like client-side caching, as if the second client isn't bothering to get an updated version of the file. Is that possible?

----------

## mike155

 *Quote:*   

> then the second client would find results that were consistent with its own last write, but not the most recent write

 

I would expect exactly this if the NFS share was mounted on the second client without the 'sync' option.

----------

## jyoung

The code is running with the sync option on all clients, but it's glacially slow. This may not be practical. With async in the clients' fstab, is there any way to trigger a client-side flush before a read, the way close() triggers a flush after a write?

----------

## mike155

 *Quote:*   

> is there any way to trigger a client-side flush before a read, the way close() triggers a flush after a write?

 

Sure! Please read the snippet from the man page I posted above.

----------

## jyoung

Okay, I think something might be going over my head. The snippet *seems* to refer only to cases where the client is writing to the server, and to strategies for ensuring that the client's write is complete. My situation seems to be a case where one client's write is successful and complete, but a second client doesn't pick up the new data.

----------

## mike155

I'm sorry. I probably misunderstood your question. So you are talking about

 *Quote:*   

> then the second client would find results that were consistent with its own last write, but not the most recent write. 

 

So you want to make the second client look for newer data on the server, even though it has not yet transferred the data of its own last write to the server?

I don't think that this is possible. You could try to use record locking, but then again: it will be slow.

It seems that NFS is not the right technology to solve your problem.

----------

## jyoung

Not quite. I *think* that the first client successfully transfers its data to the server, but the second client is picking up old data anyway. The reason I think that the first client's write is successful is that other clients are able to read the first client's write without issue.

----------

## jyoung

I think I have a solution. Previously, I was using a custom file locking mechanism using link/unlink to lock a file, and open/close to flush the NFS cache. I encapsulated my code in fcntl calls to lock and unlock the file, and there's no evidence of inconsistencies after a fairly rigorous test. This works even without 'sync' in the clients' fstab (but with 'sync' in the server's /etc/exports). In the next day or so I'll deploy this solution on a larger scale and report back.

This is very strange. The link/unlink + open/close scheme should have worked, but didn't even with 'sync' in the clients' fstab. Also, the data in the logs strongly implies that the custom file locks were being honored, since no client reported access to the file that overlapped in time with another client's access. It seems like fcntl does a much more rigorous flush than open/close, although that shouldn't have mattered with 'sync' in the clients' fstab.
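
For reference, the locking helper I added looks roughly like this (simplified, error handling trimmed):

```c
/* Simplified sketch of the fcntl-based locking helper; names are illustrative. */
#include <fcntl.h>
#include <unistd.h>

/* Apply a whole-file POSIX lock (F_WRLCK or F_RDLCK) or release it
   (F_UNLCK), blocking until the lock is granted. Returns 0 on success. */
int set_whole_file_lock(int fd, short type)
{
    struct flock fl;
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means lock the entire file */
    fl.l_pid = 0;
    return fcntl(fd, F_SETLKW, &fl);
}
```

I call `set_whole_file_lock(fd, F_WRLCK)` before touching the data and `set_whole_file_lock(fd, F_UNLCK)` afterwards.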

----------

## Hu

As I read your earlier reports, the observed results are perfectly reasonable.  With your ad-hoc locking scheme, the second client had no reason to know that its locally cached data was stale, so it had no reason to reread the data from the server.  The sync mount option guarantees timely writes to the server, but is not documented to guarantee no client-side caching of previously read data.  If I were to speculate, although the documentation is silent on this point, it also seems reasonable that acquiring a read lock on the file might encourage the kernel to revalidate with the server.

----------

## jyoung

I think we can mark this thread as SOLVED, but I wanted to leave a few notes and solicit any opinions. Thanks for the comment, Hu; that explains why the custom locks with link/unlink were giving me that issue.

When I first reported back a successful test with fcntl, I'd just added fcntl calls to my code, without removing the custom locks. When I deployed the code on a larger scale, I removed the custom locks. The rate of data overwrites became far worse. So, the custom locks seem to have ensured file ownership successfully, but weren't flushing the client-side cache; fcntl ensured the cache was flushed, but wasn't ensuring ownership.

This seemed strange since this is what fcntl is meant for, but then I found this article:

www.0pointer.de/blog/projects/locking.html

"...POSIX locks are automatically released if a process calls close() on any (!) of its open file descriptors for that file." Between locking and unlocking the file I'm opening it using a library (cfitsio, if anyone's interested). The library opens it by name; I can't just pass it a file descriptor. Which means that when I subsequently call the library function to close the file, it's internally calling close() on a file descriptor pointing to the same file, invalidating the lock created by fcntl.
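
Here's a tiny standalone program that reproduces the behavior (paths made up; it runs on a local filesystem, but the lock semantics are the same):

```c
/* Demonstrates that closing *any* descriptor for a file drops this
   process's POSIX locks on it. All names and paths are illustrative. */
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that tries a non-blocking write lock on `path`.
   Returns 1 if the child was blocked, i.e. our lock is still in effect. */
static int lock_still_held(const char *path)
{
    pid_t pid = fork();
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        _exit(fcntl(fd, F_SETLK, &fl) == -1 ? 0 : 1);  /* 0 = blocked */
    }
    int status;
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status) == 0;
}

/* Take a POSIX write lock, then open and close a *second* descriptor
   for the same file (as a library opening it by name would).
   Returns 1 if the lock survived, 0 if the close dropped it. */
int lock_survives_extra_close(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    fcntl(fd, F_SETLKW, &fl);

    int before = lock_still_held(path);   /* 1: we hold the lock */

    int fd2 = open(path, O_RDONLY);
    close(fd2);                           /* drops ALL our locks on the file */

    int after = lock_still_held(path);    /* 0: the lock is gone */
    close(fd);
    return before && after;
}
```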

I just changed the code to call fcntl after the file is opened by the library, and then again to unlock the file before the library closes it. That works; no signs of overwrites. But there's no guarantee that a library won't internally open and close the file. In fact, there are some cfitsio functions which seem to do exactly that. I'm not totally sure what a portable solution is. And it seems like this isn't that weird of a situation -- the need for an application to have exclusive access to a file while library functions open and close it.

----------

## Hu

As I read the documentation, flock may behave a bit better in the presence of file closure - but the documentation also suggests that it interacts poorly with NFS, which is a major requirement for you.  I think it is a bit of an odd use case to say you want to repeatedly open and close a file, but retain a lock on it the whole time.  I disagree with the decision to make fcntl drop locks like that, but it is much too late to fix that now.

The most portable solution in my opinion would be to fix the library to accept a prepared file descriptor that your code manages, and get it out of opening and closing the file other than as convenience wrappers for simple programs that don't need to keep the file open.
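
Sketching the shape I mean (purely hypothetical; this is not cfitsio's actual API):

```c
/* Hypothetical library interface where the caller owns the descriptor. */
#include <fcntl.h>
#include <unistd.h>

typedef struct {
    int fd;
} lib_file;

/* Core entry point: the caller supplies and keeps owning the descriptor,
   so any fcntl locks the caller holds are never disturbed by the library. */
int lib_attach_fd(lib_file *lf, int fd)
{
    if (fd < 0)
        return -1;
    lf->fd = fd;
    return 0;
}

/* Convenience wrapper for simple programs that don't manage descriptors. */
int lib_open(lib_file *lf, const char *path)
{
    return lib_attach_fd(lf, open(path, O_RDONLY));
}
```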

----------

