# umount -fl fails to unmount stale NFS v4 share

## Bircoph

Hello,

I can't unmount an NFSv4 share whose connection to the server has stalled. Before you send me off to read man mount or man nfs, you should know that I have tried:

1) umount -f, umount -l, and umount -fl, waiting up to several hours;

2) the soft and intr mount options (though intr is useless for NFSv4 and is ignored);

3) disabling the firewall (iptables, ebtables) on both the server and client hosts.

To reproduce this problem, one needs to simulate an unexpectedly broken connection between the client and the server (no clean shutdown of the server). Assume the server has IP 10.0.0.1 and the client has 10.0.0.2:

1) Run an NFSv4-capable server with the following export:

/some/test/dir 10.0.0.2(rw,subtree_check,no_root_squash)

The export options shouldn't matter, but the IP restriction does.

2) Mount the share on the client. You may run some processes that use files on this mount.

3) Change the client IP from 10.0.0.2 to 10.0.0.3 (or change the server IP).

4) Try to unmount. :)
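The client-side steps above could be scripted roughly as follows. This is only a dry-run sketch: it prints the commands instead of running them, since they need root and will break connectivity. The interface name eth0, the /24 prefix, and the mount point /mnt/test are assumptions that do not come from the post, and the export path may differ if the server uses an NFSv4 pseudo-root.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps: prints commands, runs nothing.
SERVER=10.0.0.1
OLD_IP=10.0.0.2
NEW_IP=10.0.0.3
IFACE=eth0        # assumed interface name
PREFIX=24         # assumed prefix length
MNT=/mnt/test     # assumed mount point

repro_commands() {
    # Step 2: mount the share over NFSv4.
    echo "mount -t nfs4 ${SERVER}:/some/test/dir ${MNT}"
    # Step 3: replace the client address so the export restriction
    # no longer matches (old address removed first, then the new one added).
    echo "ip addr del ${OLD_IP}/${PREFIX} dev ${IFACE}"
    echo "ip addr add ${NEW_IP}/${PREFIX} dev ${IFACE}"
    # Step 4: attempt a forced unmount.
    echo "umount -f ${MNT}"
}

repro_commands
```

Piping the output to `sh` as root would execute the steps for real; only do that on a disposable test client.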

What I got:

With the hard mount option (the default), umount -f, -l, or -lf stalls forever. The umount process can be killed with kill -9, and so can any process using files or directories on the stale mount point. But the mount point itself can't be unmounted until the server becomes reachable again.

With the soft mount option everything is the same, except that umount and processes using files on the stale share don't hang forever: they report a "Stale NFS file handle" error as many times as the retrans mount option specifies and then exit with an error. However, the stale mount point still cannot be unmounted:

subaru ~ # umount -f /mnt/cube

umount2: Stale NFS file handle

umount.nfs: /mnt/cube: Stale NFS file handle

umount2: Stale NFS file handle

umount.nfs: /mnt/cube: Stale NFS file handle

subaru ~ # umount -l /mnt/cube

umount.nfs: /mnt/cube: Stale NFS file handle

umount.nfs: /mnt/cube: Stale NFS file handle

The umount process exits on its own, but the share remains mounted.

I disabled firewalls on both hosts with no behaviour changes observed.

I also tried different kernel (2.6.28 through 3.2.0) and nfs-utils (1.2.3 through 1.2.5) versions, with no luck.

I also tried killing all NFS-related processes with SIGTERM and then SIGKILL, as described here, because this helped me earlier with an NFS stall under somewhat similar conditions:

http://kovyrin.net/2007/08/29/how-to-unmount-nfs-share-mounted-with-hard-option/

But this does nothing. There are always two kernel processes that can't be killed even with -9: rpciod and nfsiod. Also, tcpdump shows that the client host keeps trying to reach the server indefinitely, even with only these two processes left.

Then I remembered one thing that helped me find a workaround: unlike NFSv3, NFSv4 uses a lot of side-band protocols, and I suspected these might cause the problem. So I tried -o vers=3, and now under the same conditions umount -f works like a charm.
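For reference, the workaround amounts to something like the following. It is a dry-run sketch that only prints the commands; the server address and export path are reused from the reproduction steps, and the mount point /mnt/test is an assumed placeholder.

```shell
#!/bin/sh
# Sketch of the NFSv3 workaround: prints commands rather than running them.
SERVER=10.0.0.1
EXPORT=/some/test/dir
MNT=/mnt/test     # assumed mount point

nfs3_workaround_commands() {
    # Mount with NFS protocol version 3 instead of version 4.
    echo "mount -t nfs -o vers=3 ${SERVER}:${EXPORT} ${MNT}"
    # Once the connection is broken, this forced unmount is the one
    # reported to succeed on a v3 mount.
    echo "umount -f ${MNT}"
}

nfs3_workaround_commands
```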

For our current needs we do not require NFSv4 features, so this solution is acceptable, but I want to find a solution for NFSv4 for any future contingencies.

I googled this issue a great deal and found no acceptable solution. One person on these forums proposed adding the server IP to one of the client's network interfaces, setting up a fake NFS server, and unmounting the stalled share that way. This works, but it is a very dirty hack, and having two hosts with the same IP on the same network, even temporarily, may cause serious problems. In any case, this approach is not acceptable in a production environment. I am amazed at how many people around the world are struggling with stalled NFS shares, and yet this issue is still not fixed in the kernel.

----------

## audiodef

I used to have trouble shutting down the host machine or unmounting NFS shares if I turned off the shared machine before unmounting, but not lately. I didn't do anything to fix it, so I'm guessing something got updated in my regular monthly updates that fixed it for me. 

What's your emerge --info? I can take a look and compare it to mine and see if anything stands out.

----------

## Hu

What is the output of strace -tt umount -l /path/to/nfs/mount?

----------

## Bircoph

 *audiodef wrote:*   

> I used to have trouble shutting down the host machine or unmounting NFS shares if I turned off the shared machine before unmounting, but not lately. I didn't do anything to fix it, so I'm guessing something got updated in my regular monthly updates that fixed it for me. 
> 
> What's your emerge --info? I can take a look and compare it to mine and see if anything stands out.

I don't think this is the issue. We tried on multiple hosts; only two of them are Gentoo, some haven't been updated for a year, and others are running the latest version of their distribution. All of them have this issue.

The server is Debian GNU/Linux wheezy/sid, 3.2.0-2-amd64 kernel, userspace nfs utils are 1.2.5-4.

The Gentoo client has net-fs/nfs-utils-1.2.4[caps elibc_glibc ipv6 kerberos nfsv3 nfsv4 tcpd -nfsidmap -nfsv41].

emerge --info is available here: http://paste.pocoo.org/show/574061/

 *Hu wrote:*

> What is the output of strace -tt umount -l /path/to/nfs/mount?

For the -o soft case, the output of

strace -f -tt umount -l /path/to/nfs/mount

is available here: http://paste.pocoo.org/show/574066/

In short, the umount system call fails:

umount("/mnt/cube", MNT_DETACH) = -1 ESTALE (Stale NFS file handle)

----------

## Hu

Soft is not as interesting as hard.  Could you provide the strace with a hung hard mount?

----------

## Bircoph

As you wish, but it is essentially the same: http://paste.pocoo.org/show/574369/

From /proc/mounts:

> 10.4.41.57:/srv/hosts/test /mnt/cube\040(deleted) nfs4 rw,relatime,vers=4,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.4.41.89,minorversion=0,local_lock=none,addr=10.4.41.57 0 0

The only thing I can't understand is why umount doesn't stall now, even though the hard option is used.
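To check other clients for entries like the one above, something along these lines may help. It is a minimal sketch, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options); list_nfs4_mounts is a made-up helper name. It decodes the \040 escape that /proc/mounts uses for a space in the mount point, and reports each NFSv4 mount as hard or soft based on its option list.

```shell
#!/bin/sh
# List NFSv4 entries from a mounts-format file (default: /proc/mounts).
# Output columns, tab-separated: device, mount point, hard|soft.
list_nfs4_mounts() {
    awk '$3 == "nfs4" {
        mp = $2
        gsub(/\\040/, " ", mp)   # a space in the mount point is stored as \040
        # "soft" appears in the option list only for soft mounts
        mode = ($4 ~ /(^|,)soft(,|$)/) ? "soft" : "hard"
        printf "%s\t%s\t%s\n", $1, mp, mode
    }' "${1:-/proc/mounts}"
}
```

Calling `list_nfs4_mounts` with no argument reads the live /proc/mounts; passing a file name lets you inspect a saved copy from another host.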

----------

## mva

Is there still no solution for this? :(

----------

## Bircoph

There is no proper solution to my knowledge.

At the moment we're using NFSv3. You may also try setting up a fake NFS server.

----------

