# Stale NFS File handle and other problems

## venquessa2

I've recently had NFS shares from my workstation fail.

I tried to back up my XP C:\ drive with partimage to a network share on my workstation.

Several times, when mounting the share I got

"mount: stale NFS file handle"

and the directory was not mounted.

Restarting /etc/init.d/nfs on the workstation allowed the share to mount, but several more times, when ls'ing the directory on the laptop, I would get "NFS Stale File Handle" or "Permission denied".

The output of "ls -l" showed the directory with all ?????'s instead of the details.

The partimage backup would write for about 10 minutes, and then the NFS mount would die.

I've tried recompiling nfs-utils on both machines and restarting all related services, but the problem persists.

Any ideas?

----------

## alex.blackbit

ouch, sounds bad.

i had these ???? before on a jfs filesystem that was not unmounted correctly (jfs is quite sensitive about that...).

maybe you need to do a fsck on that filesystem?

at least you can try.

----------

## kashani

What options are you using to mount NFS? I recommend the following, which might be more robust if your network is flaky or you have other issues.

```
rsize=8192,wsize=8192,hard,intr,rw,noatime
```
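As a rough sketch of how these options would be passed on the client: the server name `workstation`, the export path `/export/backup` and the mount point `/mnt/backup` below are placeholders, not taken from this thread. The helper only prints the command, so it can be checked before running it as root.

```shell
#!/bin/sh
# Options as recommended above.
NFS_OPTS="rsize=8192,wsize=8192,hard,intr,rw,noatime"

# Print (not run) the mount command for a given export and mount point.
# $1 = server:/export (placeholder), $2 = local mount point (placeholder)
build_mount_cmd() {
    printf 'mount -t nfs -o %s %s %s\n' "$NFS_OPTS" "$1" "$2"
}

build_mount_cmd workstation:/export/backup /mnt/backup
```

The same option string can go in the fourth field of an /etc/fstab entry if the share should be mounted at boot.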

kashani

----------

## depontius

I got the very same thing, and it happened right after "emerge -atuvDN world" brought my server and x86 client to nfs-utils-1.0.10. I seem to lose my mounts after about 15-20 minutes. Two more points of interest...

1 - My amd64 system, which remained at nfs-utils-1.0.6, also lost its mounts in about the same time.

2 - I solved the problem by going back to nfs-utils-1.0.6 on server and x86 client. As an oops, after downgrading the server, I just did "/etc/init.d/nfs restart". According to a remote query, it looks like some of the nfsv4 components were left running. But it works.

I'd really like to get to nfsv4, at some point. Since I think the problem is at the server, I'm thinking of taking my client back to nfs-utils-1.0.10, to make sure the combination works, and to be closer to ready.

Update: My x86 client is now at 1.0.10, and the mount has been solid all morning. The problems seem to be related to 1.0.10 at the server.

----------

## ProTech

I can confirm this. After downgrading my server to 1.0.6 everything works as before.

My client is amd64, so I always used the good version there (1.0.6).

----------

## depontius

Just for jollies, I've bumped my other client to nfs-utils-1.0.10 with ~amd64, so both clients are at 1.0.10 and the server at 1.0.6. The amd64 machine has been mounted for almost the magic 15-20 minutes.

Someone opened a ticket for this on bugzilla, so I've also added my information there.

----------

## XenoTerraCide

what's the ticket # (or linkage) I can't find it. I seem to be affected by this.

----------

## depontius

 *XenoTerraCide wrote:*   

> what's the ticket # (or linkage) I can't find it. I seem to be affected by this.

 

Ticket #168170

As far as I can tell, nfs-utils-1.0.12 seems to be working for everyone that was having problems with 1.0.10. It's working for me.

As an aside, I'm having no problems with 1.0.10 on the client side, only on the server. At present my production server is at 1.0.6 and my test server at 1.0.12. Both are working with 1.0.10 clients. After a bit more testing I plan to move the production server to 1.0.12. I also managed to get nfsv4 working on the test server, but I think I'll migrate the production server after it's successfully serving nfsv3 for a few days.

----------

## TJNII

I'm having the stale file handle problem after upgrading to 1.0.12.  It works for about 5 minutes, then goes belly up.  I'm going to go back to 1.0.6, unless someone has a fix.

I don't think the problem is the client (unless the new version doesn't like older versions; the client is an old laptop).  I have to restart the server to remount the share, so I think my upgrade tonight broke it (surprise, surprise).

[EDIT]

Dug around a little more and found the no_subtree_check option can affect this.  I'll give that a try and report back.

[/EDIT]

[EDIT 2]

Yea.  No, that didn't work.  I can restart the server and after a second everything just works again, no need to do anything on the clients.  It acts the same on two client machines.[/EDIT]
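For reference, `no_subtree_check` is an export option set on the server in /etc/exports. A minimal sketch of what such an entry might look like follows; the export path, the client network and the other options (`rw,sync`) are assumptions for illustration, not TJNII's actual configuration.

```shell
#!/bin/sh
# Format an /etc/exports entry that includes no_subtree_check.
# $1 = exported directory (placeholder), $2 = allowed client or network (placeholder)
exports_line() {
    printf '%s %s(rw,sync,no_subtree_check)\n' "$1" "$2"
}

exports_line /export/share 192.168.0.0/24

# After adding such a line to /etc/exports on the server, re-export with:
#   exportfs -ra
```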

----------

## TJNII

What I found:

* If you restart the server after this, make sure to do a full stop and start.  /etc/init.d/nfs restart doesn't always cleanly restart the server and can be misleading.  Check for rpc processes still running after stop.

* rpc.gssd seemed to cause issues for me.  Recompiling without the kerberos USE flag helped.
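The check for leftover processes after a stop could be sketched roughly as below. The function name and the pattern (any `rpc.*` daemon plus `nfsd`) are assumptions chosen for illustration; adjust the pattern to whatever daemons your setup actually runs.

```shell
#!/bin/sh
# Filter a list of process names (one per line on stdin) down to
# NFS-related daemons that should be gone after a full stop.
nfs_leftovers() {
    grep -E '^(rpc\.[a-z]+|nfsd)$'
}

# Intended use on the server, after "/etc/init.d/nfs stop":
#   ps -e -o comm= | nfs_leftovers
# Any name printed is still running and may need to be killed by hand
# before "/etc/init.d/nfs start".
```

If nothing is printed, the stop was clean and the service can be started again.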

----------

## RaceTM

Hey all,

I was having a similar problem which wasn't consistent, so it was hard to troubleshoot. I have a media PC which is only turned on when it is in use.  About half the times I booted up and tried to play something from an NFS share, the application would lock up within 20 minutes or so, and upon investigating, all NFS shares would be disconnected.  I discovered that if I didn't touch anything, the shares would reconnect after 5 - 10 minutes.  Very frustrating when you are trying to watch a movie or listen to some music.  Anyway, I figured I would do a quick search on the forum and I stumbled across this thread.  Since I don't have time to troubleshoot, I just wanted a quick fix.  After reading this thread, I decided to downgrade both the server and client to nfs-utils 1.0.6, and that seems to have solved the issue.  If it recurs I will post here, but hopefully this information helps somebody else out as well.

RaceTM

----------

## RaceTM

...yeah nevermind the problem is still there

 :Very Happy: 

----------

## mariourk

nfs-utils 1.0.6 is no longer in portage   :Confused: 

Does anyone have a fix for this?

Update

I recompiled nfs-utils on the client, but this time without the kerberos flag. On the server, the kerberos flag was already disabled.

I also followed TJNII's advice. After stopping NFS, there were still several rpc processes running. I killed them manually and started NFS after that. I did this on the server and the client.

I'm not sure if it was disabling the kerberos flag on the client or manually killing the rpc processes before restarting NFS. Maybe it was both?

However, the problem seems to be solved now.   :Very Happy: 

Thanks for your suggestions, TJNII  :Very Happy: 

----------

## xonogenic

I apologize if this is a dumb answer, but I have run into these issues when using disparate versions of NFS, i.e. NFS version 3 on the server and version 4 on the client. For me, it was always solved by adding nfsvers=3 (if the server is version 3), basically forcing both sides to communicate with the lowest common version.

I don't know if this will help, but I hope it does.
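A small sketch of pinning the protocol version on the client: the helper name, the base options, the server name `server` and both paths are placeholders for illustration. As before, the mount command is only printed, not executed.

```shell
#!/bin/sh
# Append an explicit NFS protocol version to an existing option string.
# $1 = comma-separated mount options, $2 = protocol version (e.g. 3)
with_nfsvers() {
    printf '%s,nfsvers=%s\n' "$1" "$2"
}

opts=$(with_nfsvers "hard,intr,rw" 3)
echo "mount -t nfs -o $opts server:/export /mnt/share"
```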

----------

