# Why a "Stale NFS file handle" error on an ext3 partition?

## genterminl

I'm getting "Stale NFS file handle" errors when trying to mount a local filesystem. I do have some NFS mounts, but they are fine, and the error persists even if I unmount all of them. I finally looked in dmesg and found "EXT3-fs (sdc2): error: get root inode failed." I thought I had successfully run mke2fs, but obviously not. Running it again now, it's taking forever on the bad block check, so I'm pretty sure the real problem is that the disk itself is dying, if not already dead.

My question is: why do I get an NFS error message instead of something about a bad filesystem or a bad disk? I've found a few similar posts going back years, but no answers.

----------

## NeddySeagoon

genterminl,

After several unclean shutdowns where something had died, I've been left with a file that shows as all `?????????` in `ls -l /dirname`, and it returns that error.

It's clearly nothing to do with NFS.

Get smartmontools and post the drive's error log.

----------

## genterminl

Thanks for the response. The drive is a DiamondMax, SMART-capable but with SMART disabled, and I've been unable to enable it. I've even booted SeaTools for DOS, but it won't complete a long test. It fails the short test, but it doesn't give any actual error code or value that I can find. I'll try again to enable SMART to get a report, but as I said, I'm pretty sure the drive is dead. It had been sitting on a shelf for quite a while, and I thought I'd try it again to see if I could use it for swap and temp space (PORTAGE_TMPDIR). I'll report back after I try some more with smartctl.

Anyway, my real question is why does something that has nothing to do with NFS trigger an NFS error?  It's clearly confused a lot of people over the years.

----------

## genterminl

I finally ran "mke2fs -j -L diamond_space -t ext3 -v /dev/sdc2" and it returned without errors. I assume I will run into trouble as soon as it tries to use one of the bad sectors, which have not been marked in any way. Regarding smartmontools, most uses of smartctl return:

> SMART support is: Available - device has SMART capability.
> 
> SMART support is: Disabled
> 
> SMART Disabled. Use option -s with argument 'on' to enable it.

and "smartctl -s on -S on -o on /dev/sdbc" gives:

> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.1-gentoo-r2-01] (local build)
> 
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> === START OF ENABLE/DISABLE COMMANDS SECTION ===
> ...

 

dmesg is showing:

> [ 3255.178410] ata5.00: limiting speed to UDMA/100:PIO4
> 
> [ 3255.178416] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> 
> [ 3255.178422] ata5.00: failed command: WRITE DMA EXT
> ...

However, a lot of googling suggests trying a new cable before writing off the drive completely, but I probably won't get to try that until tomorrow.

----------

## NeddySeagoon

genterminl,

```
mke2fs -j -cc -L diamond_space -t ext3 -v /dev/sdc2
```

might be better. It will do a read-write test on the partition before making the filesystem.

The write will force a remapping of any failed sectors, if the drive has any spares left.  The SMART data would tell us that.

Can you enable SMART in the BIOS?

Providing the drive has spare sectors, normal writes will force sector remapping too, so you may not see any errors.
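If SMART can eventually be enabled, the spare-sector question can be answered from `smartctl -A`, which prints the drive's attribute table. Below is a minimal Python sketch, assuming the usual ten-column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The helper names are hypothetical, and attribute names vary somewhat between vendors.

```python
import subprocess


def parse_smart_attributes(text: str) -> dict[str, str]:
    """Map ATTRIBUTE_NAME -> RAW_VALUE from `smartctl -A` output.

    Assumes the usual ten-column attribute table. RAW_VALUE is joined from
    the tenth column onward because some raw values contain spaces,
    e.g. "34 (Min/Max 20/45)".
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # banner and header lines ("smartctl ...", "ID# ...") are skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def read_smart_attributes(device: str) -> dict[str, str]:
    """Run smartctl on a device (needs root and SMART enabled on the drive)."""
    # check=False: smartctl's exit status is a bitmask that can be non-zero
    # even when it printed the attribute table successfully.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return parse_smart_attributes(result.stdout)
```

For example, run `read_smart_attributes("/dev/sdc")` as root, then look at `Reallocated_Sector_Ct` (sectors already remapped to spares) and `Current_Pending_Sector` (unstable sectors still waiting to be remapped).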

----------

## genterminl

NeddySeagoon,

SMART is enabled in the BIOS, but I'm pretty sure all that does is try to enable SMART on the drive at startup. I have not seen any BIOS errors at boot. Repeated attempts to enable SMART on the drive have all failed.

I did try -c the first time, but it was still going after well over an hour (160GB drive), so -cc will be even slower. However, I suppose I can let it run overnight, or even for days if it takes that long. At the moment, I don't plan on putting any critical data on it.

However, one bit of good news: after switching cables, I'm so far not seeing any more errors for the drive in dmesg, even when trying to enable SMART. If the cable was indeed contributing to the errors, then mke2fs -c or even -cc might finish in a more reasonable amount of time, so I may yet try it again.

Regarding my original question, I did some grepping in the kernel source. "Stale NFS file handle" is the message for error ESTALE, and ext3 (and several other filesystems) return it in code where NFS doesn't seem to be involved, apparently for certain inode problems. In my mind, if it's not NFS, they shouldn't use that error, but I suppose I'd have to ask a kernel developer to find out whether it's a known issue or just historical.
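A small sketch of why the wording looks NFS-specific: the kernel hands user space only the errno number (ext3 returns ESTALE, not a string), and the text you see comes from the C library's message table. The Python below looks the number up the same way `strerror()` does, since `os.strerror()` calls it. Older glibc versions print "Stale NFS file handle", newer ones print "Stale file handle", so the exact output depends on the system. `describe_errno` is a hypothetical helper, not part of any library.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and C-library message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror() asks the C library for the message text; the kernel
    # itself only hands back the number (e.g. -ESTALE from an inode lookup).
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_errno(errno.ESTALE))
```

The same number produces the same message whether it came from an NFS client or from ext3 failing to read the root inode, which is why a local mount can report an "NFS" error.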

----------

## genterminl

Quick update: mke2fs is still running. The write pass of badblocks took about an hour. The read pass has now been running for about 48 hours and is only at 87.83%. I think it went fairly quickly to over 80% and then started hitting the bad sectors. We'll see if it can at least avoid them, assuming the disk is stable and isn't just waiting until I really need it before it fails completely.

----------

## genterminl

Final update: the last time I checked, mke2fs had only reached 88.7% after running for several hundred hours. Overnight, the whole system crashed. I'm not certain, but I think there may have been problems with the swap partition on the questionable drive. Between that and the number of errors from mke2fs, I think I'll just toss the drive. All the bad sectors did seem to be in the same area, but without an easy way to isolate that area, and without being able to enable SMART on the drive, I figure it would be a losing battle anyway.
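On the point about the bad sectors being in one area: `badblocks -o FILE` writes the bad block numbers it finds, one per line, so a short script can show whether they really form a single run. This is a minimal Python sketch with hypothetical helper names. Block numbers are in badblocks' own block size (1024 bytes unless `-b` is given), so multiply by that to get byte offsets within the partition.

```python
def group_bad_blocks(blocks, max_gap=0):
    """Collapse bad-block numbers into sorted (first, last) runs.

    Blocks at most max_gap good blocks apart are merged into the same run.
    """
    runs = []
    for block in sorted(set(blocks)):
        if runs and block - runs[-1][1] <= max_gap + 1:
            # Extend the current run to include this block.
            runs[-1] = (runs[-1][0], block)
        else:
            runs.append((block, block))
    return runs


def read_badblocks_list(path):
    """Read a `badblocks -o` style file: one decimal block number per line."""
    with open(path) as fh:
        return [int(line) for line in fh if line.strip()]
```

For example, `group_bad_blocks([5, 1, 2, 3, 10, 11])` returns `[(1, 3), (5, 5), (10, 11)]`, and passing `max_gap=1` merges the first two runs into `(1, 5)`.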

As for my initial question: ESTALE ("Stale NFS file handle") seems to be used in several filesystem modules in the kernel, even where NFS is not involved. Eventually I'll try to figure out why, or ask on a kernel mailing list or IRC, but it's not a very high priority for me at this point.

----------

