# Directories disappearing with NFS

## Ivion

I've recently noticed that occasionally some directories become inaccessible or disappear on an NFS share. They show up with question marks in the directory listing, like:

```
d?????????   ? ?    ?       ?            ? Break_Blade_[gg]
```

and trying to access one with ls results in:

```
$ ls Break_Blade_\[gg\]
/bin/ls: cannot access Break_Blade_[gg]: No such file or directory
```
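For reference, I enabled the debugging below with `rpcdebug`; these should be the standard module/flag invocations (run as root):

```
# On the client: enable all NFS client debug flags
rpcdebug -m nfs -s all
# On the server: enable all nfsd debug flags
rpcdebug -m nfsd -s all

# ...and clear them again afterwards:
rpcdebug -m nfs -c all
rpcdebug -m nfsd -c all
```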

Turning on debugging for nfs/nfsd for client/server resulted in:

Client:

```
[39506.737763] decode_attr_type: type=00
[39506.737766] decode_attr_change: change attribute=5935364140171929262
[39506.737768] decode_attr_size: file size=103
[39506.737769] decode_attr_fsid: fsid=(0x0/0x0)
[39506.737770] decode_attr_fileid: fileid=0
[39506.737771] decode_attr_fs_locations: fs_locations done, error = 0
[39506.737772] decode_attr_mode: file mode=00
[39506.737773] decode_attr_nlink: nlink=1
[39506.737774] decode_attr_owner: uid=-2
[39506.737775] decode_attr_group: gid=-2
[39506.737776] decode_attr_rdev: rdev=(0x0:0x0)
[39506.737777] decode_attr_space_used: space used=0
[39506.737778] decode_attr_time_access: atime=0
[39506.737779] decode_attr_time_metadata: ctime=1381934653
[39506.737780] decode_attr_time_modify: mtime=1381934653
[39506.737781] decode_attr_mounted_on_fileid: fileid=0
[39506.737782] decode_getfattr_attrs: xdr returned 0
[39506.737783] decode_getfattr_generic: xdr returned 0
[39506.737787] NFS: nfs_update_inode(0:25/128 fh_crc=0x798631d2 ct=2 info=0x26040)
[39506.737790] NFS: permission(0:25/128), mask=0x1, res=0
[39506.737793] NFS: nfs_lookup_revalidate(//Downloads) is valid
[39506.738173] decode_attr_type: type=00
[39506.738175] decode_attr_change: change attribute=5935474156057188396
[39506.738177] decode_attr_size: file size=8192
[39506.738179] decode_attr_fsid: fsid=(0x0/0x0)
[39506.738180] decode_attr_fileid: fileid=0
[39506.738182] decode_attr_fs_locations: fs_locations done, error = 0
[39506.738184] decode_attr_mode: file mode=00
[39506.738185] decode_attr_nlink: nlink=1
[39506.738187] decode_attr_owner: uid=-2
[39506.738188] decode_attr_group: gid=-2
[39506.738190] decode_attr_rdev: rdev=(0x0:0x0)
[39506.738192] decode_attr_space_used: space used=0
[39506.738194] decode_attr_time_access: atime=0
[39506.738195] decode_attr_time_metadata: ctime=1381960268
[39506.738197] decode_attr_time_modify: mtime=1381960268
[39506.738199] decode_attr_mounted_on_fileid: fileid=0
[39506.738200] decode_getfattr_attrs: xdr returned 0
[39506.738202] decode_getfattr_generic: xdr returned 0
[39506.738214] NFS: nfs_update_inode(0:25/2684800598 fh_crc=0xae248982 ct=1 info=0x26040)
[39506.738216] NFS: permission(0:25/2684800598), mask=0x1, res=0
[39506.738217] NFS: revalidating (0:25/2684800598)
[39506.738579] decode_attr_type: type=040000
[39506.738581] decode_attr_change: change attribute=5935474156057188396
[39506.738583] decode_attr_size: file size=8192
[39506.738585] decode_attr_fsid: fsid=(0xe44fd5cc49e34399/0x855b09ad20b7fca8)
[39506.738587] decode_attr_fileid: fileid=2684800598
[39506.738588] decode_attr_fs_locations: fs_locations done, error = 0
[39506.738590] decode_attr_mode: file mode=0775
[39506.738592] decode_attr_nlink: nlink=67
[39506.738595] decode_attr_owner: uid=0
[39506.738597] decode_attr_group: gid=408
[39506.738599] decode_attr_rdev: rdev=(0x0:0x0)
[39506.738601] decode_attr_space_used: space used=12288
[39506.738602] decode_attr_time_access: atime=1340493263
[39506.738604] decode_attr_time_metadata: ctime=1381960268
[39506.738606] decode_attr_time_modify: mtime=1381960268
[39506.738607] decode_attr_mounted_on_fileid: fileid=0
[39506.738609] decode_getfattr_attrs: xdr returned 0
[39506.738611] decode_getfattr_generic: xdr returned 0
[39506.738615] NFS: nfs_update_inode(0:25/2684800598 fh_crc=0xae248982 ct=1 info=0x27e7f)
[39506.738618] NFS: (0:25/2684800598) revalidation complete
[39506.738620] NFS: nfs_lookup_revalidate(Downloads/Break_Blade_[gg]) is valid
[39506.738623] NFS: revalidating (0:25/5929111053)
[39506.739191] nfs_revalidate_inode: (0:25/5929111053) getattr failed, error=-2
```

Server:

```
[39148.132535] nfsd_dispatch: vers 4 proc 1
[39148.132549] nfsv4 compound op #1/1: 30 (OP_RENEW)
[39148.132556] process_renew(525ed57d/00000006): starting
[39148.132563] renewing client (clientid 525ed57d/00000006)
[39148.132572] nfsv4 compound op dd029060 opcnt 1 #1: 30: status 0
[39148.132575] nfsv4 compound returned 0
[39149.123096] nfsd_dispatch: vers 4 proc 1
[39149.123112] nfsv4 compound op #1/3: 22 (OP_PUTFH)
[39149.123125] nfsd: fh_verify(20: 00060001 ccd54fe4 9943e349 ad095b85 a8fcb720 00000000)
[39149.123155] nfsv4 compound op dd029060 opcnt 3 #1: 22: status 0
[39149.123160] nfsv4 compound op #2/3: 3 (OP_ACCESS)
[39149.123170] nfsd: fh_verify(20: 00060001 ccd54fe4 9943e349 ad095b85 a8fcb720 00000000)
[39149.123183] nfsv4 compound op dd029060 opcnt 3 #2: 3: status 0
[39149.123187] nfsv4 compound op #3/3: 9 (OP_GETATTR)
[39149.123196] nfsd: fh_verify(20: 00060001 ccd54fe4 9943e349 ad095b85 a8fcb720 00000000)
[39149.123213] nfsv4 compound op dd029060 opcnt 3 #3: 9: status 0
[39149.123217] nfsv4 compound returned 0
[39149.123620] nfsd_dispatch: vers 4 proc 1
[39149.123627] nfsv4 compound op #1/3: 22 (OP_PUTFH)
[39149.123637] nfsd: fh_verify(32: 81060001 03090000 00000000 00000000 00000000 a006ce56)
[39149.123661] nfsv4 compound op dd029060 opcnt 3 #1: 22: status 0
[39149.123665] nfsv4 compound op #2/3: 3 (OP_ACCESS)
[39149.123674] nfsd: fh_verify(32: 81060001 03090000 00000000 00000000 00000000 a006ce56)
[39149.123684] nfsv4 compound op dd029060 opcnt 3 #2: 3: status 0
[39149.123688] nfsv4 compound op #3/3: 9 (OP_GETATTR)
[39149.123697] nfsd: fh_verify(32: 81060001 03090000 00000000 00000000 00000000 a006ce56)
[39149.123708] nfsv4 compound op dd029060 opcnt 3 #3: 9: status 0
[39149.123711] nfsv4 compound returned 0
[39149.124052] nfsd_dispatch: vers 4 proc 1
[39149.124059] nfsv4 compound op #1/2: 22 (OP_PUTFH)
[39149.124068] nfsd: fh_verify(32: 81060001 03090000 00000000 00000000 00000000 a006ce56)
[39149.124082] nfsv4 compound op dd029060 opcnt 2 #1: 22: status 0
[39149.124086] nfsv4 compound op #2/2: 9 (OP_GETATTR)
[39149.124095] nfsd: fh_verify(32: 81060001 03090000 00000000 00000000 00000000 a006ce56)
[39149.124110] nfsv4 compound op dd029060 opcnt 2 #2: 9: status 0
[39149.124113] nfsv4 compound returned 0
[39149.124452] nfsd_dispatch: vers 4 proc 1
[39149.124458] nfsv4 compound op #1/2: 22 (OP_PUTFH)
[39149.124468] nfsd: fh_verify(32: 81060001 03090000 00000000 00000000 00000000 61670e0d)
[39149.124816] nfsv4 compound op dd029060 opcnt 2 #1: 22: status 2
[39149.124819] nfsv4 compound returned 2
```

Both of which seem to indicate that the directory cannot be found (error 2/-2). But I know it exists: when I log in to the server remotely I can access the directory just fine.

Related to that, and to make matters more confusing: as soon as I check the offending directory (logging in remotely with ssh and running ls), it starts working again on the NFS share. I can also get the same effect on the client by disregarding the error and listing the contents of the offending directory, like so:

```
ls Break_Blade_\[gg\]/*
```

at which point it just works, and the directory listing is fixed as well (including other misbehaving directories).
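Next time it happens I'm also considering ruling out the client's attribute/lookup caching by remounting with caching turned off. These are standard nfs mount options, though the exact mount line below is only a sketch based on my setup:

```
# Disable the client's positive/negative lookup cache for this mount
mount -o remount,lookupcache=none 10.0.0.1:/Data /Davidowitz

# Or go further and disable attribute caching entirely
# (slow, but a useful diagnostic)
mount -o remount,noac 10.0.0.1:/Data /Davidowitz
```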

Other, perhaps pertinent information - my exports:

```
/export      10.0.0.0/255.255.255.0(fsid=0,no_subtree_check,ro)
/export/Data   10.0.0.0/255.255.255.0(no_subtree_check,ro)
```

and how it's mounted on the client:

```
10.0.0.1:/Data on /Davidowitz type nfs (ro,vers=4,addr=10.0.0.1,clientaddr=10.0.0.178)
```

I'm really unsure how to handle this, so I'm coming here looking for suggestions on diagnosing the problem, or other tests I could run to home in on what's causing NFS to misbehave. I'm mostly out of my depth where NFS is concerned, so I hope there are some experts on this forum. :Very Happy:

Many thanks in advance.

----------

## zeronullity

This is not a share of a share, is it? That would not be supported, not to mention a security risk.

Also, do you have this problem with all the NFS clients you've tested? This kind of problem is often caused by the client side rather than the server side. Try `mount -a`, along with `exportfs -a`, next time you run into the issue and see if that corrects it.

----------

## Ivion

 *zeronullity wrote:*   

> This is not a share of a share, is it? That would not be supported, not to mention a security risk.
> 
> Also, do you have this problem with all the NFS clients you've tested? This kind of problem is often caused by the client side rather than the server side. Try `mount -a`, along with `exportfs -a`, next time you run into the issue
> ...

 

No, it's not a share of a share. As far as I understand it, NFSv4 needs a virtual root with shares located beneath it, so I made /export the virtual NFS root and bind-mounted the to-be-exported directory beneath it:

```
/Data on /export/Data type none (rw,bind)
```
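For completeness, the corresponding `/etc/fstab` line would look something like this (the paths are from my setup):

```
/Data   /export/Data   none   bind   0 0
```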

Another NFS client, an Ubuntu netbook, has the same problems accessing the problematic directories - at the same time as my desktop does.

Running `exportfs -a` on the server and remounting the share on the client "fixes" the issue, but I wonder if it's just the remounting that does it; next time it happens I'll try only remounting. Does that tell you anything?

Edit: Just remounting the NFS share "fixes" the issue as well, no need for `exportfs -a` on the server.

----------

## zeronullity

Try disabling any drive standby/power management on the server & see if the same problem occurs.

----------

## Ivion

 *zeronullity wrote:*   

> Try disabling any drive standby/power management on the server & see if the same problem occurs.

 

```
$ hdparm -B /dev/sda

/dev/sda:
 APM_level   = not supported

$ hdparm -B /dev/sdb

/dev/sdb:
 APM_level   = off
```

Checking the spin-down timeout for both drives gives me:

```
  -S: bad/missing standby-interval value (0..255)
```

As far as I know I have no powersaving settings enabled in the kernel, but I'll probably go check that again.

----------

## zeronullity

What do the file names look like.. standard? If they're in another language you may need to add UTF-8 support for it, as NFS can sometimes have issues with file names.

----------

## Ivion

 *zeronullity wrote:*   

> What do the file names look like.. standard? If they're in another language you may need to add UTF-8 support for it, as NFS can sometimes have issues with file names.

 

As you can see in my first post, there's nothing wrong with the name of the problematic directories - the question marks are where the file attributes go, not where the name is:

```
d?????????   ? ?    ?       ?            ? Break_Blade_[gg]
```

I have other files and directories containing non-Latin characters which are displayed without problem over NFS, so I sincerely doubt that has anything to do with it. Even if it did, the problem should always be present, but like I said, it only presents itself occasionally.

----------

## Ivion

Just to exclude the possibility of it being either a hardware or a filesystem error I ran SMART self-tests on both hard drives and checked the XFS filesystem with `xfs_repair -n`. The "Extended offline" self-tests reported "Completed without error" for both sda and sdb, and xfs_repair found nothing wrong with the consistency of the filesystem.

I've also moved the offending directories into their own subdirectory and they continued to misbehave occasionally as before, so now I've made copies of those misbehaving directories and will report back if they continue to show the same behaviour.

Just to make it clear once more: these problems are non-existent on the server itself - I can browse around just fine, and if I `ls` the directories giving me trouble (trouble on the NFS clients, that is) the problems disappear... for a while (maybe 30 minutes to an hour). It's only on the NFS shares mounted on the clients that I have problems with certain directories. Whenever the problems arise they seem to affect the same directories, which is why my latest effort has been copying them in the hope of fixing whatever is wrong. Does NFS have a maximum number of entries per directory? Or a maximum path depth I'm bumping against? I'm seriously lost.
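To rule out a size limit, counting the entries per directory is easy enough. A quick sketch (the path is from my setup; pass another as the first argument):

```shell
#!/bin/sh
# Print the number of entries in each subdirectory of the share root,
# so unusually large directories stand out.
root=${1:-/export/Data}
for d in "$root"/*/; do
    [ -d "$d" ] || continue
    printf '%6s %s\n' "$(find "$d" -mindepth 1 -maxdepth 1 | wc -l)" "$d"
done
```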

PS: Samba has none of these problems; the Windows clients on my network using SMB have not suffered from them. It's solely the Linux clients (a Gentoo and an Ubuntu box) using the NFS share.

----------

## zeronullity

Has NFSv4 had issues with large directories in the past? Yes. Does it currently, with the version you're using? I'm not sure.

You can try setting up test directories with standard file names, similar sizes, permissions, etc. to see if the same behaviour occurs with small/large numbers of files. If it only happens on certain directories, that should be a red flag.

Approx. how many files are we talking about..?

Did you check your setting for fs.file-max/fs.file-nr?

The maximum file size over NFSv4 on ia32, when the server runs XFS, is 8 TB. That matches the maximum file size the local file system is able to create (on this architecture). So I don't think file size has anything to do with it; the number of files in a directory is more likely.

Have you tried sshfs - any problems with that?

NFS may not like the name/path or number of files in your directory for whatever reason. I'd make more directories for testing to narrow down the issue, making sure to omit the directories you had issues with previously. Perhaps even make a COPY of the directory having all the issues and try resetting file modes/permissions and file names recursively with different values.

I know that on other NFS OSes similar issues are related to user permissions, and to whether you're using AUTH_SYS or Kerberos; even having too many groups for a uid can cause issues when running older versions of NFS against newer versions on some particular systems.

There is also an option, if I recall correctly, to run v4 in a dumbed-down v3 mode - I don't recall what it is on Linux - you may try that as well.

Does the same problem occur with no_subtree_check disabled?

----------

## Ivion

 *zeronullity wrote:*   

> Has NFSv4 had issues with large directories in the past? Yes. Does it currently, with the version you're using? I'm not sure.
> 
> [...]
> 
> Does the same problem occur with no_subtree_check disabled?

 

I'm running kernel 3.10.1-hardened-r1 and tried downgrading to 3.8.6-hardened today, but both have the same problem. The nfs-utils package is version 1.2.6 for both. Interestingly, while messing around with different settings for my exports (I had turned on subtree checking and turned it off again when it didn't change the situation), nfsd seemingly crashed. Now, in 3.8.6 NFSv4 support is marked "experimental", so I don't know if this has anything to do with my problem:

```
[ 6391.523371] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 6391.523705] NFSD: starting 90-second grace period (net c13a7ee4)
[ 6494.043914] NFSD: unable to generate recoverydir name (-2).
[ 6494.043922] NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
[ 6494.043940] general protection fault: 0000 [#1]
[ 6494.043961] Modules linked in: 3c59x
[ 6494.043985] Pid: 26691, comm: nfsd Not tainted 3.8.6-hardened #3    /CN900-8237R
[ 6494.044006] EIP: 0060:[<c10f93d2>] EFLAGS: 00010292 CPU: 0
[ 6494.044020] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 000003bc
[ 6494.044037] ESI: de4ca400 EDI: dd4ae1e4 EBP: 00000000 ESP: cce47e00
[ 6494.044054]  DS: 0068 ES: 0068 FS: 0000 GS: 0068 SS: 0068
[ 6494.044069] CR0: 8005003b CR2: a45e90c4 CR3: 1d93c000 CR4: 000006b0
[ 6494.044085] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 6494.044100] DR6: ffff0ff0 DR7: 00000400
[ 6494.044116] Process nfsd (pid: 26691, ti=d965e674 task=d965e440 task.ti=d965e674)
[ 6494.044131] Stack:
[ 6494.044136]  de4f4480 de4ca400 c10f968b cce47e48 0065e440 dd4ae1c0 dd4ae1c0 de598020
[ 6494.044172]  dd8fc000 c10f1595 00000000 de4f4480 00000400 c10df499 dd47ba80 00000054
[ 6494.044207]  c7836000 de4ca400 dd8fc014 dd4ae1e4 00000000 c10f97a5 dd4ae1c0 c10f447c
[ 6494.044244] Call Trace:
[ 6494.044268]  [<c10f968b>] ? nfsd4_create_clid_dir+0x73/0x164
[ 6494.044292]  [<c10f1595>] ? nfs4_preprocess_seqid_op+0xdd/0x100
[ 6494.044315]  [<c10df499>] ? mk_fsid+0xc4/0xc4
[ 6494.044332]  [<c10f97a5>] ? nfsd4_client_record_create+0x29/0x2b
[ 6494.044352]  [<c10f447c>] ? nfsd4_open_confirm+0x131/0x154
[ 6494.044386]  [<c103f8e4>] ? getboottime+0x39/0x3e
[ 6494.044414]  [<c10e80b4>] ? nfsd4_proc_compound+0x24e/0x421
[ 6494.044442]  [<c10eb414>] ? nfsd4_decode_open_confirm+0x13/0x9f
[ 6494.044474]  [<c1286ea8>] ? svcauth_unix_set_client+0x194/0x23c
[ 6494.044501]  [<c10dd639>] ? nfsd_dispatch+0xc9/0x19f
[ 6494.044525]  [<c1283430>] ? svc_process_common+0x289/0x488
[ 6494.044550]  [<c10dd17c>] ? nfsd_destroy+0x4f/0x4f
[ 6494.044572]  [<c12837ff>] ? svc_process+0xde/0xfd
[ 6494.044595]  [<c10dd224>] ? nfsd+0xa8/0xef
[ 6494.044620]  [<c10363ab>] ? kthread+0x66/0x6b
[ 6494.044649]  [<c129ca42>] ? ret_from_kernel_thread+0x1a/0x28
[ 6494.044674]  [<c1036345>] ? __kthread_parkme+0x4a/0x4a
[ 6494.044692] Code: 89 c7 74 1a 89 44 24 04 c7 04 24 41 80 33 c1 e8 0d d6 19 00 c7 83 90 00 00 00 00 00 00 00 83 c4 10 89 f8 5b 5e 5f c3 56 53 89 c3 <8b> 83 48 01 00 00 8b 15 6c d9 40 c1 e8 37 f4 ff ff 89 c6 8b 80
[ 6494.044904] EIP: [<c10f93d2>] nfsd4_client_tracking_exit+0x4/0x39 SS:ESP 0068:cce47e00
[ 6494.045422] ---[ end trace 9d928104970bb5af ]---
```

I'll try turning subtree_checking on again on 3.10.1 and see if that helps.

Edit: Turned on subtree_checking with 3.10.1 as well, no crashing but it didn't stop the problem from occurring.

 *Quote:*   

> Approx. how many files are we talking about..?
> 
> Did you check your setting for fs.file-max/fs.file-nr?

 

We're talking about 7 directories containing 79 files; all but one are 4.0k (the directories, not the files) - but I have a lot of directories that are 4.0k. That number doesn't seem anywhere near large enough to cause problems. Checking the open file handles in /proc/sys/fs/file-nr shows 807 open of a maximum of 48970, which also seems strange as a cause of my problems. Although the nfsd debugging in my first post does show it failing at OP_PUTFH, which might have something to do with file handles?
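For anyone wanting to check the same numbers on their own box, this is just the standard Linux procfs interface:

```shell
# /proc/sys/fs/file-nr holds three numbers: allocated handles,
# allocated-but-unused handles, and the system-wide maximum.
awk '{ printf "allocated=%s free=%s max=%s\n", $1, $2, $3 }' /proc/sys/fs/file-nr
```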

 *Quote:*   

> Have you tried sshfs any problems with that?

 

I'm currently testing that out; the problem hasn't triggered yet, so I can't report whether the directories keep working through sshfs while they're throwing errors through NFS.

Edit: I can access the problematic directories just fine through sshfs when they are acting up over NFS, and as soon as I `ls` them on the sshfs mount they start working on the mounted NFS share as well.

----------

## zeronullity

I know people who've had similar issues in the past; downgrading to NFSv3 has usually fixed them.

Also, the way VFS/NFS works, it shouldn't make a difference what file system you're using... however, if you have a spare drive it might be worth trying a different FS other than XFS.

79 files per directory is nowhere near a lot, and I wouldn't consider the size of the directory to be an issue.

Were the files/directories in question originally created on the server, or copied from another system over the network, or perhaps from another drive containing a different file system type? I've seen unusual issues in the past when copying directories from NTFS/FAT partitions to other file system types, and vice versa.

This sounds like a caching issue on the server side, due to one of:

- a bug

- uid/gid authentication issues between server and client

- lack of support in the kernel: UTF-8, file system settings, etc.

Also check whether you have /var/lib/nfs/v4recovery. If not, try creating it, and make sure it's readable/writable by the owner.
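Something along these lines should do it; only a sketch, with the path taken from the NFSD messages below (override it with an argument if your distro uses another location):

```shell
#!/bin/sh
# Ensure the NFSv4 state recovery directory exists and is only
# accessible by its owner (root, when run as root).
dir=${1:-/var/lib/nfs/v4recovery}
[ -d "$dir" ] || mkdir -p "$dir"
chmod 0700 "$dir"
ls -ld "$dir"
```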

I'd try to get the issue below resolved before troubleshooting any further.

 *Quote:*   

>  [ 6391.523371] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory 
> 
> [ 6391.523705] NFSD: starting 90-second grace period (net c13a7ee4) 
> 
> [ 6494.043914] NFSD: unable to generate recoverydir name (-2). 
> ...

 Last edited by zeronullity on Mon Oct 21, 2013 3:51 pm; edited 1 time in total

----------

