# [SOLVED] Diskless clients no longer mount nfsroot

## dobbs

I have two nearly identical systems running diskless.  They've been running smoothly for several months, but when I upgraded the kernels to 3.12.13 I enabled NFSv4 support, and now the clients time out mounting their nfsroot and panic.  I have since read that NFSv4 doesn't support root over NFS, so I disabled NFSv4 client support on the diskless kernels and they still won't mount nfsroot with v3!  What the heck?

I can see the mount requests in the server logs.  tcpdump labels every UDP packet with a checksum error (I have increased the capture size with "-s 2048" and there are no jumbo frames on my net), but I see that when everything works fine too.  That's another source of confusion, but probably not relevant here.

It looks like I'll have to disable everything NFSv4 related on the server as well, but that might not solve the problem.  Rebooting the server will have to wait until Tuesday, so in the meantime does anyone have any recommendations?  Is this just a cautionary tale to avoid NFSv4 for diskless?

Last edited by dobbs on Sat Aug 02, 2014 1:51 am; edited 1 time in total

----------

## krinn

nfs-utils tries to mount v4, then v3...

So with an nfsv4 server, for an nfsv3 mount to succeed the v4 mount attempt has to fail first. This is how nfs-utils discovers which version to mount.

Try to hint nfs-utils so that it tries nfsv3 first, by passing the options nfsvers=3,ver=3 to the server.

----------

## dobbs

I'll try the kernel parameters, but there is no user space at this point.  The kernel is loaded via tftp, and is trying to mount its root FS via NFS.  I don't think it's trying v4 because the diskless kernel doesn't have that built-in, just the v3 and v2 NFS clients.

----------

## krinn

And your server is nfsv4, I mean, did you migrate it to v4?

nfsv3 clients don't need any change, but that doesn't mean an nfsv4 server can be used with an nfsv3 server config format. (I'm not sure of the current state; earlier implementations allowed it, even though the nfsv4 doc states it is invalid.) And this was a misery, as it produced more problems: people figured it was the way to do it, and if the nfsroot was missing, the first directory seen was taken as the nfsroot... So even if earlier versions allowed that, they shouldn't have, and maybe newer versions finally dropped it to only allow the valid nfsv4 format.

here's a simple nfsv4 server config

/export 192.168.0.0/24(rw,sec=sys,fsid=0,no_subtree_check,async,no_root_squash,nohide,anonuid=250,anongid=250)

/export/distfiles 192.168.0.0/24(rw,root_squash,no_subtree_check,nohide,anonuid=250,anongid=250,secure,nohide)

Now you can mount it as nfsv3 or nfsv4, but the export itself uses the nfsv4 format; the main difference is that directories are bound under an "nfsroot", and the nfsroot must be defined with fsid=0 (the /export with fsid=0 part).

An nfsv4 client can then mount server:/distfiles (in nfsv4 you omit the nfsroot directory).

An nfsv3 client can mount server:/export/distfiles.

I never tried it, but mount -t nfs4 server:/export/distfiles should be invalid and, in theory, return a "/export/distfiles doesn't exist" type of error.

----------

## dobbs

 *krinn wrote:*   

> And your server is nfsv4, I mean, did you migrate it to v4?
> 
> nfsv3 clients don't need any change, but that doesn't mean an nfsv4 server can be used with an nfsv3 server config format. (I'm not sure of the current state; earlier implementations allowed it, even though the nfsv4 doc states it is invalid.) And this was a misery, as it produced more problems: people figured it was the way to do it, and if the nfsroot was missing, the first directory seen was taken as the nfsroot... So even if earlier versions allowed that, they shouldn't have, and maybe newer versions finally dropped it to only allow the valid nfsv4 format.

 

Oh.  I wasn't aware v4 required a different config format.  That's likely the problem then.

When you say "first directory seen", is that the directory listed first in exports when no fsid is specified?

Can I not have multiple rootfs exports??

----------

## krinn

 *dobbs wrote:*   

> When you say "first directory seen", is that the directory listed first in exports when no fsid is specified?

 

Yes, but this is not correct handling (I should check the nfsv4 spec to confirm that); it looks like a crude way to carry an nfsv3 config over to nfsv4.

So if the rootfs is missing, the first directory declared is used as the rootfs, and it might work if you only have one export declared.

But it causes problems, as people then get lost: /home/stuff has to be referred to as / and not as /home/stuff or /stuff...

 *dobbs wrote:*   

> Can I not have multiple rootfs exports??

 

No, you can only have one, as all directories are attached to it. This solves the problem of a user wanting to export directories from several different paths.

E.g. exporting /home/stuff and /var/data is done as:

/mynfs/myspecialnfs/nfs as fsid=0 (/, the rootfs; I picked a long export directory name as rootfs to emphasize the handling, but people usually just use /export for that)

/home/stuff as /mynfs/myspecialnfs/nfs/stuff (bind)

/var/data as /mynfs/myspecialnfs/nfs/data (bind).

And the nfsv4 result is:

/

/stuff

/data

So mount -t nfs4 server:/stuff whereyouwant will mount it.

As for my distfiles example given previously, here's its fstab part on my server:

/usr/portage/distfiles   /export/distfiles    none            bind              0 0
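Put together, the /home/stuff and /var/data example could look roughly like the fragment below. This is only a sketch: it assumes /export as the fsid=0 root (instead of the long /mynfs/myspecialnfs/nfs name), reuses the 192.168.0.0/24 network from the earlier example, and the export options are illustrative rather than a recommendation.

```
# /etc/fstab on the server: bind the real directories under the export root
/home/stuff   /export/stuff   none   bind   0 0
/var/data     /export/data    none   bind   0 0

# /etc/exports: /export is the NFSv4 root (fsid=0); options are assumptions
/export         192.168.0.0/24(rw,fsid=0,no_subtree_check)
/export/stuff   192.168.0.0/24(rw,no_subtree_check,nohide)
/export/data    192.168.0.0/24(rw,no_subtree_check,nohide)
```

With that layout an nfsv4 client would ask for server:/stuff and server:/data, while an nfsv3 client would ask for server:/export/stuff and server:/export/data.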

----------

## dobbs

Wait.  We're talking about different things.  When I say "nfsroot", I mean my diskless nodes' root mounted over NFS.  Apparently NFSv4 also requires all exports to exist under a single shared tree, which you're also calling an nfsroot.  I can work around that misguided requirement with bind mounts as you demonstrate.  But I don't see the point; this is a poor implementation.  I'm reverting to NFSv3 for now.

Addendum:

Sorry, I should mention I am grateful for your help in understanding NFSv4, and I do understand any problems I'm currently experiencing are my own fault for enabling a feature on a whim without reading any documentation whatsoever.  But I do strongly feel mucking about with the real filesystem to make the apparent export topology simpler is a Bad Idea.

----------

## dobbs

Three months later, the diskless nodes do boot by appending nfsvers=3 or vers=3 (note: not ver=3) to the nfsroot kernel parameter like so:

```
DEFAULT /bzImage-NetBoot

APPEND ip=dhcp root=/dev/nfs rw nfsroot=192.168.1.5:/exports/diskless/root,vers=3
```
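To confirm that a booted node really received the option, the NFS options can be pulled out of its kernel command line. The helper below is a minimal sketch; the function name `nfsroot_opts` is made up for illustration, and it assumes the `nfsroot=server:/path,options` layout shown above.

```shell
# Print the NFS options that follow the first comma in the nfsroot= argument.
# $1: a full kernel command line, e.g. the contents of /proc/cmdline
nfsroot_opts() {
    for arg in $1; do
        case "$arg" in
            nfsroot=*)
                val=${arg#nfsroot=}
                case "$val" in
                    # Only print something when options are actually present
                    *,*) printf '%s\n' "${val#*,}" ;;
                esac
                ;;
        esac
    done
}
```

On a running diskless node this would be called as `nfsroot_opts "$(cat /proc/cmdline)"`.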

For some reason, the client makes a MNT1 request by default.  I first guessed that meant NFSv1, but MNT1 is more likely version 1 of the MOUNT protocol, the one that goes with NFSv2:

```
Aug  1 18:02:34 palshife rpc.mountd[4528]: Received MNT1(/exports/diskless/root) request from 192.168.1.103
```

While this apparently worked previously, MNT1 requests just don't work anymore so specifying the NFS version is required now.

```
$ man 5 nfs
```
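One way to see which NFS versions the server side currently offers is to look at /proc/fs/nfsd/versions on the server, which on Linux knfsd holds a line such as `-2 +3 +4`. The helper below is a sketch that lists the entries marked with `+`; the function name `enabled_versions` is hypothetical, and nothing in this thread confirms which versions were actually enabled on the server in question.

```shell
# Print every version marked as enabled ("+") in a /proc/fs/nfsd/versions-style line.
# $1: the line to inspect, e.g. "-2 +3 +4"
enabled_versions() {
    for v in $1; do
        case "$v" in
            # Strip the leading "+" and print the version number
            +*) printf '%s\n' "${v#+}" ;;
        esac
    done
}
```

On the server it would be used as `enabled_versions "$(cat /proc/fs/nfsd/versions)"`, typically as root and with nfsd running.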

----------

## krinn

 *dobbs wrote:*   

> Three months later, the diskless nodes do boot by appending nfsvers=3 or vers=3 (note: not ver=3) to the nfsroot

 

Sorry for the mistake :/

----------

## dobbs

 *krinn wrote:*   

> Sorry for the mistake :/

 

Don't worry about it too much.  The main reason it took me three months is because I waited to reboot my server (after removing NFSv4) for other reasons.  You did help a lot, and now I understand NFSv4 enough to consider migrating on next kernel update.  Still don't like the fake exports tree, but I'll deal with it. :)

The wiki needs to be updated with this info though.

----------

## krinn

It depends; the wiki should state clearly that it covers an nfsv3 setup only.

So it's correct, as long as you stick with what the wiki says as-is:

http://wiki.gentoo.org/wiki/Diskless_nodes#Before_you_start_2

The kernel config there shows nfsv4 disabled. If you follow it, no problem, but if anyone enables nfsv4, the game changes.

Also note that the server configuration for exports at http://wiki.gentoo.org/wiki/Diskless_nodes#Configuring_the_NFS_server_2 is valid for nfsv3 but invalid for nfsv4.

----------

## dobbs

Yeah, the wiki article needs to mention its configuration is for NFSv3.  But the vers has to be specified even when supporting only NFSv3 now too.

My tests show that without specifying vers, clients will make MNT1 requests and then fail and halt.  With vers=3 the clients make MNT3 requests and work.  The only discernible change is that "it just worked" when the server ran kernel version 3.10.25 and prior, but failed on 3.12.13 and 3.12.21-r1.  The client kernel version didn't make a difference in my tests.  My kernels are all from the sys-kernel/gentoo-sources package.  It may work with properly configured NFSv4 as well, but I was too thick-headed to test it at the time.
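For anyone repeating these tests, the MNT1 versus MNT3 difference can be read straight out of the server's syslog. The filter below is a small sketch that assumes rpc.mountd log lines in the format quoted earlier in this thread; the function name `mnt_requests` is invented for illustration and the log file location varies by system.

```shell
# Read syslog lines on stdin and print "MNT<version> <client address>" for each
# rpc.mountd mount request; all other lines are dropped.
mnt_requests() {
    sed -n 's/.*rpc\.mountd\[[0-9]*]: Received \(MNT[0-9]*\)(.*) request from \(.*\)$/\1 \2/p'
}
```

Something like `mnt_requests < /var/log/messages | sort | uniq -c` (log path assumed) then gives a count per MOUNT version and client.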

----------

