# diskless workstation booting issue

## ridulak

I've been going through the Gentoo diskless workstation howto, but have run into a problem that I just can't see the fix for. Having worn out the kernel nfsroot.txt, the Gentoo docs and Google, I'm forced to admit that I can't find the answer to this one without a pointer or two.

Apologies if this isn't quite the right forum; I couldn't decide between networking, installation or misc for this posting.

The situation: I have a Gentoo server up and working. DHCP, tftp and NFS are all demonstrably working on it. The box is sync'd up to the current recommended portage (emerge sync|system|world, as of about a week ago) and is running kernel 2.6.8. I've installed the root filesystem et al. for the diskless client and have exported it via NFS. When I turn on my diskless client (PXE boot) it successfully gets its IP address, then downloads and runs the kernel (also 2.6.8), but when it comes to mounting its root filesystem it panics with the following error:

VFS: Cannot open root device "nfs" or unknown block(0,255)

Please append a correct "root=" boot option

Kernel panic: VFS: Unable to mount root fs on unknown block(0,255)

As far as I can determine the PXE configuration is correct:

# cat /diskless/pxelinux.cfg/default

DEFAULT /bzImage

APPEND ip=dhcp root=/dev/nfs nfsroot=192.168.1.25:/diskless/192.168.1.170

The workstation is definitely loading the /diskless/pxelinux.cfg/default file as I can see that from the tftp log entries.

The client kernel I am loading does appear to have NFS compiled in:

# grep -i NFS /usr/src/linux/.config

CONFIG_NFS_FS=y

CONFIG_NFS_V3=y

CONFIG_NFS_V4=y

CONFIG_NFS_DIRECTIO=y

# CONFIG_NFSD is not set

The NFS exports appear to be ok:

# cat /etc/exports

# /etc/exports: NFS file systems being exported.  See exports(5).

/diskless/192.168.1.170         192.168.1.170(sync,rw,no_root_squash,no_all_squash)

/opt                    192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)

/usr                    192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)

/home                   192.168.1.0/24(sync,rw,no_root_squash,no_all_squash)

If I add an extra entry (or remove the IP address restriction from an existing entry) I can mount the filesystem from another Linux box, so that would seem to be ok. The server only has one network interface active just now (the rest are marked as down in ifconfig), so it can't be that the NFS server has bound to the wrong interface.

Is there some issue with the 2.6 kernel series not being able to use an NFS root filesystem? Since the nfsroot.txt file is still there in the kernel source I would assume not. The only thing I can think of is that I used udev on the server and client, so devfs support was not compiled in. I did try compiling a client kernel with devfs support but it made no difference.

I'm completely out of ideas, any and all help appreciated!

Thanks

Steven

----------

## rasmussen

I have the same problem. Kernel is 2.6.7 and /etc/exports looks like

```

# /etc/exports: NFS file systems being exported.  See exports(5).

/home                           172.16.0.0/255.255.0.0(rw,root_squash,async)

/export/scratch                 172.16.0.0/255.255.0.0(rw,root_squash,async)

/export/sif                    172.16.100.20/255.255.255.255(rw,no_root_squash,async)

/export/vit                    172.16.100.21/255.255.255.255(rw,no_root_squash,async)

/export/skjald                  172.16.100.17/255.255.255.255(rw,no_root_squash,async)

/export/udgaardsloke            172.16.100.40/255.255.255.255(rw,no_root_squash,async)

```

The rest of my config is similar to ridulak's.

Odd thing is that I found out that if I change e.g.

```

/export/sif                    172.16.100.20/255.255.255.255(rw,no_root_squash,async)

```

to

```

/export/sif                    172.16.0.0/255.255.0.0(rw,no_root_squash,async)

```

the diskless client is able to mount /export/sif as root. But this should not be necessary.

----------

## warren64

I had a similar problem and was not able to get the 2.6 kernels to mount root over NFS. I found that only 2.4.25 gave me a config option for mounting root over NFS, which allowed the client to mount root with a static IP in /etc/exports.

I would like to have the 2.6 kernel, but I need a static IP for NFS as I am planning to set up multiple clients.  Any help would be greatly appreciated.

----------

## MajikC

I use dnsmasq as my DHCP server and have lines like these to force a static IP even though I am using DHCP. This might help you, warren64.

```

dhcp-host=zaphod,10.42.42.42,infinite

dhcp-host=ford,10.42.42.10,infinite

dhcp-host=parents,10.42.42.30,infinite

```

I may be missing the point here but do both of you also have:

```

Networking support  ---> Networking options  ---> IP: kernel level autoconfiguration

File Systems ---> Network File Systems ---> Root file system on NFS

```

...set in the node's kernel?
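A quick way to check those two menu entries against the client kernel's .config is sketched below. The symbol names are the usual ones behind those menu entries (`CONFIG_IP_PNP`, `CONFIG_IP_PNP_DHCP` for `ip=dhcp`, and `CONFIG_ROOT_NFS`); the function name `check_nfsroot_config` is just made up for this example.

```shell
# Report which NFS-root related options are missing from a kernel .config.
# Usage: check_nfsroot_config /usr/src/linux/.config
check_nfsroot_config() {
    config_file=$1
    missing=0
    for opt in CONFIG_IP_PNP CONFIG_IP_PNP_DHCP CONFIG_NFS_FS CONFIG_ROOT_NFS; do
        # The option must be built in (=y); a module (=m) cannot be loaded
        # before the root filesystem is mounted.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok:      $opt"
        else
            echo "missing: $opt"
            missing=1
        fi
    done
    return $missing
}
```

Note that the grep output in the first post lists CONFIG_NFS_FS but no CONFIG_ROOT_NFS line, so this check would flag that config.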

Another good thing to try is running tcpdump on the NFS server; this will give you loads of useful network debugging information.

Another point: there is no need for a netmask in your exports if the entry already has a full IP, i.e. 10.42.42.42(async,...) will work without problems.
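As a rough sketch of that form, the helper below prints a per-client exports line with a bare IP and no netmask. The function name `export_line` is hypothetical, and the path, IP and options in the usage note are taken from rasmussen's post above.

```shell
# Print an /etc/exports line that restricts an export to one client IP.
# Usage: export_line /export/sif 172.16.100.20
export_line() {
    printf '%s\t%s(rw,no_root_squash,async)\n' "$1" "$2"
}
```

Something like `export_line /export/sif 172.16.100.20 >> /etc/exports` followed by `exportfs -ra` should then make the server pick up the new entry.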

BTW I use the gentoo dev 2.6.9 kernel without (many) problems.

----------

## jdgill0

I have recently been working on setting up diskless workstations that use the 2.6 kernel.  I ran into the same problem.  The problem is related to udev, or at least it was for me.  I switched to devfs and my diskless clients boot just fine now.  I would really like to know how to use udev instead of devfs.

----------

## MajikC

To use udev you will need to create the console and null devices, like so:

```
mknod -m 660 /diskless/node1/dev/console c 5 1

mknod -m 660 /diskless/node1/dev/null c 1 3
```

You may also want to stop Gentoo from backing up the dev tree: in the file /etc/conf.d/rc, change RC_DEVICE_TARBALL="yes" to "no".
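Put together, the two steps could look something like the sketch below. The function names are made up for this example, and I am assuming the conf.d/rc to edit is the client's copy under its NFS root (e.g. /diskless/node1/etc/conf.d/rc). The mknod calls need root on the server.

```shell
# Create the two device nodes the client needs before udev populates /dev.
# Usage (as root): make_boot_nodes /diskless/node1
make_boot_nodes() {
    root=$1
    mkdir -p "$root/dev"
    # Character devices: console is major 5 minor 1, null is major 1 minor 3.
    [ -e "$root/dev/console" ] || mknod -m 660 "$root/dev/console" c 5 1
    [ -e "$root/dev/null" ]    || mknod -m 660 "$root/dev/null" c 1 3
}

# Switch RC_DEVICE_TARBALL from "yes" to "no" in the given conf.d/rc file.
# Usage: disable_device_tarball /diskless/node1/etc/conf.d/rc
disable_device_tarball() {
    rc_file=$1
    sed 's/^RC_DEVICE_TARBALL="yes"/RC_DEVICE_TARBALL="no"/' "$rc_file" \
        > "$rc_file.new" && mv "$rc_file.new" "$rc_file"
}
```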

----------

## jdgill0

 *MajikC wrote:*   

> To use udev you will need to create the console and null devices, like so:
> 
> ```
> mknod -m 660 /diskless/node1/dev/console c 5 1
> ```
> ...

 

Thanks MajikC, works great.  Using devfs was not a big deal, but given udev is now the standard for Linux I prefer to use it.

----------

## rafo

I have a nicely working diskless setup where the diskless node runs a kernel with devfs compiled in.

I have tried to change it to use udev, but I just can't get it to work. The diskless node semi-dies at the point where /sbin/rc calls /sbin/depscan.sh (it still responds to ping), and it does not seem to be entirely reproducible: Sometimes I see the green message "Caching service dependencies", sometimes not.

The diskless guide currently does not say anything about udev versus devfs. If it is very difficult, or impossible (?), to get udev and diskless to work together then the guide should advise using devfs. But it may be that I am just missing something? (I have tried creating /dev/console and /dev/null as suggested in this thread, but it does not seem to help.)

Does anyone out there have a working udev-based diskless configuration that is also fully updated? I am thinking that maybe udev+diskless worked well at some point in time but is now broken ..?

----------

## rafo

A couple of days ago I wrote

 *Quote:*   

> I have tried to change it to use udev, but I just can't get it to work. The diskless node semi-dies at the point where /sbin/rc calls /sbin/depscan.sh (it still responds to ping), and it does not seem to be entirely reproducible: Sometimes I see the green message "Caching service dependencies", sometimes not.

 

I now have it all sorted out, I think. I have submitted a number of bug reports on the "Diskless Nodes with Gentoo" paper, September 16, 2005 (http://www.gentoo.org/doc/en/diskless-howto.xml). The Bugzilla numbers are 106525, 107258, 107260, 107262, 107263, 107264, 107271.

----------

## jamapii

The problem is likely that it can't write to NFS.

You must supply wsize=1024 in pxelinux. The /usr/src/linux/Documentation/nfsroot.txt is wrong about the defaults.
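For reference, nfsroot.txt gives the syntax as `nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]`, so the option would presumably be appended after the root directory on the APPEND line. A small sketch that builds such a line, using the server and path from the first post (the function name `nfsroot_append` is hypothetical):

```shell
# Build a pxelinux APPEND line, optionally with extra NFS mount options.
# Usage: nfsroot_append SERVER ROOT_DIR [NFS_OPTIONS]
nfsroot_append() {
    server=$1
    dir=$2
    opts=${3:-}
    # "${opts:+,$opts}" adds ",<options>" only when options were given.
    printf 'APPEND ip=dhcp root=/dev/nfs nfsroot=%s:%s%s\n' \
        "$server" "$dir" "${opts:+,$opts}"
}
```

For example, `nfsroot_append 192.168.1.25 /diskless/192.168.1.170 wsize=1024` prints the line to put in the pxelinux.cfg file.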

----------

## rafo

Hi jamapii,

my understanding is that my problem (devfs works, udev fails) was solved by pre-creating the /sys directory in the slave filesystem, which the "Diskless Nodes with Gentoo" paper fails to mention. This is reported in https://bugs.gentoo.org/show_bug.cgi?id=107258.
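On the server that amounts to something like the following sketch. The function name is made up, and /diskless/node1 in the usage line is only an example path for the slave's root.

```shell
# Make sure the slave's root filesystem contains an empty /sys mount point.
# Usage: ensure_sys_dir /diskless/node1
ensure_sys_dir() {
    root=$1
    # -p makes this a no-op when the directory already exists.
    mkdir -p "$root/sys"
}
```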

----------

## rafo

 *jamapii wrote:*   

> You must supply wsize=1024 in pxelinux. The /usr/src/linux/Documentation/nfsroot.txt is wrong about the defaults.

 

Where should the wsize=1024 go? In the /diskless/pxelinux.cfg/* files? I have a working setup but it seems wsize=1024 is not needed, at least not in the pxelinux.cfg/* files.

----------

## emuller

Hi Rafo,

I worked through all your bug postings for diskless nodes and was able to get the boot working... was having the problem that /usr wasn't being mounted soon enough... now it's fixed.

I solved the "find xargs problem" by populating the /usr mount point in /diskless/clientname/usr with a bin directory and putting find and xargs there. They are hidden once the real /usr gets mounted over them.

But now I have a shutdown problem: the root file system (NFS) is remounted read-only by netmount because it can't unmount it. I get a bunch of read-only errors from everything that follows.

How did you deal with this?

cheers, e.

----------

## rafo

Hi emuller,

it may well be that I have various complaints in my shutdown sequence--I guess I have been happy with the diskless node coming down one way or the other, and in reasonable time. But of course it should do so in a nice and clean way.

I don't have access to my test setup where I sit now. I'll make myself a note to check it and come back with more info.

----------

## emuller

For now I put a "return 0" as the first line in netmount stop() to disable stopping of netmount. This results in a clean shutdown. I don't want netmount going down anyway, because that remounts my root filesystem read-only and drops /usr, /opt and /home... any ideas if this'll cause problems elsewhere?

----------

## rafo

I have similar results in my setup. The console output during `poweroff' is unclean, but it is wiped before I have a chance to write down the contents. If I type `init 1' there is an infinite loop saying

ln: creating symbolic link `var/lib/init.d/startd/netmount' to `/etc/init.d/net.lo': Read-only file system

With your hack (an immediate `return 0' in stop() of /etc/init.d/netmount) these symptoms are cured and doing `init 1' works fine. Will you consider writing a Bugzilla entry?

----------

## emuller

Actually I think the fix is in these posts...

 *kyron at https://forums.gentoo.org/viewtopic.php?p=3241346#3241346 wrote:*   

> FYI, there is some work being done specifically for the NFS root mount/unmount issues:
> 
> https://bugs.gentoo.org/show_bug.cgi?id=99682
> 
> And the following is also interesting:
> ...

 

I'll post again once I've tried them out.

----------

