# [SOLVED] NFS failure on mixed 10 and 100Mbps network

## andyfraser33

Hi,

On Friday my old 10/100Mbps switch died. Until my new switch arrives I'm using my even older Netgear 10Mbps hub. The hub is connected to another 10/100Mbps switch and has 4 machines connected to it. The switch has my main Gentoo PC and my Mac connected to it.

Today when I tried to copy a file from my Gentoo PC connected to the switch at 100Mbps full-duplex to my Gentoo file server connected to the hub at 10Mbps half-duplex the copy process hung. ps reported that cp was in an uninterruptible sleep (D) state. The only option to clear this process is to reboot. Every other network service worked perfectly.

I can mount and unmount ok. I can copy from the server to the client but I can't copy from the client to the server. No configuration has changed since I last used the PC on Friday.

My Gentoo PC has an nForce 2 Ultra motherboard and I use the onboard NIC via the forcedeth driver. I tried a PCI Realtek 8139 card but I got the same results.

Two other Gentoo machines connected to the hub at 10Mbps half-duplex had no problems using NFS to the file server. My Mac connected to the switch at 100Mbps full-duplex also has no problems using NFS with my Gentoo file server.

I connected the file server directly to the switch and everything worked perfectly.

All I can think of is that my Gentoo PC doesn't like the mixed network environment. Does anyone have any idea what's going on here? The only thing that's changed is the hub replacing the dead switch. I could move the file server for now and hopefully by Thursday I'll have another switch but I'd really like to know what's going on rather than just try to work around the issue.

TIA.

----------

## beavsux

 *andyfraser33 wrote:*   

> Hi,
> 
> Today when I tried to copy a file from my Gentoo PC connected to the switch at 100Mbps full-duplex to my Gentoo file server connected to the hub at 10Mbps half-duplex the copy process hung. ps reported that cp was in an uninterruptible sleep (D) state. The only option to clear this process is to reboot. Every other network service worked perfectly.
> 
> I connected the file server directly to the switch and everything worked perfectly.
> ...

 

The only thing I can think of is to make sure all of the computers are configured to autodetect the LAN speed.

----------

## andyfraser33

 *beavsux wrote:*   

> The only thing I can think of is to make sure all of the computers are configured to autodetect the LAN speed.

 

Yes, they are. There are zero other network problems, just NFS between one Gentoo machine and the Gentoo file server.

----------

## borchi

I found out (on an LTSP server with some workstations that have old 10Mbps Ethernet adapters) that I needed to adjust the maximum block size for the NFS server. I couldn't figure out how to do it "softly", so I edited the kernel source file:

```
/usr/src/linux/include/linux/nfsd/const.h
```

changed the line:

```
#define NFSSVC_MAXBLKSIZE       (32*1024)
```

to:

```
#define NFSSVC_MAXBLKSIZE       (8*1024)
```

I recompiled the kernel, and now my old workstations can connect to the NFS server on LTSP. You could also try going lower than 8*1024 (I've heard some suggest going as low as 2*1024): I still sometimes get a message that a client can't connect to nfsd when booting an LTSP workstation, but it finds it the next second and I don't notice any problems after that.
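One possible reason a smaller block size helps on a 10Mbps segment, sketched under assumptions the thread does not confirm: that NFS is running over UDP on IPv4 and that the MTU is 1500 bytes. In that case each NFS block travels as a single UDP datagram that IP splits into fragments, and losing any one fragment means the whole block has to be resent. The helper name `ip_fragments` is hypothetical, and RPC/NFS header overhead is ignored, so treat the numbers as rough estimates.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes


def ip_fragments(block_size, mtu=1500):
    """Estimate how many IP fragments one NFS block becomes over UDP.

    Hypothetical helper: assumes IPv4 and ignores RPC/NFS header overhead.
    """
    # The payload of every fragment except the last must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((block_size + UDP_HEADER) / per_fragment)


for kib in (32, 8, 2, 1):
    print(f"{kib:>2} KiB block -> {ip_fragments(kib * 1024)} fragment(s)")
```

Under these assumptions a 1 KiB block fits in a single frame, which may be why very small block sizes are more forgiving on a lossy or collision-prone segment.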

----------

## andyfraser33

 *borchi wrote:*   

> I found out (on an LTSP server with some workstations that have old 10Mbps Ethernet adapters) that I needed to adjust the maximum block size for the NFS server. I couldn't figure out how to do it "softly", so I edited the kernel source file:
> 
> ```
> /usr/src/linux/include/linux/nfsd/const.h
> ```
> ...

 

It's worth a try. I think the block size can be tuned with the rsize and wsize parameters in /etc/exports. I don't know whether they're the same block size though.

----------

## somasekh

I have the exact same problem. A Gig-E switch connected to four machines: 2 running e1000, 2 running forcedeth (nForce2 boards). After every reboot I get very poor performance from the nForce2 boxes.

Could you check whether you are getting a large number of carrier failures? Run `/sbin/ifconfig` and look for the `carrier:` number.

The usual symptom of a problem is that there are a lot of carrier failures which keep increasing.  

Workaround: unplug the cable (with the machine up) for a short time (about 3 seconds), then plug it back in again. Everything is back to normal. This is not an ideal solution, though.

This problem seems to have appeared since 2.6.11 (everything was fine in 2.6.8).

----------

## andyfraser33

 *somasekh wrote:*   

> I have the exact same problem. A Gig-E switch connected to four machines: 2 running e1000, 2 running forcedeth (nForce2 boards). After every reboot I get very poor performance from the nForce2 boxes.
> 
> Could you check whether you are getting a large number of carrier failures? Run `/sbin/ifconfig` and look for the `carrier:` number.
> ...

 

This rings a bell. I had this problem and, as with you, it only appeared with 2.6.11. Is the forcedeth driver compiled as a module, by any chance? If it is, compile it into the kernel. That fixed the problem for me, though I have no idea why.

----------

## somasekh

Yup,

  It is compiled as a module. Compiling it into the kernel right now on one of the machines.  Oh, I do hope you are right and this vexing problem vanishes.  Thanks Andy.

--

Dinesh

----------

## andyfraser33

 *somasekh wrote:*   

> Yup,
> 
>   It is compiled as a module. Compiling it into the kernel right now on one of the machines.  Oh, I do hope you are right and this vexing problem vanishes.  Thanks Andy.
> 
> --
> ...

 

I probably should have said I first saw your problem with gentoo-sources-2.6.11-r4. I haven't tested -r5 fully yet but the forcedeth problem stayed fixed with it. My NFS problem blew up before I switched over to -r5.

----------

## somasekh

Looks like Andy's solution works.  Got forcedeth compiled into the kernel. Just tried it out and it works fine. No more carrier errors.

Previously this was what ifconfig would show:

TX packets:9309543 errors:211830 dropped:0 overruns:0 carrier:211830

Now it looks a whole lot better:

TX packets:65465 errors:0 dropped:0 overruns:0 carrier:0
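To put the two readings above side by side, here is a minimal sketch that parses an ifconfig `TX packets:` line and reports carrier errors as a share of transmitted packets. It assumes the old net-tools output format shown in this thread (`name:value` pairs); the function name `parse_tx_line` is made up for illustration.

```python
import re


def parse_tx_line(line):
    """Parse an ifconfig 'TX packets:...' line into a dict of integer counters."""
    # Each counter appears as name:value, e.g. "carrier:211830".
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


before = parse_tx_line(
    "TX packets:9309543 errors:211830 dropped:0 overruns:0 carrier:211830"
)
after = parse_tx_line("TX packets:65465 errors:0 dropped:0 overruns:0 carrier:0")

for label, stats in (("forcedeth as module", before), ("forcedeth built in", after)):
    ratio = stats["carrier"] / stats["packets"]
    print(f"{label}: {stats['carrier']} carrier errors ({ratio:.2%} of TX packets)")
```

A counter that keeps climbing between two runs is the symptom to watch for; a single snapshot only shows the total since the interface came up.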

Oh, and as I type this I must admit that the best thing about Gentoo is these forums and the folks who inhabit them. Thanks again Andy.

--

Dinesh

----------

## andyfraser33

I'm glad it worked for you too. :)

It doesn't make much sense to me that it works compiled in rather than as a module. It always worked as a module before 2.6.11. My money is on it being a bug that will hopefully be sorted out by 2.6.12, but maybe that's just optimistic. I initially thought this issue might be relevant to my NFS problems, but having tried an RTL8139 card I know it's not. 8139-based cards have never caused me any problems in all the years I've been using them.

----------

## andyfraser33

borchi, you're a star! Your idea didn't directly solve my problem but it did point me in the right direction.

I remembered that the rsize and wsize mount options can be used to specify the block size, so I thought I'd try those rather than hacking the kernel. They didn't work, but I was prompted to investigate other options. After a little Googling I came across the timeo option. Apparently NFS times out after one second or less, depending on which web page you read. It seems NFS also hard mounts by default, so I thought that if the server wasn't responding quickly enough it might be putting the process into an uninterruptible state.

To cut a long story short, after some experimentation I found that with rsize and wsize set to 1024 and timeo set to 30 I was able to copy files again. I also added hard (it can't hurt, and I read that using soft can lose data, so it's better to be sure) and intr (which is supposed to allow the process to respond to interrupt signals) to the options in /etc/fstab.
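For anyone wanting to reproduce this, here is a rough sketch of how those options combine into an /etc/fstab entry. The server name `fileserver` and the paths `/export/data` and `/mnt/data` are placeholders, not my real setup, and both helper functions are made up for illustration. Note that, according to the nfs man page, timeo is given in tenths of a second, so timeo=30 means a 3 second initial timeout.

```python
def nfs_mount_options(rsize=1024, wsize=1024, timeo=30, hard=True, intr=True):
    """Build a comma-separated NFS mount option string (illustrative helper)."""
    opts = [f"rsize={rsize}", f"wsize={wsize}", f"timeo={timeo}"]
    opts.append("hard" if hard else "soft")
    if intr:
        opts.append("intr")
    return ",".join(opts)


def fstab_line(server, export, mountpoint, options):
    """Format one NFS entry for /etc/fstab (placeholder names, not a real host)."""
    return f"{server}:{export}  {mountpoint}  nfs  {options}  0 0"


options = nfs_mount_options()
print(fstab_line("fileserver", "/export/data", "/mnt/data", options))
# timeo is expressed in tenths of a second.
print(f"timeo=30 corresponds to {30 / 10:.0f} seconds")
```

The same option string can be passed to `mount -o` for a one-off test before committing it to /etc/fstab.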

I'm still unsure why NFS is timing out (if it actually is) on the machine on the 100Mbps segment when machines on the 10Mbps segment are working fine. I'll need to investigate further.

Many thanks to all who responded.

[UPDATE] My new switch arrived today and NFS is now working as it was before with the original settings.

----------

