# NFS Help:  "server not responding" but connectivity is good

## RosenSama

I'm having problems with some NFS mounts.  I have one server with several exports and multiple clients mounting the same exports.  Only one client has problems.  Within a few hours of booting, I find the following message at the end of the dmesg output:

```
nfs: server 192.168.1.1 not responding, still trying
```

and it never recovers.  I expect to see:

```
nfs: server 192.168.1.1 OK
```

but never do.  While this is happening, I can SSH in and scp files back and forth with good throughput.  Other clients have no problems with the same mount.  So even if there is a connectivity issue, why isn't that client automatically restoring the connection?  How can I go about debugging this?

I'm using NFS v3 server and client all around.

----------

## VinzC

Do you have a Marvell network card?

----------

## RosenSama

Nope, unless one of the two below uses Marvell hardware:

```
$ lspci | grep -i net
00:05.0 Bridge: nVidia Corporation CK8S Ethernet Controller (rev a2)
02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
```

The active interface is using the r8169 kernel driver.

----------

## RosenSama

I've found the first post that sounds reasonably similar to mine.  On my client that's having trouble, netstat shows a connection to port 2049 on the NFS server "stuck" in FIN_WAIT2 state.

http://www.ussg.iu.edu/hypermail/linux/kernel/0808.3/0123.html

----------

## VinzC

I once had troubles like this with a Marvell adapter. I never succeeded in fixing them so I plugged in a Realtek 8139. I know nothing about RTL-8169 unfortunately. Have you tried with a good old, well known network adapter like NE2000 or RTL-8139? I know for sure these work perfectly.

----------

## RosenSama

I don't believe it's a network adapter problem.  I have no problem with SSH or SCP during the period NFS reports troubles.  This implies to me it's a problem with NFS software.

----------

## VinzC

 *RosenSama wrote:*   

> I don't believe it's a network adapter problem.  I have no problem with SSH or SCP during the period NFS reports troubles.  This implies to me it's a problem with NFS software.

 

Not necessarily. I experienced exactly the same symptoms as you. Neither SSH nor SCP failed. Only NFS had major trouble with the network, until I realized it wasn't NFS but my network adapter that was causing so much trouble. You're free to believe me or not ;)

----------

## jamapii

Some ideas you can try:

Make sure portmap is running on both machines (`ps ax | grep portmap`).

Try mounting with the options `rsize=1024,wsize=1024`.
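
`ps` only shows that the process exists; `rpcinfo -p` asks the portmapper which RPC services are actually registered. Below is a minimal sketch of that check, assuming the last column of the `rpcinfo -p` output is the service name. `missing_rpc_services` is a hypothetical helper written for this illustration, not part of any package.

```shell
# Hypothetical helper: reads `rpcinfo -p` output on stdin and prints every
# service named in its arguments that the portmapper does not list.
missing_rpc_services() {
    awk -v want="$*" '
        { seen[$NF] = 1 }                  # last column: service name
        END {
            n = split(want, w, " ")
            for (i = 1; i <= n; i++)
                if (!(w[i] in seen)) print w[i]
        }'
}

# Usage, against the server from the thread:
#   rpcinfo -p 192.168.1.1 | missing_rpc_services portmapper nfs mountd
```

No output means every requested service is registered. If `rpcinfo` cannot reach the host at all, all requested names are printed.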

----------

## RosenSama

 *VinzC wrote:*   

> > *RosenSama wrote:* I don't believe it's a network adapter problem.  I have no problem with SSH or SCP during the period NFS reports troubles.  This implies to me it's a problem with NFS software.
>
> Not necessarily. I did experience exactly the same symptoms as yours. Neither SSH nor SCP failed. Only NFS caused major troubles with the network until I realized this wasn't NFS but my network adapter which caused so much troubles. You're free to believe me or not.

I believe you, I just don't understand the mechanism by which a hardware or driver problem can have an adverse effect on one specific high-level protocol like NFS and not others.

----------

## jamapii

 *RosenSama wrote:*   

> I believe you, I just don't understand the mechanism by which hardware / driver can have an adverse effect on one specific high level protocol like NFS and not others.

 

It's the difference between TCP and UDP. TCP compensates for dropped packets; with UDP, the application is supposed to. NFS usually uses UDP. Try VoIP: a bad network adapter should be audible ;)

----------

## RosenSama

So could I potentially test this by forcing NFS to use TCP somehow?

----------

## VinzC

Isn't it simpler to just try with another (well-known) network adapter?

----------

## Janne Pikkarainen

 *RosenSama wrote:*   

> So could I potentially test this by forcing NFS to use TCP somehow?

 

Add `tcp` to the options field of your NFS mount line in `/etc/fstab`.
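
After remounting, it is worth confirming which transport the mount actually uses. Below is a minimal sketch, assuming the kernel reports a `proto=` entry in the options column of `/proc/mounts` (some older kernels may print the transport differently, in which case this reports `unknown`). `nfs_transport` is a hypothetical helper, and the export path and mount point in the usage comment are placeholders.

```shell
# Hypothetical helper: reads lines in /proc/mounts format on stdin and
# prints "<mountpoint> <transport>" for every NFS mount.
nfs_transport() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        proto = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^proto=/) proto = substr(opts[i], 7)
        print $2, proto
    }'
}

# Usage (placeholder export and mount point):
#   mount -t nfs -o tcp 192.168.1.1:/export /mnt/export
#   nfs_transport < /proc/mounts
```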

If your NIC is connected to a 100 Mbps network, make sure it's using full duplex mode instead of half duplex. Even if you are sure it's running in full duplex, please double-check with `ethtool eth0` or `mii-tool eth0`.
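
To pull just the relevant fields out of the `ethtool` report, something like the following can help. This is a minimal sketch, assuming `ethtool` prints indented `Speed:` and `Duplex:` lines; `link_mode` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ethtool <iface>` output on stdin and prints
# only the negotiated speed and duplex, one per line.
link_mode() {
    sed -n -e 's/^[[:space:]]*Speed: */Speed=/p' \
           -e 's/^[[:space:]]*Duplex: */Duplex=/p'
}

# Usage:
#   ethtool eth0 | link_mode
```

A `Duplex=Half` result on a switched network usually points to a failed autonegotiation on one end of the link.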

----------

## RosenSama

Updating my NFS server kernel appears to have solved the issue.  Was gentoo-sources 2.6.22, now is gentoo-sources 2.6.25.  Thanks for your help.

----------

## RosenSama

I spoke too quickly.  It just takes much longer for the problem to show up.  

I'll try NFS via TCP now.

----------

## Janne Pikkarainen

Some versions of nfs-utils have been problematic for me over time, too - try to up/downgrade that package.

----------

## Anarcho

I had exactly the same issues. NFS drops, but SSH and other things work fine, and other PCs don't lose the NFS mount.

I have an nVidia NIC in it, too, which I blame for these issues. I switched to an Intel Pro 1000 PCI-E card a few weeks ago and have had no problems since then. So, for me, the fix is to get rid of this nVidia network card!

----------

## RosenSama

For me it was happening with both forcedeth and r8169 drivers on two different clients to the same server.  It was just as described in the mailing list post above.  `netstat -natp` shows a "hung" connection from a client port < 1024 to server port 2049.  On the client it's FIN_WAIT2 and on the server it's CLOSE_WAIT.
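
To spot that state quickly on either machine, the netstat output can be filtered. This is a minimal sketch, assuming the usual Linux `netstat -nat` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `stuck_nfs_sockets` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `netstat -nat` output on stdin and prints
# TCP sockets that involve port 2049 and sit in a half-closed state.
stuck_nfs_sockets() {
    awk '$1 ~ /^tcp/ &&
         ($4 ~ /:2049$/ || $5 ~ /:2049$/) &&
         ($6 == "FIN_WAIT2" || $6 == "CLOSE_WAIT") { print $4, $5, $6 }'
}

# Usage (run on the client and on the server):
#   netstat -nat | stuck_nfs_sockets
```
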

Anyways, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.  

Thanks for all the help.

----------

## VinzC

 *RosenSama wrote:*   

> For me it was happening with both forcedeth and r8169 drivers on two different clients to the same server.  It was just as defined in the mailing list post above.  `netstat -natp` shows a "hung" connection from a client port < 1024 to server port 2049.  On the client it's FIN_WAIT2 and on the server it's CLOSE_WAIT.   
> 
> Anyways, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.  
> 
> Thanks for all the help.

 

If both clients (those that fail with NFS) have the same problematic Ethernet hardware and brand, I'd expect the same trouble on both. This is just a guess though.

----------

## depontius

 *RosenSama wrote:*   

> Anyways, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.  

 

There has been a thread on LKML in the past month about nfs hangs beginning with 2.6.25-6 or so.  I'm running 2.6.25-hardened-r7 on my server and 2.6.25-gentoo-r7 on my clients.  I get occasional "stalls" for maybe 5 seconds at a time, but no out-and-out hangs.  There are some hints that ACLs may be involved, and I'm not sure if the critical kernel level is on the server or client.  I'm living with it for now, but next time I build a kernel I'm going to try leaving out the ACLs, since I've never gotten around to using them.  I'll add them back in when the stall is resolved, and then never get around to using them for several more years.
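
To see whether a given kernel was built with NFS ACL support before rebuilding, the config can be inspected. This is a minimal sketch, assuming "ACLs" here refers to the NFSv3 ACL options `CONFIG_NFS_V3_ACL` (client) and `CONFIG_NFSD_V3_ACL` (server) found in 2.6-series kernels; `nfs_acl_options` is a hypothetical helper name.

```shell
# Hypothetical helper: reads a kernel .config on stdin and prints the
# NFSv3 ACL options, whether enabled or "is not set".
nfs_acl_options() {
    grep -E '^(# )?CONFIG_NFSD?_V3_ACL[= ]'
}

# Usage:
#   zcat /proc/config.gz | nfs_acl_options    # needs CONFIG_IKCONFIG_PROC
#   nfs_acl_options < /usr/src/linux/.config
```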

----------

