# [SOLVED] NFS causing High Load, Low CPU Util on Client

## Karma T. Foxx

Hey folks!

I have a setup where VM A is a web server that mounts a share from a dedicated NFS server, VM B. VM B only serves the bulk data (images, thumbnails) for a couple of sites; all the scripts and so on reside on VM A. This configuration had been working great with 20,000+ visitors/day for several months, up until yesterday evening when crap hit the fan and VM A started seeing load averages ranging from 9.0 to 120.0 (that's a personal record; the typical load average is < 2) while CPU utilization sits somewhere between 20-50% (normal). There has been no change in web traffic, and I have ruled out a DoS through log analysis (I'm also running mod_security). VM B is puttering along with no increase in load.

I have spent the last 6 hours scouring Google, and that (as well as the low load when VM B is down) has me convinced the problem is Apache waiting for NFS to become available so it can grab the files it needs to serve. I understand there is a 255-transaction hard limit with NFS and will be trying to "split up the mounts" next; however, I doubt this is a permanent solution, since the issue seems to have cropped up out of the blue.

I have tried sync, async, read/write sizes (rsize/wsize) from 4k to 32k, and forcing both NFSv3 and NFSv4 (the default), all to no avail. The network (a 100 Mbit/s link between the physical servers) appears to be issue-free: no dropped or erroneous packets are logged on either VM or on the managed switch. I rebooted the switch and all the dom0s anyway; still no dice. When VM A is rebooted, things look great for about 5 minutes, then the load average suddenly spikes, which leads me to believe I am exhausting some limit I'm unaware of. Both VMs have tonnes of free RAM. Dmesg and the log files on both servers reveal nothing. The domUs are running xen-sources 2.6.18 with the NFS server and client (including v4) compiled in.
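On Linux, the load average counts tasks in uninterruptible sleep (state `D`, usually blocked on I/O) alongside runnable ones, which is how load can reach 120 while the CPU sits at 20-50%. A minimal sketch for testing the "Apache is waiting on NFS" theory, assuming a Linux `/proc` filesystem: it lists every process currently in state `D`. The function names are hypothetical. If most hits during a spike are Apache workers, that supports the theory.

```python
from __future__ import annotations

import os


def task_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so parse from just after the *last* ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 2:]
    return after_comm.split()[0]


def uninterruptible_tasks(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, command) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if ")" not in line:
            continue
        if task_state(line) == "D":
            comm = line[line.index("(") + 1:line.rindex(")")]
            found.append((int(entry), comm))
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    blocked = uninterruptible_tasks()
    print(f"{len(blocked)} task(s) in uninterruptible sleep")
    for pid, comm in blocked:
        print(pid, comm)
```

Running it a few times during a spike, rather than once, gives a better picture, since individual tasks move in and out of `D` quickly.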

I am hoping to avoid ATA over Ethernet, since this environment was working great less than 24 hours ago and I need NFS's ability to be mounted across several hosts for a mass virtual hosting platform I will be launching later. If anyone has any pointers, I would be thrilled to hear them; I am at my wit's end.

My current configuration looks like this, though I have added, removed and changed every conceivable option with no success:

VM B `/etc/exports`:

```
/mnt/storage aa.aa.aa.aa(rw,no_root_squash,no_subtree_check)
```

VM A `/etc/fstab`:

```
bb.bb.bb.bb:/mnt/storage  /mnt/storage  nfs  rsize=32768,wsize=32768,soft,timeo=10,rw,intr,nosuid,noexec  0 0
```

Thanks for any insight!

----------

## Hu

Is there any significant difference in the number and type of NFS packets on the network between that five-minute window of calm and the general failure afterward? If you stop Apache from starting at boot, reboot, and wait 5 minutes, does the problem crop up anyway?
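One way to compare the calm window with the failure is to snapshot the client's RPC counters in each period. A minimal sketch, assuming a Linux NFS client where `/proc/net/rpc/nfs` holds one labelled line of integer counters per category (the `rpc` line typically being calls, retransmissions and auth refreshes, the same figures `nfsstat -c` reports). The function names are hypothetical. A retransmission count that climbs sharply after the first few minutes would point at packets going missing on the wire rather than at a slow server.

```python
from __future__ import annotations

import time


def parse_rpc_stats(text: str) -> dict[str, list[int]]:
    """Turn /proc/net/rpc/nfs-style text into {label: [counter, ...]}."""
    stats: dict[str, list[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = [int(x) for x in fields[1:]]
        except ValueError:
            continue  # skip any line whose counters are not plain integers
    return stats


def diff_stats(before: dict[str, list[int]],
               after: dict[str, list[int]]) -> dict[str, list[int]]:
    """Element-wise counter increase for every label present in both snapshots."""
    return {
        label: [b - a for a, b in zip(before[label], after[label])]
        for label in after
        if label in before
    }


def sample(path: str = "/proc/net/rpc/nfs",
           interval: float = 60.0) -> dict[str, list[int]]:
    """Return how much each counter grew over `interval` seconds."""
    with open(path) as f:
        before = parse_rpc_stats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_rpc_stats(f.read())
    return diff_stats(before, after)
```

Calling `sample()` once right after boot and again once the load has spiked, then comparing the `rpc` and `proc3`/`proc4` entries, answers the question in numbers rather than by eye.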

How many users have access to this server?  Could someone have installed a new web application, cron job, or other daemon that might be accessing the NFS mount?

----------

## Karma T. Foxx

> Is there any significant difference in the number and type of NFS packets on the network between that five minute window of calm and the general failure afterward?

No.

> If you disable starting Apache, reboot, and wait 5 minutes, does the problem crop up anyway?

No. Nothing but Apache uses the share.

> How many users have access to this server?

Just me, and only over a VPN. No sign of forced entry.

----------

## Karma T. Foxx

I've spread the data across six shares and still no luck. The RAID checks out fine.

----------

## Karma T. Foxx

So far it looks like it was a MAC address conflict with a new SQL slave VM. It's the first thing I should have checked; I feel like an idiot.

That's a Mistake You Only Make Once(tm)

Thanks for the help, Hu.
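A duplicated MAC address like this can be caught before the new VM is ever booted. A minimal sketch, assuming xm-style domU config files where each interface is declared as `vif = [ 'mac=00:16:3e:..., bridge=...' ]`; the script name `find_dup_macs.py`, the function name, and the config file names are hypothetical. It only sees MACs written explicitly in the configs, so interfaces left to receive a random MAC at boot are not checked.

```python
from __future__ import annotations

import re
import sys
from collections import defaultdict

# Matches the mac=... key inside a Xen "vif = [ '...' ]" entry.
MAC_RE = re.compile(r"mac=([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})")


def find_duplicate_macs(configs: dict[str, str]) -> dict[str, list[str]]:
    """Map each MAC that appears more than once to the configs using it."""
    seen: dict[str, list[str]] = defaultdict(list)
    for name, text in configs.items():
        for mac in MAC_RE.findall(text):
            seen[mac.lower()].append(name)  # normalise case before comparing
    return {mac: names for mac, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    files = {}
    for path in sys.argv[1:]:
        with open(path) as f:
            files[path] = f.read()
    for mac, names in sorted(find_duplicate_macs(files).items()):
        print(f"{mac} is used by: {', '.join(names)}")
```

Usage would be along the lines of `python find_dup_macs.py /etc/xen/*.cfg`, assuming the domU configs live in `/etc/xen/`; no output means no explicit MAC is shared.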

----------

