# swapper: page allocation failure

## fangorn

Hi, 

I am getting the following messages on a production server with increasing frequency. It ran for months without problems, then a few of these messages appeared, and then the machine hung completely, apparently without even noticing it. After a hard reset (I pulled the power cables because the machine no longer reacted to anything less intrusive) the messages now appear multiple times a day while the server is doing next to nothing, just a little NFS v3 server work for ten clients.

```
Oct 27 10:02:06 <server> kernel: [95082.443483] swapper: page allocation failure. order:0, mode:0x4020

Oct 27 10:02:06 <server> kernel: [95082.443489] Pid: 0, comm: swapper Not tainted 2.6.32-5-amd64 #1

Oct 27 10:02:06 <server> kernel: [95082.443493] Call Trace:

Oct 27 10:02:06 <server> kernel: [95082.443495]  <IRQ>  [<ffffffff810ba61a>] ? __alloc_pages_nodemask+0x592/0x5f4

Oct 27 10:02:06 <server> kernel: [95082.443514]  [<ffffffff810e696a>] ? new_slab+0x5b/0x1ca

Oct 27 10:02:06 <server> kernel: [95082.443520]  [<ffffffff810e6cc9>] ? __slab_alloc+0x1f0/0x39b

Oct 27 10:02:06 <server> kernel: [95082.443526]  [<ffffffff81249a8c>] ? __netdev_alloc_skb+0x29/0x45

Oct 27 10:02:06 <server> kernel: [95082.443532]  [<ffffffff810e76fb>] ? __kmalloc_node_track_caller+0xbb/0x11b

Oct 27 10:02:06 <server> kernel: [95082.443537]  [<ffffffff81249a8c>] ? __netdev_alloc_skb+0x29/0x45

Oct 27 10:02:06 <server> kernel: [95082.443544]  [<ffffffff81248ab9>] ? __alloc_skb+0x69/0x15a

Oct 27 10:02:06 <server> kernel: [95082.443550]  [<ffffffff8119bb88>] ? swiotlb_map_page+0x0/0xc4

Oct 27 10:02:06 <server> kernel: [95082.443552]  [<ffffffff81249a8c>] ? __netdev_alloc_skb+0x29/0x45

Oct 27 10:02:06 <server> kernel: [95082.443570]  [<ffffffffa00ce7b7>] ? e1000_alloc_rx_buffers+0x85/0x1b3 [e1000e]

Oct 27 10:02:06 <server> kernel: [95082.443576]  [<ffffffffa00cebcd>] ? e1000_clean_rx_irq+0x282/0x2bb [e1000e]

Oct 27 10:02:06 <server> kernel: [95082.443582]  [<ffffffffa00d0104>] ? e1000_clean+0x70/0x219 [e1000e]

Oct 27 10:02:06 <server> kernel: [95082.443585]  [<ffffffff810e584f>] ? __slab_free+0x7f/0x27a

Oct 27 10:02:06 <server> kernel: [95082.443591]  [<ffffffffa00ccb27>] ? e1000_put_txbuf+0x35/0x47 [e1000e]

Oct 27 10:02:06 <server> kernel: [95082.443596]  [<ffffffff8124fbe1>] ? net_rx_action+0xae/0x1c9

Oct 27 10:02:06 <server> kernel: [95082.443601]  [<ffffffff81053caf>] ? __do_softirq+0xdd/0x1a6

Oct 27 10:02:06 <server> kernel: [95082.443606]  [<ffffffffa00cce51>] ? e1000_intr_msix_tx+0x30/0x4f [e1000e]

Oct 27 10:02:06 <server> kernel: [95082.443611]  [<ffffffff81011cac>] ? call_softirq+0x1c/0x30

Oct 27 10:02:06 <server> kernel: [95082.443614]  [<ffffffff8101322b>] ? do_softirq+0x3f/0x7c

Oct 27 10:02:06 <server> kernel: [95082.443617]  [<ffffffff81053b1f>] ? irq_exit+0x36/0x76

Oct 27 10:02:06 <server> kernel: [95082.443619]  [<ffffffff81012922>] ? do_IRQ+0xa0/0xb6

Oct 27 10:02:06 <server> kernel: [95082.443622]  [<ffffffff810114d3>] ? ret_from_intr+0x0/0x11

Oct 27 10:02:06 <server> kernel: [95082.443623]  <EOI>  [<ffffffff8102c58c>] ? native_safe_halt+0x2/0x3

Oct 27 10:02:06 <server> kernel: [95082.443632]  [<ffffffffa018a1ad>] ? acpi_idle_do_entry+0x31/0x58 [processor]

Oct 27 10:02:06 <server> kernel: [95082.443636]  [<ffffffffa018a23c>] ? acpi_idle_enter_c1+0x68/0xb8 [processor]

Oct 27 10:02:06 <server> kernel: [95082.443640]  [<ffffffff81239e26>] ? cpuidle_idle_call+0x94/0xee

Oct 27 10:02:06 <server> kernel: [95082.443643]  [<ffffffff8100feb1>] ? cpu_idle+0xa2/0xda

Oct 27 10:02:06 <server> kernel: [95082.443648]  [<ffffffff814f3140>] ? early_idt_handler+0x0/0x71

Oct 27 10:02:06 <server> kernel: [95082.443651]  [<ffffffff814f3cdd>] ? start_kernel+0x3dc/0x3e8

Oct 27 10:02:06 <server> kernel: [95082.443654]  [<ffffffff814f33b7>] ? x86_64_start_kernel+0xf9/0x106

Oct 27 10:02:06 <server> kernel: [95082.443656] Mem-Info:

Oct 27 10:02:06 <server> kernel: [95082.443657] Node 0 DMA per-cpu:

Oct 27 10:02:06 <server> kernel: [95082.443659] CPU    0: hi:    0, btch:   1 usd:   0

Oct 27 10:02:06 <server> kernel: [95082.443661] CPU    1: hi:    0, btch:   1 usd:   0

Oct 27 10:02:06 <server> kernel: [95082.443663] CPU    2: hi:    0, btch:   1 usd:   0

Oct 27 10:02:06 <server> kernel: [95082.443665] CPU    3: hi:    0, btch:   1 usd:   0

Oct 27 10:02:06 <server> kernel: [95082.443666] Node 0 DMA32 per-cpu:

Oct 27 10:02:06 <server> kernel: [95082.443668] CPU    0: hi:  186, btch:  31 usd: 158

Oct 27 10:02:06 <server> kernel: [95082.443670] CPU    1: hi:  186, btch:  31 usd: 172

Oct 27 10:02:06 <server> kernel: [95082.443672] CPU    2: hi:  186, btch:  31 usd:  63

Oct 27 10:02:06 <server> kernel: [95082.443674] CPU    3: hi:  186, btch:  31 usd: 185

Oct 27 10:02:06 <server> kernel: [95082.443675] Node 0 Normal per-cpu:

Oct 27 10:02:06 <server> kernel: [95082.443677] CPU    0: hi:  186, btch:  31 usd: 192

Oct 27 10:02:06 <server> kernel: [95082.443679] CPU    1: hi:  186, btch:  31 usd: 159

Oct 27 10:02:06 <server> kernel: [95082.443680] CPU    2: hi:  186, btch:  31 usd:  66

Oct 27 10:02:06 <server> kernel: [95082.443682] CPU    3: hi:  186, btch:  31 usd: 180

Oct 27 10:02:06 <server> kernel: [95082.443687] active_anon:36074 inactive_anon:9148 isolated_anon:0

Oct 27 10:02:06 <server> kernel: [95082.443687]  active_file:533987 inactive_file:1177060 isolated_file:0

Oct 27 10:02:06 <server> kernel: [95082.443688]  unevictable:1517 dirty:4770 writeback:0 unstable:0

Oct 27 10:02:06 <server> kernel: [95082.443689]  free:10073 slab_reclaimable:251046 slab_unreclaimable:8685

Oct 27 10:02:06 <server> kernel: [95082.443690]  mapped:10507 shmem:77 pagetables:3683 bounce:0

Oct 27 10:02:06 <server> kernel: [95082.443692] Node 0 DMA free:15840kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15280kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

Oct 27 10:02:06 <server> kernel: [95082.443701] lowmem_reserve[]: 0 2991 8041 8041

Oct 27 10:02:06 <server> kernel: [95082.443704] Node 0 DMA32 free:21788kB min:4264kB low:5328kB high:6396kB active_anon:16924kB inactive_anon:4576kB active_file:506692kB inactive_file:1980000kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3063264kB mlocked:0kB dirty:9788kB writeback:0kB mapped:380kB shmem:0kB slab_reclaimable:376116kB slab_unreclaimable:10168kB kernel_stack:312kB pagetables:2872kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 27 10:02:06 <server> kernel: [95082.443714] lowmem_reserve[]: 0 0 5050 5050

Oct 27 10:02:06 <server> kernel: [95082.443717] Node 0 Normal free:2664kB min:7200kB low:9000kB high:10800kB active_anon:127372kB inactive_anon:32016kB active_file:1629256kB inactive_file:2728240kB unevictable:6068kB isolated(anon):0kB isolated(file):0kB present:5171200kB mlocked:6068kB dirty:9292kB writeback:0kB mapped:41648kB shmem:308kB slab_reclaimable:628068kB slab_unreclaimable:24556kB kernel_stack:2672kB pagetables:11860kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Oct 27 10:02:06 <server> kernel: [95082.443727] lowmem_reserve[]: 0 0 0 0

Oct 27 10:02:06 <server> kernel: [95082.443729] Node 0 DMA: 2*4kB 1*8kB 1*16kB 2*32kB 2*64kB 2*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15840kB

Oct 27 10:02:06 <server> kernel: [95082.443737] Node 0 DMA32: 314*4kB 8*8kB 341*16kB 33*32kB 86*64kB 53*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 21912kB

Oct 27 10:02:06 <server> kernel: [95082.443744] Node 0 Normal: 368*4kB 9*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 2664kB

Oct 27 10:02:06 <server> kernel: [95082.443751] 1707109 total pagecache pages

Oct 27 10:02:06 <server> kernel: [95082.443753] 0 pages in swap cache

Oct 27 10:02:06 <server> kernel: [95082.443754] Swap cache stats: add 0, delete 0, find 0/0

Oct 27 10:02:06 <server> kernel: [95082.443755] Free swap  = 11717624kB

Oct 27 10:02:06 <server> kernel: [95082.443757] Total swap = 11717624kB

Oct 27 10:02:06 <server> kernel: [95082.471457] 2097136 pages RAM

Oct 27 10:02:06 <server> kernel: [95082.471458] 49852 pages reserved

Oct 27 10:02:06 <server> kernel: [95082.471460] 1022202 pages shared

Oct 27 10:02:06 <server> kernel: [95082.471461] 1070407 pages non-shared

Oct 27 10:02:06 <server> kernel: [95082.471463] SLUB: Unable to allocate memory on node -1 (gfp=0x20)

Oct 27 10:02:06 <server> kernel: [95082.471466]   cache: kmalloc-4096, object size: 4096, buffer size: 4096, default order: 3, min order: 0

Oct 27 10:02:06 <server> kernel: [95082.471468]   node 0: slabs: 1182, objs: 3044, free: 0
```

Do I read this right: the kernel runs out of memory while allocating a network communication buffer and drops the process?
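One way to check that reading is to decode the first line of the message. The Python sketch below does that; the flag table is an assumption based on the 2.6.32-era include/linux/gfp.h and lists only a few bits, so it should be checked against the header of the running kernel. The helper name `decode_failure` is made up for illustration.

```python
import re

# Assumption: a few gfp bits as defined in 2.6.32-era include/linux/gfp.h.
# Later kernels renumbered these, so verify against your own source tree.
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",
}

FAILURE_RE = re.compile(
    r"page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)"
)


def decode_failure(line, page_kb=4):
    """Return (request size in kB, known flag names, leftover bits, may_sleep)."""
    match = FAILURE_RE.search(line)
    if match is None:
        raise ValueError("not a page allocation failure line")
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = 0
    for bit in GFP_BITS:
        known |= bit
    leftover = mode & ~known          # bits the small table above does not cover
    may_sleep = bool(mode & 0x10)     # __GFP_WAIT set means the caller may block
    return page_kb << order, names, leftover, may_sleep


line = "swapper: page allocation failure. order:0, mode:0x4020"
print(decode_failure(line))
```

If the flag table is right, the request is for a single 4 kB page without `__GFP_WAIT`, so it may not sleep or wait for reclaim. That fits the call trace, which comes from the e1000e receive path in softirq context. swapper (Pid 0) is only the idle task that happened to be interrupted, so no process is dropped; at worst the driver fails to refill a receive buffer and packets get lost.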

I have atop monitoring this machine every ten minutes and here are the printouts from before and after this error: 

```
ATOP - <server>              2011/10/27  09:59:02              600 seconds elapsed

PRC | sys  36.15s | user  17.12s | #proc    257 | #zombie    0 | #exit    963 |

CPU | sys      6% | user      3% | irq       1% | idle    385% | wait      5% |

cpu | sys      3% | user      3% | irq       1% | idle     89% | cpu000 w  4% |

cpu | sys      2% | user      0% | irq       0% | idle     98% | cpu003 w  0% |

cpu | sys      1% | user      0% | irq       0% | idle     99% | cpu002 w  0% |

cpu | sys      1% | user      0% | irq       0% | idle     98% | cpu001 w  1% |

CPL | avg1   0.25 | avg5    0.26 | avg15   0.54 | csw  1309931 | intr 1282502 |

MEM | tot    7.8G | free  463.4M | cache   5.9G | buff  246.7M | slab    1.0G |

SWP | tot   11.2G | free   11.2G |              | vmcom 834.6M | vmlim  15.1G |

DSK |         sdb | busy     15% | read   12543 | write   2727 | avio    5 ms |

DSK |         sda | busy      0% | read     259 | write   4530 | avio    0 ms |

NET | transport   | tcpi  547521 | tcpo  409905 | udpi    2074 | udpo    1110 |

NET | network     | ipi   549273 | ipo   411027 | ipfrw      0 | deliv 549248 |

NET | eth0     0% | pcki  548716 | pcko  620165 | si 2971 Kbps | so 6149 Kbps |

NET | lo     ---- | pcki     858 | pcko     858 | si    3 Kbps | so    3 Kbps |

  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK  ST EXC S  CPU CMD     1/47  

27213  10.91s   8.98s   272K   404K 54184K  1368K  --   - S   3% smbd

 1464  13.16s   0.36s     0K     0K     0K     0K  --   - S   2% gkrellmd

22360   6.67s   5.60s   152K   -24K 118.8M  1056K  --   - S   2% smbd
```

```
ATOP - <server>              2011/10/27  10:09:02              600 seconds elapsed

PRC | sys  56.75s | user   6.35s | #proc    258 | #zombie    0 | #exit   1530 |

CPU | sys      9% | user      1% | irq       5% | idle    342% | wait     43% |

cpu | sys      5% | user      1% | irq       5% | idle     64% | cpu000 w 26% |

cpu | sys      1% | user      0% | irq       0% | idle     97% | cpu003 w  1% |

cpu | sys      2% | user      0% | irq       0% | idle     87% | cpu001 w 11% |

cpu | sys      1% | user      0% | irq       0% | idle     95% | cpu002 w  4% |

CPL | avg1   0.16 | avg5    0.84 | avg15   0.77 | csw  4851725 | intr 4075693 |

MEM | tot    7.8G | free   56.4M | cache   6.9G | buff  226.4M | slab  440.5M |

SWP | tot   11.2G | free   11.2G |              | vmcom 836.2M | vmlim  15.1G |

PAG | scan 2019e3 | stall      0 |              | swin       0 | swout      0 |

DSK |         sdb | busy     40% | read   15233 | write 100967 | avio    2 ms |

DSK |         sda | busy      3% | read     874 | write   8171 | avio    2 ms |

NET | transport   | tcpi 7002578 | tcpo 3735999 | udpi    2212 | udpo    1335 |

NET | network     | ipi  7004629 | ipo  3737364 | ipfrw      0 | deliv 7005e3 |

NET | eth0    11% | pcki 7003715 | pcko 4866331 | si  111 Mbps | so   37 Mbps |

NET | lo     ---- | pcki    1185 | pcko    1185 | si    3 Kbps | so    3 Kbps |

  PID  SYSCPU  USRCPU  VGROW  RGROW  RDDSK  WRDSK  ST EXC S  CPU CMD     1/72  

22622  15.48s   2.26s     0K     0K    76K 418.8M  --   - S   3% apt-cacher-ng

 1464  13.87s   0.41s     0K     0K     0K     0K  --   - S   2% gkrellmd

 3089   3.36s   0.00s     0K     0K 217.6M   1.3G  --   - S   1% nfsd
```

As you might have noticed, this is not a Gentoo server, but in my experience the people in this forum can give you answers that others can't. ;)

The server has always had occasional performance problems that could not be traced to any event in the syslog. My gut feeling tells me these problems have a common cause. As this exact same software has worked for over half a year, I fear the degradation is due to a growing hardware problem in either the network chip or the RAM.

I know memtest can verify the memory, even though it means I have to take a production server offline and replace it temporarily.

Is there a test to verify the NIC in a similar way?

A little information on the machine: 

```
-> uname -a 

Linux lin71 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64 GNU/Linux

$ lspci 

00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)

00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)

00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)

00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)

00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)

00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)

00:0d.0 Host bridge: Intel Corporation Device 343a (rev 22)

00:0d.1 Host bridge: Intel Corporation Device 343b (rev 22)

00:0d.2 Host bridge: Intel Corporation Device 343c (rev 22)

00:0d.3 Host bridge: Intel Corporation Device 343d (rev 22)

00:0d.4 Host bridge: Intel Corporation 5520/5500/X58 Physical Layer Port 0 (rev 22)

00:0d.5 Host bridge: Intel Corporation 5520/5500 Physical Layer Port 1 (rev 22)

00:0d.6 Host bridge: Intel Corporation Device 341a (rev 22)

00:0e.0 Host bridge: Intel Corporation Device 341c (rev 22)

00:0e.1 Host bridge: Intel Corporation Device 341d (rev 22)

00:0e.2 Host bridge: Intel Corporation Device 341e (rev 22)

00:0e.4 Host bridge: Intel Corporation Device 3439 (rev 22)

00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)

00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)

00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)

00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)

00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)

00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)

00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4

00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5

00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6

00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2

00:1c.0 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 1

00:1c.4 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 5

00:1c.5 PCI bridge: Intel Corporation 82801JI (ICH10 Family) PCI Express Root Port 6

00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1

00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2

00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3

00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)

00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller

00:1f.2 IDE interface: Intel Corporation 82801JI (ICH10 Family) 4 port SATA IDE Controller #1

00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller

00:1f.5 IDE interface: Intel Corporation 82801JI (ICH10 Family) 2 port SATA IDE Controller #2

01:03.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)

03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

07:00.0 RAID bus controller: 3ware Inc 9750 SAS2/SATA-II RAID PCIe (rev 05)

$ lspci -n 

00:00.0 0600: 8086:3406 (rev 22)

00:01.0 0604: 8086:3408 (rev 22)

00:03.0 0604: 8086:340a (rev 22)

00:05.0 0604: 8086:340c (rev 22)

00:07.0 0604: 8086:340e (rev 22)

00:09.0 0604: 8086:3410 (rev 22)

00:0d.0 0600: 8086:343a (rev 22)

00:0d.1 0600: 8086:343b (rev 22)

00:0d.2 0600: 8086:343c (rev 22)

00:0d.3 0600: 8086:343d (rev 22)

00:0d.4 0600: 8086:3418 (rev 22)

00:0d.5 0600: 8086:3419 (rev 22)

00:0d.6 0600: 8086:341a (rev 22)

00:0e.0 0600: 8086:341c (rev 22)

00:0e.1 0600: 8086:341d (rev 22)

00:0e.2 0600: 8086:341e (rev 22)

00:0e.4 0600: 8086:3439 (rev 22)

00:13.0 0800: 8086:342d (rev 22)

00:14.0 0800: 8086:342e (rev 22)

00:14.1 0800: 8086:3422 (rev 22)

00:14.2 0800: 8086:3423 (rev 22)

00:14.3 0800: 8086:3438 (rev 22)

00:16.0 0880: 8086:3430 (rev 22)

00:16.1 0880: 8086:3431 (rev 22)

00:16.2 0880: 8086:3432 (rev 22)

00:16.3 0880: 8086:3433 (rev 22)

00:16.4 0880: 8086:3429 (rev 22)

00:16.5 0880: 8086:342a (rev 22)

00:16.6 0880: 8086:342b (rev 22)

00:16.7 0880: 8086:342c (rev 22)

00:1a.0 0c03: 8086:3a37

00:1a.1 0c03: 8086:3a38

00:1a.2 0c03: 8086:3a39

00:1a.7 0c03: 8086:3a3c

00:1c.0 0604: 8086:3a40

00:1c.4 0604: 8086:3a48

00:1c.5 0604: 8086:3a4a

00:1d.0 0c03: 8086:3a34

00:1d.1 0c03: 8086:3a35

00:1d.2 0c03: 8086:3a36

00:1d.7 0c03: 8086:3a3a

00:1e.0 0604: 8086:244e (rev 90)

00:1f.0 0601: 8086:3a16

00:1f.2 0101: 8086:3a20

00:1f.3 0c05: 8086:3a30

00:1f.5 0101: 8086:3a26

01:03.0 0300: 102b:0532 (rev 0a)

03:00.0 0200: 8086:10d3

04:00.0 0200: 8086:10d3

07:00.0 0104: 13c1:1010 (rev 05)

```

I'm grateful for any hints on how to address this server problem. 

fangorn

Edit: Is it possible that the MTU setting has something to do with this? I do not remember setting the machine up with jumbo frames, but I am not sure right now. Normally I keep the default MTU of 1500, even though the Intel chips and the Ciscos should be able to handle jumbo frames.
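The MTU currently in effect can be read back from sysfs. A minimal Python sketch, assuming the interface is eth0 as in the atop output above; the function names are only illustrative.

```python
from pathlib import Path


def read_mtu(iface="eth0", root="/sys/class/net"):
    """Return the MTU the kernel currently uses for iface, as an int."""
    return int((Path(root) / iface / "mtu").read_text().strip())


def uses_jumbo_frames(mtu, standard_mtu=1500):
    """Anything above the Ethernet default of 1500 counts as jumbo here."""
    return mtu > standard_mtu


if __name__ == "__main__":
    try:
        mtu = read_mtu("eth0")
        print("eth0 mtu", mtu, "jumbo" if uses_jumbo_frames(mtu) else "standard")
    except OSError as exc:
        # eth0 is an assumption; the interface may be named differently.
        print("could not read MTU:", exc)
```

As far as I know, larger frames need larger receive buffers, and those are harder to allocate in interrupt context, so it seems worth ruling this out before looking further.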

----------

## linuxtuxhellsinki

Hello,

I've seen this on several old servers with older kernels, and there the solution was to increase the network memory buffers, like in this LINK. You can test it by echoing to /proc, for example `echo "4096 655360 6553600" > /proc/sys/net/ipv4/tcp_wmem`, and for a permanent solution add the settings to /etc/sysctl.conf and run `sysctl -p`:

```
net/core/rmem_max = 8738000
net/core/wmem_max = 6553600
net/ipv4/tcp_rmem = 8192 873800 8738000
net/ipv4/tcp_wmem = 4096 655360 6553600
vm/min_free_kbytes = 65536
```

But this was more of a solution for some older 2.6.18 kernels or something like that.
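Before changing anything it is probably worth noting down what the box currently uses, so the old values can be restored. A small Python sketch for that, reading the same keys from /proc/sys; `read_sysctl` and `snapshot` are hypothetical helper names, not part of any tool.

```python
from pathlib import Path

# The keys from the sysctl.conf fragment above.
KEYS = [
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
    "vm/min_free_kbytes",
]


def read_sysctl(key, root="/proc/sys"):
    """Return the current value of a sysctl key, or None if it cannot be read."""
    # Accept both "net/ipv4/tcp_wmem" and "net.ipv4.tcp_wmem" spellings.
    # Note: this naive replace would break keys containing interface names
    # with dots, which none of the keys above do.
    path = Path(root) / key.replace(".", "/")
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def snapshot(keys=KEYS, root="/proc/sys"):
    """Collect current values so they can be compared or restored later."""
    return {key: read_sysctl(key, root) for key in keys}


if __name__ == "__main__":
    for key, value in snapshot().items():
        print(f"{key} = {value}")
```

The output uses the same `key = value` shape as the fragment above, so it can be kept next to /etc/sysctl.conf as a record of the previous settings.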

In some cases it helped to turn off TCP segmentation offload (TSO) with `ethtool -K eth0 tso off`.

I've also seen similar swapper errors on some servers that use the Broadcom "tg3" driver, which is not your case.

Also check this thread about NFS+SLUB:

https://forums.gentoo.org/viewtopic-t-843865-start-0.html

I hope you find a solution to your problem with these.
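To see why vm/min_free_kbytes is on that list, it may help to compare the zone figures from the Mem-Info dump with the watermarks. The Python sketch below is a simplified model of the watermark check as I understand it for 2.6.32 (halve the min mark for `__GFP_HIGH`, take off another quarter for non-sleeping requests, then add the zone's lowmem_reserve). The model and the choice of the third lowmem_reserve column are assumptions, so treat the result as a plausibility check only.

```python
PAGE_KB = 4

# Figures copied from the Mem-Info dump in the first post.
# reserve_pages is the lowmem_reserve[] entry assumed to apply to a request
# that targets the Normal zone (third column of each lowmem_reserve line).
ZONES = [
    {"name": "DMA", "free_kb": 15840, "min_kb": 20, "reserve_pages": 8041},
    {"name": "DMA32", "free_kb": 21788, "min_kb": 4264, "reserve_pages": 5050},
    {"name": "Normal", "free_kb": 2664, "min_kb": 7200, "reserve_pages": 0},
]


def atomic_watermark_ok(zone, order=0):
    """Simplified model of the watermark check for an atomic request."""
    free_pages = zone["free_kb"] // PAGE_KB - (1 << order) + 1
    min_pages = zone["min_kb"] // PAGE_KB
    min_pages -= min_pages // 2   # __GFP_HIGH may use half of the min reserve
    min_pages -= min_pages // 4   # non-sleeping requests may dig a bit deeper
    return free_pages > min_pages + zone["reserve_pages"]


for zone in ZONES:
    verdict = "ok" if atomic_watermark_ok(zone) else "below watermark"
    print(zone["name"], verdict)
```

In this model every zone refuses the request, even though the dump shows about 40 MB free in total. Raising min_free_kbytes raises the min/low/high marks, so kswapd starts reclaiming page cache earlier and leaves more headroom for allocations made from interrupt context.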

----------

## fangorn

Thanks a lot. 

I was hoping that one of the pros had seen such a thing. Since I switched the production and the backup server last weekend, I now have something to test on.

Thanks again. 

fangorn

----------

## Treborius

Similar problem here, see https://forums.gentoo.org/viewtopic-t-899392.html

I get these messages:

```

Oct 25 12:00:53 ponyslaystation kernel: swapper: page allocation failure. order:1, mode:0x20 

 Oct 25 12:00:53 ponyslaystation kernel: Pid: 0, comm: swapper Tainted: P        W   2.6.38-gentoo-r6-alix #9 

 Oct 25 12:00:53 ponyslaystation kernel: Call Trace: 

 Oct 25 12:00:53 ponyslaystation kernel: [<c106950d>] ? __alloc_pages_nodemask+0x4ad/0x650 

 Oct 25 12:00:53 ponyslaystation kernel: [<c108585e>] ? cache_alloc_refill+0x2ae/0x470 

 Oct 25 12:00:53 ponyslaystation kernel: [<c1085a92>] ? __kmalloc+0x72/0xa0 

 Oct 25 12:00:53 ponyslaystation kernel: [<c1229219>] ? __alloc_skb+0x49/0x100 

 Oct 25 12:00:53 ponyslaystation kernel: [<d00c504e>] ? ath_rxbuf_alloc+0x1e/0x80 [ath] 

 Oct 25 12:00:53 ponyslaystation kernel: [<cfff9095>] ? ath_rx_tasklet+0x615/0x15b0 [ath9k] 

 Oct 25 12:00:53 ponyslaystation kernel: [<c103ef51>] ? sched_clock_local.clone.1+0x41/0x170 

 Oct 25 12:00:53 ponyslaystation kernel: [<c10220bd>] ? enqueue_task_rt+0x1d/0x120 

 Oct 25 12:00:53 ponyslaystation kernel: [<c10221ea>] ? enqueue_task.clone.127+0x2a/0x60 

 Oct 25 12:00:53 ponyslaystation kernel: [<cfff7274>] ? ath9k_tasklet+0x64/0x120 [ath9k] 

 Oct 25 12:00:53 ponyslaystation kernel: [<c102aba9>] ? tasklet_action+0x39/0x70 

 Oct 25 12:00:53 ponyslaystation kernel: [<c102b09c>] ? __do_softirq+0x6c/0xd0 

 Oct 25 12:00:53 ponyslaystation kernel: [<c102b030>] ? __do_softirq+0x0/0xd0 

 Oct 25 12:00:53 ponyslaystation kernel: <IRQ>  [<c102b1c5>] ? irq_exit+0x65/0x70 

 Oct 25 12:00:53 ponyslaystation kernel: [<c1004215>] ? do_IRQ+0x35/0x90 

 Oct 25 12:00:53 ponyslaystation kernel: [<c1002ef0>] ? common_interrupt+0x30/0x40 

 Oct 25 12:00:53 ponyslaystation kernel: [<c100850c>] ? default_idle+0x2c/0x40 

 Oct 25 12:00:53 ponyslaystation kernel: [<c10015f4>] ? cpu_idle+0x74/0x90 

 Oct 25 12:00:53 ponyslaystation kernel: [<c138c5f0>] ? start_kernel+0x281/0x287 

 Oct 25 12:00:53 ponyslaystation kernel: Mem-Info: 

 Oct 25 12:00:53 ponyslaystation kernel: DMA per-cpu: 

 Oct 25 12:00:53 ponyslaystation kernel: CPU    0: hi:    0, btch:   1 usd:   0 

 Oct 25 12:00:53 ponyslaystation kernel: Normal per-cpu: 

 Oct 25 12:00:53 ponyslaystation kernel: CPU    0: hi:   90, btch:  15 usd:  84 

 Oct 25 12:00:53 ponyslaystation kernel: active_anon:11061 inactive_anon:12313 isolated_anon:0 

 Oct 25 12:00:53 ponyslaystation kernel: active_file:14113 inactive_file:14172 isolated_file:0 

 Oct 25 12:00:53 ponyslaystation kernel: unevictable:0 dirty:663 writeback:353 unstable:0 

 Oct 25 12:00:53 ponyslaystation kernel: free:985 slab_reclaimable:5358 slab_unreclaimable:2492 

 Oct 25 12:00:53 ponyslaystation kernel: mapped:3344 shmem:584 pagetables:406 bounce:0 

 Oct 25 12:00:53 ponyslaystation kernel: DMA free:1076kB min:124kB low:152kB high:184kB active_anon:876kB inactive_anon:2608kB active_file:4824kB inactive_file:5016kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15800kB mlocked:0kB dirty:308kB writeback:80kB mapped:1304kB shmem:0kB slab_reclaimable:1344kB slab_unreclaimable:88kB kernel_stack:40kB pagetables:40kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no 

 Oct 25 12:00:53 ponyslaystation kernel: lowmem_reserve[]: 0 229 229 

 Oct 25 12:00:53 ponyslaystation kernel: Normal free:2864kB min:1876kB low:2344kB high:2812kB active_anon:43368kB inactive_anon:46644kB active_file:51628kB inactive_file:51672kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:235392kB mlocked:0kB dirty:2344kB writeback:1332kB mapped:12072kB shmem:2336kB slab_reclaimable:20088kB slab_unreclaimable:9880kB kernel_stack:704kB pagetables:1584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9 all_unreclaimable? no 

 Oct 25 12:00:53 ponyslaystation kernel: lowmem_reserve[]: 0 0 0 

 Oct 25 12:00:53 ponyslaystation kernel: DMA: 269*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1076kB 

 Oct 25 12:00:53 ponyslaystation kernel: Normal: 716*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2864kB 

 Oct 25 12:00:53 ponyslaystation kernel: 29744 total pagecache pages 

 Oct 25 12:00:53 ponyslaystation kernel: 874 pages in swap cache 

 Oct 25 12:00:53 ponyslaystation kernel: Swap cache stats: add 15150, delete 14276, find 8958/9845 

 Oct 25 12:00:53 ponyslaystation kernel: Free swap  = 1946376kB 

 Oct 25 12:00:53 ponyslaystation kernel: Total swap = 1959924kB 

 Oct 25 12:00:53 ponyslaystation kernel: 63392 pages RAM 

 Oct 25 12:00:53 ponyslaystation kernel: 1600 pages reserved 

 Oct 25 12:00:53 ponyslaystation kernel: 39078 pages shared 

 Oct 25 12:00:53 ponyslaystation kernel: 33126 pages non-shared 

 Oct 25 12:00:53 ponyslaystation kernel: skbuff alloc of size 3872 failed

```
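The free-list lines near the end of this dump point at fragmentation rather than a plain lack of memory: the request is order:1 (two contiguous pages), and both zones list free 4 kB blocks only. A short Python sketch that reads one of those lines; the helper names are made up, and the real allocator also applies watermarks, so this only shows the necessary condition.

```python
import re

BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")


def free_blocks_by_order(line, page_kb=4):
    """Map buddy order -> number of free blocks, from one free-list line."""
    counts = {}
    for count, size_kb in BLOCK_RE.findall(line):
        # 4kB -> order 0, 8kB -> order 1, ..., 4096kB -> order 10
        order = (int(size_kb) // page_kb).bit_length() - 1
        counts[order] = int(count)
    return counts


def has_block_for(counts, order):
    """True if any free block of the requested order or larger exists."""
    return any(n > 0 for o, n in counts.items() if o >= order)


normal = ("Normal: 716*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 2864kB")
counts = free_blocks_by_order(normal)
print(has_block_for(counts, 0), has_block_for(counts, 1))
```

For the Normal line above this reports that an order-0 block exists but no block of order 1 or larger, which matches the failed 3872-byte skbuff allocation in the last log line.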

----------

## fangorn

After switching from kernel 2.6.32 to 2.6.39, the servers have now been running for a week without error messages. I have not had time for extensive testing so far.

I have not changed the network settings yet. Ultimately, though, I want to go back to the distribution's standard kernel, so I will test with 2.6.32 later and will probably need the network setting changes then.

----------

