# page allocation failure (Proliant HP Gen8, 4.4.6-gentoo)

## atmosx

Hi,

I have an HP ProLiant Gen8 MicroServer running Gentoo. I'm consistently getting a page allocation failure on kernel 4.4.6-gentoo. The system is stable; I don't get hangups or anything. The `dmesg` log:

```
[1053460.047725] rdiff-backup: page allocation failure: order:2, mode:0x2204020
[1053460.047726] CPU: 0 PID: 57583 Comm: rdiff-backup Tainted: P           OE   4.4.6-gentoo #1
[1053460.047727] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015
[1053460.047728]  0000000000000000 ffff88040b403ae8 ffffffff8129bc12 0000000002204020
[1053460.047730]  0000000000000002 ffff88040b403b70 ffffffff811475db ffffffff81aef378
[1053460.047732]  ffff880400000060 ffffffff81094064 0220402000000000 0000000000000100
[1053460.047735] Call Trace:
[1053460.047735]  <IRQ>  [<ffffffff8129bc12>] dump_stack+0x67/0x95
[1053460.047739]  [<ffffffff811475db>] warn_alloc_failed+0xdb/0x130
[1053460.047741]  [<ffffffff81094064>] ? __wake_up+0x44/0x50
[1053460.047744]  [<ffffffff8114ae18>] __alloc_pages_nodemask+0x878/0xa20
[1053460.047746]  [<ffffffff810c7dff>] ? clockevents_program_event+0x7f/0x120
[1053460.047748]  [<ffffffff8118d571>] cache_alloc_refill+0x2f1/0x590
[1053460.047750]  [<ffffffff8118dc2f>] __kmalloc+0x1ef/0x230
[1053460.047753]  [<ffffffffa0d35821>] ? tg3_alloc_rx_data+0x71/0x260 [tg3]
[1053460.047756]  [<ffffffffa0d35821>] tg3_alloc_rx_data+0x71/0x260 [tg3]
[1053460.047760]  [<ffffffffa0d3cf93>] tg3_poll_work+0x633/0xf10 [tg3]
[1053460.047762]  [<ffffffff814e3248>] ? __netif_receive_skb+0x18/0x60
[1053460.047765]  [<ffffffffa0d3d8b6>] tg3_poll_msix+0x46/0x160 [tg3]
[1053460.047768]  [<ffffffff814e4853>] net_rx_action+0x1d3/0x330
[1053460.047770]  [<ffffffff8105d1aa>] __do_softirq+0x12a/0x2d0
[1053460.047772]  [<ffffffff8105d4aa>] irq_exit+0x8a/0xa0
[1053460.047774]  [<ffffffff815ec704>] do_IRQ+0x54/0xd0
[1053460.047777]  [<ffffffff815eab49>] common_interrupt+0x89/0x89
[1053460.047777]  <EOI> Mem-Info:
[1053460.047781] active_anon:53278 inactive_anon:53422 isolated_anon:0
                  active_file:346165 inactive_file:173692 isolated_file:0
                  unevictable:0 dirty:452 writeback:0 unstable:0
                  slab_reclaimable:880470 slab_unreclaimable:1893874
                  mapped:6400 shmem:233 pagetables:1627 bounce:0
                  free:78959 free_pcp:417 free_cma:0
[1053460.047787] DMA free:15884kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15968kB managed:15884kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[1053460.047788] lowmem_reserve[]: 0 3778 15967 15967
[1053460.047794] DMA32 free:123960kB min:15488kB low:19360kB high:23232kB active_anon:50996kB inactive_anon:51316kB active_file:269096kB inactive_file:139136kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3946384kB managed:3868760kB mlocked:0kB dirty:204kB writeback:0kB mapped:2764kB shmem:464kB slab_reclaimable:812872kB slab_unreclaimable:1850016kB kernel_stack:3024kB pagetables:1704kB unstable:0kB bounce:0kB free_pcp:868kB local_pcp:696kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1053460.047795] lowmem_reserve[]: 0 0 12189 12189
[1053460.047800] Normal free:175992kB min:49980kB low:62472kB high:74968kB active_anon:162116kB inactive_anon:162372kB active_file:1115564kB inactive_file:555632kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:12779516kB managed:12481760kB mlocked:0kB dirty:1604kB writeback:0kB mapped:22836kB shmem:468kB slab_reclaimable:2709008kB slab_unreclaimable:5725480kB kernel_stack:9712kB pagetables:4804kB unstable:0kB bounce:0kB free_pcp:800kB local_pcp:656kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[1053460.047802] lowmem_reserve[]: 0 0 0 0
[1053460.047804] DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[1053460.047812] DMA32: 18972*4kB (UME) 6004*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 123920kB
[1053460.047818] Normal: 37239*4kB (UE) 3366*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 175884kB
[1053460.047825] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[1053460.047826] 520513 total pagecache pages
[1053460.047827] 386 pages in swap cache
[1053460.047828] Swap cache stats: add 5302, delete 4916, find 202551/203330
[1053460.047829] Free swap  = 8372680kB
[1053460.047830] Total swap = 8387580kB
[1053460.047830] 4185467 pages RAM
[1053460.047831] 0 pages HighMem/MovableOnly
[1053460.047832] 93866 pages reserved
[1053460.047833] 0 pages hwpoisoned
```

Following advice on SO I set vm.min_free_kbytes to 65536, which is more than twice the original size. That considerably reduced the frequency of the page allocation failures and basically limited them to fail2ban. I'm not running many processes, so this can't be an out-of-memory thing. This is a backup server running a few dockerized services and a ZFS pool, which used to cause the page allocation failures before I raised min_free_kbytes. To give an idea, this is the memory usage on average:

```
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  15004 155552 1118828 4239840    0    0    98   543    7   15  2  0 96  2  0
```
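For what it's worth, the buddy-list line in the dmesg above already shows why an order:2 request (a contiguous 16kB block) can fail even with ~172MB free in the Normal zone: every free list above order 1 is empty. A quick sanity check of the printed total, plus the runtime knobs mentioned in this post (a sketch; the `line` below is copied verbatim from the report):

```shell
# The "Normal:" buddy-list line from the report: lots of free memory, but only
# in order-0 (4kB) and order-1 (8kB) chunks -- nothing >= 16kB, which is
# exactly what an order:2 allocation needs.
line='Normal: 37239*4kB (UE) 3366*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB'

# Sum count*size over the per-order free lists to confirm the printed total:
echo "$line" | tr ' ' '\n' | grep 'kB$' \
    | awk -F'[*k]' '{ sum += $1 * $2 } END { print sum "kB" }'
# -> 175884kB, matching the "= 175884kB" in the report

# On a live system the same per-order picture is in /proc/buddyinfo, and the
# watermark change described above can be applied and persisted with:
#   sysctl -w vm.min_free_kbytes=65536
#   echo 'vm.min_free_kbytes = 65536' >> /etc/sysctl.conf
```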

Any idea why this happens? 

Short of a kernel upgrade, is there any possible workaround? Ideas, thoughts, etc. are welcome!

Thanks

----------

## tholin

I think occasional page allocation failures are normal. Here the allocation failed while the kernel was receiving a network packet; the packet will simply be dropped and retransmitted. It can happen when the kernel has to allocate a lot of pages all at once, for example when there is suddenly a lot of network traffic.

Is there a reason you are running kernel 4.4.6? You are missing out on a lot of bug fixes in the memory management subsystem, and at least 100 security vulnerabilities present in 4.4.6 have been fixed in later kernels.

Make sure your kernel is built with CONFIG_COMPACTION. Memory allocations can fail if memory is too fragmented, and without compaction the kernel can't do anything about that. ZFS also complicates things because it uses its own page cache; reclaim of pagecache pages is tricky even without ZFS. Make sure you use the most recent version of ZFS.
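A quick way to check that option on a running system (a sketch; `/proc/config.gz` is only there if the kernel was built with CONFIG_IKCONFIG_PROC, otherwise grep the `.config` the kernel was built from, e.g. under `/usr/src/linux`):

```shell
# has_compaction: report whether a given kernel config file enables compaction.
has_compaction() {
    grep -q '^CONFIG_COMPACTION=y' "$1"
}

# Typical invocations (adjust paths to your setup):
#   zcat /proc/config.gz > /tmp/running.config && has_compaction /tmp/running.config
#   has_compaction /usr/src/linux/.config
#
# With compaction built in, you can also ask the kernel to defragment all
# zones immediately (as root) and watch /proc/buddyinfo before and after:
#   echo 1 > /proc/sys/vm/compact_memory
```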

----------

## bunder

If you're running ZFS 0.6.5.x, you might consider upgrading to 0.7.0-rc3 or -rc4, as it has memory allocation improvements.
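To tell whether the loaded module predates that, a small sketch (ZFS on Linux exposes the module version under `/sys/module/zfs/version`; the comparison relies on GNU coreutils' `sort -V`):

```shell
# version_lt A B: true if version string A sorts strictly before B
# (uses GNU sort -V version ordering).
version_lt() {
    [ "$1" != "$2" ] && \
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# On a live system:
#   zfs_ver=$(cat /sys/module/zfs/version)
#   version_lt "$zfs_ver" 0.7.0 && echo "consider upgrading to a 0.7.0 rc"
version_lt 0.6.5.9 0.7.0 && echo "0.6.5.9 predates 0.7.0"
```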

----------

