# Keep running out of memory (solved)

## Bigun

Let me preface this whole post with this statement: this has only become a problem in the last few months. Before that, this system ran like a top for 5 years or so.  With that, I'll move on.

For whatever reason, after about 7 to 18 days of uptime, the machine starts running out of memory and killing off random processes to free up RAM.  From what I can tell, swap is hardly being touched.

```
# swapon
NAME       TYPE        SIZE USED PRIO
/dev/md125 partition 956.9M   1M   -1
```

I'd give you dmesg info, but I only got the command to work once before it started closing my SSH session every time I ran it.

Here's all the info I can give.

```
# cat /proc/meminfo
MemTotal:        8230396 kB
MemFree:          137596 kB
MemAvailable:    7574380 kB
Buffers:               4 kB
Cached:          7261372 kB
SwapCached:           32 kB
Active:          6293844 kB
Inactive:         975316 kB
Active(anon):       7528 kB
Inactive(anon):     5904 kB
Active(file):    6286316 kB
Inactive(file):   969412 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       7401664 kB
HighFree:         128368 kB
LowTotal:         828732 kB
LowFree:            9228 kB
SwapTotal:        979836 kB
SwapFree:         978780 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          7768 kB
Mapped:             6120 kB
Shmem:              5648 kB
Slab:             543296 kB
SReclaimable:     476480 kB
SUnreclaim:        66816 kB
KernelStack:        1304 kB
PageTables:          560 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5095032 kB
Committed_AS:      70112 kB
VmallocTotal:     122880 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10232 kB
DirectMap2M:      901120 kB
```
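
One way to read a dump like this on a 32-bit kernel is to compare the low-memory figures against the totals. The sketch below is illustrative only: `parse_meminfo` and `low_memory_report` are names made up for this example, and it assumes the `LowTotal`/`LowFree` fields are present, which is only the case on kernels built with high-memory support. It uses a few figures copied from the dump above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Sizes are reported in kB; the HugePages_* counters carry no unit.
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and shell prompts
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def low_memory_report(info):
    """Return (low_free_pct, slab_pct_of_low), or None without LowTotal."""
    if "LowTotal" not in info:
        return None  # no low/high split reported by this kernel
    low_free_pct = 100.0 * info["LowFree"] / info["LowTotal"]
    slab_pct = 100.0 * info["Slab"] / info["LowTotal"]
    return round(low_free_pct, 1), round(slab_pct, 1)


# Figures copied from the dump above.
SAMPLE = """\
MemTotal:        8230396 kB
MemAvailable:    7574380 kB
LowTotal:         828732 kB
LowFree:            9228 kB
Slab:             543296 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(low_memory_report(info))  # (1.1, 65.6)
```

On these numbers only about 1% of low memory is free even though `MemAvailable` is over 7 GB, and slab alone is roughly two thirds the size of low memory. Comparing slab with low memory is reasonable here because the kernel log below lists all slab under the DMA and Normal zones and none under HighMem.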

```
# ps -A

  PID TTY          TIME CMD

    1 ?        00:00:04 init

    2 ?        00:00:00 kthreadd

    3 ?        00:00:23 ksoftirqd/0

    7 ?        00:01:03 rcu_sched

    8 ?        00:00:00 rcu_bh

    9 ?        00:00:00 migration/0

   10 ?        00:00:00 lru-add-drain

   11 ?        00:00:00 cpuhp/0

   12 ?        00:00:00 cpuhp/1

   13 ?        00:00:00 migration/1

   14 ?        00:00:00 ksoftirqd/1

   17 ?        00:00:00 kdevtmpfs

   18 ?        00:00:00 netns

  384 ?        00:00:00 oom_reaper

  385 ?        00:00:00 writeback

  387 ?        00:00:00 crypto

  388 ?        00:00:00 bioset

  390 ?        00:00:00 kblockd

  565 ?        00:00:00 ata_sff

  585 ?        00:00:00 md

  593 ?        00:00:00 cfg80211

  688 ?        00:00:00 rpciod

  689 ?        00:00:00 xprtiod

  701 ?        00:02:28 kswapd0

  723 ?        00:00:00 vmstat

  802 ?        00:00:00 nfsiod

  811 ?        00:00:00 cifsiod

  821 ?        00:00:00 bioset

  829 ?        00:00:00 xfsalloc

  830 ?        00:00:00 xfs_mru_cache

  897 ?        00:00:00 acpi_thermal_pm

  915 ?        00:00:00 i915/signal:0

  916 ?        00:00:00 i915/signal:1

  917 ?        00:00:00 i915/signal:2

  934 ?        00:00:00 bioset

  935 ?        00:00:00 bioset

  936 ?        00:00:00 bioset

  937 ?        00:00:00 bioset

  938 ?        00:00:00 bioset

  939 ?        00:00:00 bioset

  940 ?        00:00:00 bioset

  941 ?        00:00:00 bioset

  942 ?        00:00:00 bioset

  943 ?        00:00:00 bioset

  944 ?        00:00:00 bioset

  945 ?        00:00:00 bioset

  946 ?        00:00:00 bioset

  947 ?        00:00:00 bioset

  948 ?        00:00:00 bioset

  949 ?        00:00:00 bioset

  984 ?        00:00:00 bioset

  988 ?        00:00:00 bioset

  991 ?        00:00:00 bioset

  994 ?        00:00:00 bioset

  997 ?        00:00:00 bioset

 1000 ?        00:00:00 bioset

 1003 ?        00:00:00 bioset

 1006 ?        00:00:00 bioset

 1018 ?        00:00:00 iscsi_eh

 1041 ?        00:00:00 scsi_eh_0

 1042 ?        00:00:00 scsi_tmf_0

 1045 ?        00:00:00 scsi_eh_1

 1046 ?        00:00:00 scsi_tmf_1

 1049 ?        00:00:00 scsi_eh_2

 1050 ?        00:00:00 scsi_tmf_2

 1053 ?        00:00:00 scsi_eh_3

 1054 ?        00:00:00 scsi_tmf_3

 1057 ?        00:00:00 scsi_eh_4

 1058 ?        00:00:00 scsi_tmf_4

 1061 ?        00:00:00 scsi_eh_5

 1062 ?        00:00:00 scsi_tmf_5

 1066 ?        00:00:00 scsi_eh_6

 1068 ?        00:00:00 scsi_tmf_6

 1071 ?        00:00:00 scsi_eh_7

 1072 ?        00:00:00 scsi_tmf_7

 1146 ?        00:00:00 raid5wq

 1187 ?        00:00:00 bioset

 1195 ?        00:00:00 bioset

 1204 ?        00:00:00 bioset

 1212 ?        00:00:00 bioset

 1222 ?        00:00:08 kworker/1:2

 1232 ?        00:00:00 bioset

 1245 ?        00:00:00 bioset

 1257 ?        00:00:00 bioset

 1261 ?        00:00:00 bioset

 1265 ?        00:00:00 bioset

 1266 ?        00:00:17 md126_raid1

 1268 ?        00:00:00 bioset

 1272 ?        00:00:00 bioset

 1273 ?        00:00:00 md125_raid1

 1275 ?        00:00:00 bioset

 1279 ?        00:00:00 bioset

 1280 ?        00:00:00 md124_raid1

 1283 ?        00:00:00 xfs-buf/md126

 1284 ?        00:00:00 xfs-data/md126

 1285 ?        00:00:00 xfs-conv/md126

 1286 ?        00:00:00 xfs-cil/md126

 1287 ?        00:00:00 xfs-reclaim/md1

 1288 ?        00:00:00 xfs-log/md126

 1289 ?        00:00:00 xfs-eofblocks/m

 1290 ?        00:02:13 xfsaild/md126

 1698 ?        00:00:00 systemd-udevd

 2763 ?        00:00:00 syslog-ng

 2764 ?        00:00:01 syslog-ng

 2821 ?        00:00:00 rpcbind

 2847 ?        00:00:00 rpc.statd

 2897 ?        00:00:00 rpc.idmapd

 2928 ?        00:00:00 rpc.mountd

 2932 ?        00:00:00 nfsd4_callbacks

 2933 ?        00:00:00 lockd

 2935 ?        00:00:00 nfsd

 2936 ?        00:00:00 nfsd

 2937 ?        00:00:00 nfsd

 2938 ?        00:00:00 nfsd

 2939 ?        00:00:00 nfsd

 2940 ?        00:00:00 nfsd

 2941 ?        00:00:00 nfsd

 2942 ?        00:00:00 nfsd

 3081 ?        00:00:42 avahi-daemon

 3082 ?        00:00:00 avahi-daemon

 3166 ?        00:00:00 smartd

 3196 ?        00:00:00 sshd

 3221 ?        00:00:00 cron

 3252 tty1     00:00:00 agetty

 3253 tty2     00:00:00 agetty

 3254 tty3     00:00:00 agetty

 3255 tty4     00:00:00 agetty

 3256 tty5     00:00:00 agetty

 3257 tty6     00:00:00 agetty

 3456 ?        00:00:00 bioset

 3463 ?        00:00:00 bioset

 3464 ?        00:00:12 md127_raid5

 3482 ?        00:00:00 xfs-buf/md127

 3483 ?        00:00:00 xfs-data/md127

 3484 ?        00:00:00 xfs-conv/md127

 3485 ?        00:00:00 xfs-cil/md127

 3486 ?        00:00:00 xfs-reclaim/md1

 3487 ?        00:00:00 xfs-log/md127

 3488 ?        00:00:00 xfs-eofblocks/m

 3489 ?        00:00:05 xfsaild/md127

 3548 ?        00:02:14 cifsd

 9814 ?        00:00:24 kworker/0:0

16183 ?        00:00:00 kworker/1:1H

19416 ?        00:00:00 kworker/0:2H

19782 ?        00:00:00 kworker/0:2

19812 ?        00:00:00 kworker/1:0

19818 ?        00:00:00 kworker/0:1H

19819 ?        00:00:00 sshd

19825 ?        00:00:00 sshd

19830 pts/0    00:00:00 bash

19833 ?        00:00:00 kworker/u4:1

19859 ?        00:00:00 kworker/0:1

19861 pts/0    00:00:00 su

19864 pts/0    00:00:00 bash

19871 ?        00:00:00 kworker/0:0H

19873 pts/0    00:00:00 ps

23190 ?        00:00:00 kworker/1:0H

24797 ?        00:00:00 kworker/u4:0

```

```
# tail -n 100 /var/log/messages

Mar 30 05:56:31 projector sshd[19819]: pam_unix(sshd:session): session closed for user bigun

Mar 30 05:56:31 projector kernel: dmesg invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), nodemask=0, order=2, oom_score_adj=0

Mar 30 05:56:31 projector kernel: COMPACTION is disabled!!!

Mar 30 05:56:31 projector kernel: dmesg cpuset=/ mems_allowed=0

Mar 30 05:56:31 projector kernel: CPU: 0 PID: 19874 Comm: dmesg Not tainted 4.9.6-gentoo-r1 #1

Mar 30 05:56:31 projector kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H61M-DS2H, BIOS F5 04/02/2012

Mar 30 05:56:31 projector kernel:  d526bcb0 c12ecd91 d526bd7c c18faa90 d526bcd8 c10e4e7b 00000000 f0ebc380

Mar 30 05:56:31 projector kernel:  00000206 d526bcd8 c12f0f42 f0b4b600 c18faa90 00000000 d526bcfc c10b5714

Mar 30 05:56:31 projector kernel:  c10b5566 00000000 00000000 d526bd7c d526bd7c c19ee700 c19ee9d4 d526bd18

Mar 30 05:56:31 projector kernel: Call Trace:

Mar 30 05:56:31 projector kernel:  [<c12ecd91>] dump_stack+0x47/0x5b

Mar 30 05:56:31 projector kernel:  [<c10e4e7b>] dump_header.isra.13+0x6e/0x179

Mar 30 05:56:31 projector kernel:  [<c12f0f42>] ? ___ratelimit+0xa1/0xab

Mar 30 05:56:31 projector kernel:  [<c10b5714>] oom_kill_process+0x66/0x2fe

Mar 30 05:56:31 projector kernel:  [<c10b5566>] ? oom_badness+0xc5/0xfc

Mar 30 05:56:31 projector kernel:  [<c10b5d01>] out_of_memory+0x254/0x28b

Mar 30 05:56:31 projector kernel:  [<c10b882f>] __alloc_pages_nodemask+0x828/0x8db

Mar 30 05:56:31 projector kernel:  [<c10c84e6>] kmalloc_order+0x16/0x28

Mar 30 05:56:31 projector kernel:  [<c106b164>] devkmsg_open+0x39/0xc6

Mar 30 05:56:31 projector kernel:  [<c1341fbb>] memory_open+0x48/0x4c

Mar 30 05:56:31 projector kernel:  [<c10eaa27>] chrdev_open+0x10c/0x12a

Mar 30 05:56:31 projector kernel:  [<c10e55ca>] do_dentry_open+0x193/0x272

Mar 30 05:56:31 projector kernel:  [<c10ea91b>] ? cdev_put+0x1a/0x1a

Mar 30 05:56:31 projector kernel:  [<c10e6388>] vfs_open+0x45/0x4e

Mar 30 05:56:31 projector kernel:  [<c10f2af1>] path_openat+0xae7/0xcb2

Mar 30 05:56:31 projector kernel:  [<c10f2ced>] do_filp_open+0x31/0x77

Mar 30 05:56:31 projector kernel:  [<c10fc3e1>] ? __alloc_fd+0x72/0x10f

Mar 30 05:56:31 projector kernel:  [<c10e6687>] do_sys_open+0x12d/0x1a6

Mar 30 05:56:31 projector kernel:  [<c10e6718>] SyS_open+0x18/0x1a

Mar 30 05:56:31 projector kernel:  [<c1001023>] do_fast_syscall_32+0x8b/0xf6

Mar 30 05:56:31 projector kernel:  [<c16cb6eb>] sysenter_past_esp+0x40/0x6a

Mar 30 05:56:31 projector kernel: Mem-Info:

Mar 30 05:56:31 projector kernel: active_anon:1888 inactive_anon:1476 isolated_anon:0\x0a active_file:1571579 inactive_file:242353 isolated_file:0\x0a unevictable:0 dirty:0 writeback:0 unstable:0\x0a slab_reclaimable:119120 slab_unreclaimable:16703\x0a mapped:1559 shmem:1412 pagetables:140 bounce:0\x0a free:34401 free_pcp:583 free_cma:0

Mar 30 05:56:31 projector kernel: Node 0 active_anon:7552kB inactive_anon:5904kB active_file:6286316kB inactive_file:969412kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:6236kB dirty:0kB writeback:0kB shmem:5648kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no

Mar 30 05:56:31 projector kernel: DMA free:3204kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15360kB mlocked:0kB slab_reclaimable:10372kB slab_unreclaimable:664kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

Mar 30 05:56:31 projector kernel: lowmem_reserve[]: 0 793 8021 8021

Mar 30 05:56:31 projector kernel: Normal free:6032kB min:3572kB low:4464kB high:5356kB active_anon:0kB inactive_anon:68kB active_file:4kB inactive_file:88kB unevictable:0kB writepending:0kB present:894968kB managed:813372kB mlocked:0kB slab_reclaimable:466108kB slab_unreclaimable:66148kB kernel_stack:1296kB pagetables:0kB bounce:0kB free_pcp:1280kB local_pcp:592kB free_cma:0kB

Mar 30 05:56:31 projector kernel: lowmem_reserve[]: 0 0 57825 57825

Mar 30 05:56:31 projector kernel: HighMem free:128368kB min:512kB low:8644kB high:16776kB active_anon:7552kB inactive_anon:5836kB active_file:6286312kB inactive_file:969324kB unevictable:0kB writepending:0kB present:7401664kB managed:7401664kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:560kB bounce:0kB free_pcp:1052kB local_pcp:580kB free_cma:0kB

Mar 30 05:56:31 projector kernel: lowmem_reserve[]: 0 0 0 0

Mar 30 05:56:31 projector kernel: DMA: 19*4kB (UE) 3*8kB (U) 2*16kB (UE) 2*32kB (UE) 1*64kB (U) 1*128kB (E) 7*256kB (UE) 2*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 3204kB

Mar 30 05:56:31 projector kernel: Normal: 1488*4kB (UME) 9*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6024kB

Mar 30 05:56:31 projector kernel: HighMem: 11650*4kB (UM) 3061*8kB (M) 480*16kB (UM) 276*32kB (UM) 175*64kB (M) 63*128kB (UM) 16*256kB (M) 14*512kB (UM) 4*1024kB (M) 3*2048kB (UM) 0*4096kB = 128368kB

Mar 30 05:56:31 projector kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Mar 30 05:56:31 projector kernel: 1815352 total pagecache pages

Mar 30 05:56:31 projector kernel: 8 pages in swap cache

Mar 30 05:56:31 projector kernel: Swap cache stats: add 784, delete 776, find 11/15

Mar 30 05:56:31 projector kernel: Free swap  = 978780kB

Mar 30 05:56:31 projector kernel: Total swap = 979836kB

Mar 30 05:56:31 projector kernel: 2078152 pages RAM

Mar 30 05:56:31 projector kernel: 1850416 pages HighMem/MovableOnly

Mar 30 05:56:31 projector kernel: 20553 pages reserved

Mar 30 05:56:31 projector kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name

Mar 30 05:56:31 projector kernel: [ 1698]     0  1698     2685      587       6       3        0         -1000 systemd-udevd

Mar 30 05:56:31 projector kernel: [ 2763]     0  2763     1703      103       7       3        0             0 syslog-ng

Mar 30 05:56:31 projector kernel: [ 2764]     0  2764     6470      546      10       3        0             0 syslog-ng

Mar 30 05:56:31 projector kernel: [ 2821]     0  2821      613       39       5       3        0             0 rpcbind

Mar 30 05:56:31 projector kernel: [ 2847]     0  2847      730      402       5       3        0             0 rpc.statd

Mar 30 05:56:31 projector kernel: [ 2897]     0  2897      692       43       5       3        0             0 rpc.idmapd

Mar 30 05:56:31 projector kernel: [ 2928]     0  2928      817      124       5       3        0             0 rpc.mountd

Mar 30 05:56:31 projector kernel: [ 3081]   104  3081      748      362       4       3        0             0 avahi-daemon

Mar 30 05:56:31 projector kernel: [ 3082]   104  3082      748       51       4       3        0             0 avahi-daemon

Mar 30 05:56:31 projector kernel: [ 3166]     0  3166     1098      128       5       3        0             0 smartd

Mar 30 05:56:31 projector kernel: [ 3196]     0  3196     1355      547       6       3        0         -1000 sshd

Mar 30 05:56:31 projector kernel: [ 3221]     0  3221      600      389       5       3        0             0 cron

Mar 30 05:56:31 projector kernel: [ 3252]     0  3252     1096      315       6       3        0             0 agetty

Mar 30 05:56:31 projector kernel: [ 3253]     0  3253     1096      298       6       3        0             0 agetty

Mar 30 05:56:31 projector kernel: [ 3254]     0  3254     1096      316       6       3        0             0 agetty

Mar 30 05:56:31 projector kernel: [ 3255]     0  3255     1096      294       6       3        0             0 agetty

Mar 30 05:56:31 projector kernel: [ 3256]     0  3256     1096      317       6       3        0             0 agetty

Mar 30 05:56:31 projector kernel: [ 3257]     0  3257     1096      314       6       3        0             0 agetty

Mar 30 05:56:31 projector kernel: [19819]     0 19819     2086     1121       7       3        0             0 sshd

Mar 30 05:56:31 projector kernel: [19825]  1000 19825    10412      981      10       3        0             0 sshd

Mar 30 05:56:31 projector kernel: [19830]  1000 19830      925      727       5       3        0             0 bash

Mar 30 05:56:31 projector kernel: [19861]  1000 19861      755      577       5       3        0             0 su

Mar 30 05:56:31 projector kernel: [19864]     0 19864      925      752       5       3        0             0 bash

Mar 30 05:56:31 projector kernel: [19874]     0 19874      563      169       5       3        0             0 dmesg

Mar 30 05:56:31 projector kernel: Out of memory: Kill process 19819 (sshd) score 0 or sacrifice child

Mar 30 05:56:31 projector kernel: Killed process 19825 (sshd) total-vm:41648kB, anon-rss:604kB, file-rss:3320kB, shmem-rss:0kB

Mar 30 05:56:31 projector su[19861]: pam_unix(su:session): session closed for user root

Mar 30 05:57:03 projector sshd[19877]: SSH: Server;Ltype: Version;Remote: 192.168.0.104-5081;Protocol: 2.0;Client: PuTTY_Release_0.66

Mar 30 05:57:03 projector sshd[19877]: SSH: Server;Ltype: Kex;Remote: 192.168.0.104-5081;Enc: aes256-ctr;MAC: hmac-sha2-256;Comp: none [preauth]

Mar 30 05:57:08 projector sshd[19877]: Accepted keyboard-interactive/pam for bigun from 192.168.0.104 port 5081 ssh2

Mar 30 05:57:08 projector sshd[19877]: pam_unix(sshd:session): session opened for user bigun by (uid=0)

Mar 30 05:57:08 projector sshd[19883]: SSH: Server;Ltype: Kex;Remote: 192.168.0.104-5081;Enc: aes256-ctr;MAC: hmac-sha2-256;Comp: none

Mar 30 05:57:23 projector su[19892]: Successful su for root by bigun

Mar 30 05:57:23 projector su[19892]: + /dev/pts/0 bigun:root

Mar 30 05:57:23 projector su[19892]: pam_unix(su:session): session opened for user root by bigun(uid=1000)

Mar 30 05:59:01 projector cron[19901]: (root) CMD (rm -f /var/spool/cron/lastrun/cron.hourly)

Mar 30 06:00:01 projector cron[19905]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons)

Mar 30 06:02:18 projector su[19892]: pam_unix(su:session): session closed for user root

Mar 30 06:03:16 projector su[19930]: Successful su for root by bigun

Mar 30 06:03:16 projector su[19930]: + /dev/pts/0 bigun:root

Mar 30 06:03:16 projector su[19930]: pam_unix(su:session): session opened for user root by bigun(uid=1000)

```

I don't even know where to start.  Going to reboot the server now to get it usable.

----------

## Roman_Gruber

my opinion

your kernel is outdated. 4.9.16 is stable for amd64.

+ experimental software like systemd + xfs

 *Quote:*   

> Mar 30 05:56:31 projector kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
> 
> Mar 30 05:56:31 projector kernel: [ 1698]     0  1698     2685      587       6       3        0         -1000 systemd-udevd 

 

not sure if i have read that correctly.

 total_vm => total virtual memory? => 2685 MB?  process systemd-udevd
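
For what it's worth, the `total_vm` and `rss` columns in that table appear to be counted in pages rather than megabytes. The log itself supports that reading: the table lists sshd (PID 19825) with `total_vm` 10412 and `rss` 981, and the kill line reports `total-vm:41648kB` with `anon-rss:604kB` plus `file-rss:3320kB`. A small sketch, assuming the usual 4 kB page size on x86 (the function name is invented for this example):

```python
PAGE_KB = 4  # assumption: 4 kB pages, the x86 default


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the OOM task table to kB."""
    return pages * page_kb


if __name__ == "__main__":
    # sshd, PID 19825: matches "total-vm:41648kB" in the kill line.
    print(pages_to_kb(10412))
    # Same process, rss column: matches anon-rss 604 kB + file-rss 3320 kB.
    print(pages_to_kb(981))
    # systemd-udevd, PID 1698: about 10.5 MiB of virtual memory, not 2685 MB.
    print(pages_to_kb(2685) / 1024)
```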

--

if that was my box, i would use top regularly before it dies and check myself these processes with big memory usage.

have you checked /var/log/messages ?

 *Quote:*   

> Mar 30 05:56:31 projector kernel: [ 2763]     0  2763     1703      103       7       3        0             0 syslog-ng
> 
> Mar 30 05:56:31 projector kernel: [ 2764]     0  2764     6470      546      10       3        0             0 syslog-ng 

 

--

The easiest first step is to update your kernel, then monitor memory consumption regularly with top.

--

There was a way to enable swap (swapon?), and also to lower the priority of swap, so swap will be used more.

----------

## Bigun

 *Roman_Gruber wrote:*   

> my opinion
> 
> your kernel is outdated. 4.9.16 is stable for amd64.
> 
> + experimental software like systemd + xfs
> ...

 

I'll get the server on the latest kernel.

 *Roman_Gruber wrote:*   

> if that was my box, i would use top regularly before it dies and check myself these processes with big memory usage.
> 
> have you checked /var/log/messages ?
> 
>  *Quote:*   Mar 30 05:56:31 projector kernel: [ 2763]     0  2763     1703      103       7       3        0             0 syslog-ng
> ...

 

The last snippet of code in my OP was from /var/log/messages

 *Roman_Gruber wrote:*   

> There was a way to enable swap (swapon?), and also to lower the priority of swap, so swap will be used more.

 

Swap is enabled AFAICT.  There's even a code snippet from the command in my OP.

----------

## Bigun

Also, this box is running a 32 bit kernel.

```
# uname -a
Linux projector 4.9.6-gentoo-r1 #1 SMP Thu Feb 23 06:50:25 EST 2017 i686 Intel(R) Celeron(R) CPU G530 @ 2.40GHz GenuineIntel GNU/Linux
```

There isn't a stable 4.9.16 version available yet.

----------

## Roman_Gruber

Well, I know it's very off topic

https://www.cnet.com/products/gigabyte-ga-h61m-ds2h-2-1-motherboard-micro-atx-lga1155-socket-h61/specs/

Any reason why you run 32-bit on a 64-bit capable box?

It may be that you are running into a not very well tested bug. I assume most people use amd64 these days. Running the platform with more users may cause fewer issues in the long run => amd64, ext4, openrc / eudev

--

The only thing left to ask is what you do with your server. Check regularly with top to see what causes this issue.

Some software, like Google Chrome, has had memory leaks for years ...

----------

## Bigun

 *Roman_Gruber wrote:*   

> Well, I know it's very off topic
> 
> https://www.cnet.com/products/gigabyte-ga-h61m-ds2h-2-1-motherboard-micro-atx-lga1155-socket-h61/specs/
> 
> Any reason why you run 32-bit on a 64-bit capable box?

 

The original thinking when I set up this box was "stability" (ironically enough)

But nowadays, you're absolutely right.  If I ever redid the box, it would be 64-bit.

 *Roman_Gruber wrote:*   

> The only thing left to ask is what you do with your server. Check regularly with top to see what causes this issue.
> 
> Some software, like Google Chrome, has had memory leaks for years ...

 

It's a headless box with no GUI components installed, so those kinds of things aren't a factor.

This box does five things:

1)  Hosts torrents via rtorrent

2)  Hosts a file share via Samba for workstations around the house

3)  Gets a daily backup of the webserver via unison

4)  Pushes offsite backup via SSH tunnel to a Raspberry Pi

5)  Plays media via Plex Media Server

----------

## Leio

I'd use something like https://github.com/pixelb/ps_mem to see where the memory goes and what is being swapped out (its -S option), which can point to memory leaks or at least to memory that has gone unused for a long while. I believe the ps_mem.py file can be used independently without dealing with the whole repo.

----------

## Bigun

 *Leio wrote:*   

> I'd use something like https://github.com/pixelb/ps_mem to see where the memory goes and what is being swapped out (its -S option), which can point to memory leaks or at least to memory that has gone unused for a long while. I believe the ps_mem.py file can be used independently without dealing with the whole repo.

 

Grabbed the script, time to monitor:

```
# ~/ps_mem.py -S
 Private  +   Shared  =  RAM used   Swap used   Program
 96.0 KiB +  32.0 KiB = 128.0 KiB         0.0 KiB     0.0 KiB   sleep
120.0 KiB +  38.0 KiB = 158.0 KiB         0.0 KiB     0.0 KiB   init
172.0 KiB +  63.5 KiB = 235.5 KiB         0.0 KiB     0.0 KiB   cron
232.0 KiB + 119.0 KiB = 351.0 KiB         0.0 KiB     0.0 KiB   rpcbind
376.0 KiB + 141.0 KiB = 517.0 KiB         0.0 KiB     0.0 KiB   su
476.0 KiB +  48.5 KiB = 524.5 KiB         0.0 KiB     0.0 KiB   rpc.idmapd
184.0 KiB + 514.0 KiB = 698.0 KiB         0.0 KiB     0.0 KiB   start_pms (2)
696.0 KiB + 107.0 KiB = 803.0 KiB         0.0 KiB     0.0 KiB   rpc.mountd
688.0 KiB + 124.0 KiB = 812.0 KiB         0.0 KiB     0.0 KiB   rpc.statd
412.0 KiB + 551.5 KiB = 963.5 KiB         0.0 KiB     0.0 KiB   avahi-daemon [deleted] (2)
916.0 KiB + 129.0 KiB =   1.0 MiB         0.0 KiB     0.0 KiB   smartd
908.0 KiB + 164.0 KiB =   1.0 MiB         0.0 KiB     0.0 KiB   screen
908.0 KiB + 197.0 KiB =   1.1 MiB         0.0 KiB     0.0 KiB   pglcmd.wd
792.0 KiB + 422.0 KiB =   1.2 MiB         0.0 KiB     0.0 KiB   agetty (6)
700.0 KiB + 681.0 KiB =   1.3 MiB         0.0 KiB     0.0 KiB   bash (2)
  1.3 MiB + 399.5 KiB =   1.6 MiB         0.0 KiB     0.0 KiB   Plex Relay
  1.2 MiB + 564.5 KiB =   1.8 MiB         0.0 KiB     0.0 KiB   nmbd
  1.7 MiB + 120.0 KiB =   1.9 MiB         0.0 KiB     0.0 KiB   systemd-udevd
  1.5 MiB + 971.0 KiB =   2.5 MiB         0.0 KiB     0.0 KiB   syslog-ng (2)
  1.4 MiB +   2.2 MiB =   3.6 MiB         0.0 KiB     0.0 KiB   sshd (3)
  5.7 MiB +   1.6 MiB =   7.3 MiB         0.0 KiB     0.0 KiB   Plex Tuner Service
  2.9 MiB +   8.7 MiB =  11.5 MiB         0.0 KiB     0.0 KiB   smbd (4)
 10.6 MiB +   3.8 MiB =  14.3 MiB         0.0 KiB     0.0 KiB   Plex DLNA Server
 18.3 MiB + 153.5 KiB =  18.4 MiB         0.0 KiB     0.0 KiB   Plex Transcoder
 20.0 MiB +  60.5 KiB =  20.1 MiB         0.0 KiB     0.0 KiB   pgld
 22.6 MiB + 840.0 KiB =  23.5 MiB         0.0 KiB     0.0 KiB   rtorrent main
 64.2 MiB +   4.3 MiB =  68.5 MiB         0.0 KiB     0.0 KiB   Plex Media Server
125.3 MiB +   7.9 MiB = 133.2 MiB         0.0 KiB     0.0 KiB   Plex Script Host (7)
-------------------------------------------------------------
                        318.9 MiB         0.0 KiB     0.0 KiB
=============================================================
```

----------

## cboldt

I doubt this issue is the age of your kernel.  4.9 is plenty recent.

I see no reference in your dmesg to /dev/md125 being used for swap.  This could be a kernel setting or an fstab issue.

```
root@hypoid-2 [3] 1 /root # zgrep SWAP /proc/config.gz
CONFIG_SWAP=y
CONFIG_MEMCG_SWAP=y
# CONFIG_MEMCG_SWAP_ENABLED is not set
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_FRONTSWAP=y
# CONFIG_ZSWAP is not set
# CONFIG_NFS_SWAP is not set
root@hypoid-2 [3] 2 /root # grep -i swap /var/log/kern.log
Mar  5 21:52:47 hypoid-2 kernel: Adding 8185112k swap on /dev/sda5.  Priority:-1 extents:1 across:8185112k SSFS
root@hypoid-2 [3] 3 /root # grep -i swap /etc/fstab
/dev/sda5               none            swap            sw                      0 3
```

Edit to add: I see the reference to /dev/md125 in your swapon output.  You and I are running the exact same kernel, 4.9.6-r1, and my kern.log and dmesg log report the use of swap quite differently compared with the report from your kernel.  On boot, my kernel reports "Adding swap on [device]".

Not to say your system or some app doesn't have a memory leak, but "not swapping before crashing out with fully consumed memory" is a solvable problem.  Also, FWIW, I run a 64-bit kernel with a 32-bit userland on a 64-bit machine.  It works fine.  It worked fine with a 32-bit kernel too.

Edit to add: I get a similar reference to the swap device in dmesg:

```
[    4.418379] Adding 8388604k swap on /dev/sda5.  Priority:-1 extents:1 across:8388604k SSFS
```

Last edited by cboldt on Thu Mar 30, 2017 12:22 pm; edited 1 time in total

----------

## Bigun

 *cboldt wrote:*   

> I doubt this issue is the age of your kernel.  4.9 is plenty recent.
> 
> I see no reference in your dmesg that /dev/md125 is being used for swap.  This could be a kernel setting or an fstab issue
> 
> ```
> ...

 

```
# cat /usr/src/linux/.config | grep SWAP
CONFIG_SWAP=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
# CONFIG_FRONTSWAP is not set
# CONFIG_NFS_SWAP is not set
```

AFAIK, the swapon command should show you the mounted swap partitions.  I always thought that as long as it was mounted as swap, it would be used.

And yeah, there may be a memory leak somewhere, but activating swap, if it isn't already active, should slow down the crashing process.

----------

## cboldt

Yes, swapon will show what is available.  I get a similar report here, with `swapon` and `swapon -s`.  And in your case that command does refer to the device /dev/md125.

But your kernel boot log does not refer to the device at all, while my kernel boot log does, and reports adding it.

Could be just a difference in logging.  My swap is a normal partition.  Something is resisting paging out to SWAP, in favor of dropping information from RAM.

I don't know enough off the top of my head to say whether our kernel config differences re:SWAP have a role.

----------

## cboldt

What do you get on `stat /dev/md125`

For comparison purposes, here are the relevant specifications on my swap space, /dev/sda5

```
root@hypoid-2 [3] 8 /root # stat /dev/sda5
  File: '/dev/sda5'
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d   Inode: 1162        Links: 1     Device type: 8,5
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
```

----------

## cboldt

For anybody curious, the (8 GB) size of my swap space is only because /dev/sda5 on this machine was created with an eye to using it for hibernate space, if I ever get around to working on that project.  Other machines under my watchful eye have much less swap space, and swap space is barely touched on any machine, even after hundreds of days of uptime.

Edit to add, just to be the first to jump on "useless use of cat" -- you can grep the contents of .config directly

`grep SWAP /usr/src/linux/.config` works fine, and skips a pipeline.

----------

## cboldt

You aren't the only one, but this was supposedly fixed for the 4.9 kernel:

http://unix.stackexchange.com/questions/300106/why-is-the-oom-killer-killing-processes-when-swap-is-hardly-used

----------

## NeddySeagoon

Bigun,

Swap can't be used for everything. It's only used for dynamically allocated RAM.

If something in RAM has a permanent home on your HDD, it won't be swapped to swap.

Dirty buffers will be written, so that the RAM they use can be freed.

Code or other read-only data will be dropped. Next time it's needed, you get a page fault and it's reloaded from HDD.

It is swapping - just not using the swap space, which is why you should always have some swap.

When the kernel needs to invoke the OOM manager, it's done all of the above and has nowhere to go.

It's kill something or panic. That it chose sshd is unfortunate.

The usual cause is a memory leak. However, memory leaks come from dynamically allocated RAM that is not deallocated.

In turn, that means you can usually watch swap fill up before the OOM manager kicks in.
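
The split described above can be checked against the /proc/meminfo dump in the first post. This is a rough sketch: the figures are copied from that dump, and the function names are invented for the example.

```python
# Figures (kB) copied from the /proc/meminfo dump in the first post.
MEMINFO_KB = {
    "Active(anon)": 7528,
    "Inactive(anon)": 5904,
    "Active(file)": 6286316,
    "Inactive(file)": 969412,
    "SwapTotal": 979836,
    "SwapFree": 978780,
}


def swap_candidates_kb(info):
    """Anonymous pages on the LRU lists: what swap could actually hold."""
    return info["Active(anon)"] + info["Inactive(anon)"]


def file_backed_kb(info):
    """File-backed pages on the LRU lists: dropped or written back instead."""
    return info["Active(file)"] + info["Inactive(file)"]


def swap_used_kb(info):
    """Swap currently in use."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    print("swappable (anon):", swap_candidates_kb(MEMINFO_KB), "kB")
    print("file-backed     :", file_backed_kb(MEMINFO_KB), "kB")
    print("swap in use     :", swap_used_kb(MEMINFO_KB), "kB")
```

On those numbers only about 13 MB is anonymous memory that swap could take, against roughly 7 GB of file-backed pages, which would fit with swap staying almost empty.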

Your log shows

```
Mar 30 05:56:31 projector kernel: COMPACTION is disabled!!! 
```

Maybe you have lots of free RAM but it's fragmented, so the kernel cannot get a contiguous allocation of the size it needs. It's worth trying a kernel with the memory compaction option enabled.
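
The OOM report in the first post seems to fit this theory. The failing request was `order=2` with `GFP_KERNEL`, i.e. four contiguous pages, and the free-list line for the Normal zone shows only 4 kB and 8 kB blocks. The sketch below counts usable blocks from those lines. The function names are invented for illustration and a 4 kB page size is assumed. HighMem does hold larger blocks, but as far as I know a plain `GFP_KERNEL` allocation is not served from HighMem.

```python
import re

# Free-list lines copied from the OOM report in the first post.
NORMAL = ("Normal: 1488*4kB (UME) 9*8kB (ME) 0*16kB 0*32kB 0*64kB 0*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6024kB")
HIGHMEM = ("HighMem: 11650*4kB (UM) 3061*8kB (M) 480*16kB (UM) 276*32kB (UM) "
           "175*64kB (M) 63*128kB (UM) 16*256kB (M) 14*512kB (UM) "
           "4*1024kB (M) 3*2048kB (UM) 0*4096kB = 128368kB")


def free_blocks(line):
    """Map block size in kB to the number of free blocks of that size."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_at_least(line, order, page_kb=4):
    """Count free blocks big enough for an allocation of the given order."""
    needed_kb = page_kb << order  # order 2 -> 4 contiguous pages -> 16 kB
    return sum(n for size, n in free_blocks(line).items() if size >= needed_kb)


if __name__ == "__main__":
    print(blocks_at_least(NORMAL, 2))   # 0
    print(blocks_at_least(HIGHMEM, 2))  # 1031
```

With the posted figures this gives 0 suitable blocks in the Normal zone and 1031 in HighMem, even though the Normal zone still has about 6 MB free in total.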

----------

## Bigun

 *cboldt wrote:*   

> What do you get on `stat /dev/md125`
> 
> For comparison purposes, here are the relevant specifications on my swap space, /dev/sda5
> 
> ```
> ...

 

```
# stat /dev/md125
  File: '/dev/md125'
  Size: 0               Blocks: 0          IO Block: 4096   block special file
Device: 6h/6d   Inode: 147         Links: 1     Device type: 9,7d
Access: (0660/brw-rw----)  Uid: (    0/    root)   Gid: (    6/    disk)
Access: 2017-03-30 08:11:30.318037522 -0400
Modify: 2017-03-30 06:06:36.329000062 -0400
Change: 2017-03-30 06:06:36.329000062 -0400
 Birth: -
```

----------

## cboldt

New toys!  I never looked into the oom killer, but it is configurable.  You can protect sshd if you want ...

http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

In short, /proc/PID/oom_adj is available, and takes values from -16 (preserved) to +15 (vulnerable). It even recognizes a value of -17, which exempts the process from the OOM killer entirely, so a runaway protected app could take the whole system down with it.
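
On kernels like the 4.9 one in this thread, the newer knob is `/proc/PID/oom_score_adj`, which ranges from -1000 (never kill) to +1000. That is also the column shown in the OOM dump above, where the master sshd (PID 3196) already sits at -1000 and the process that got killed was a per-session sshd at 0. A minimal sketch for reading and setting the value follows. The helper names and the `proc_root` parameter are made up for this example, the latter only so the code can be tried against a scratch directory.

```python
from pathlib import Path


def read_oom_score_adj(pid, proc_root="/proc"):
    """Return the oom_score_adj value of a process as an int."""
    return int(Path(proc_root, str(pid), "oom_score_adj").read_text())


def write_oom_score_adj(pid, value, proc_root="/proc"):
    """Set oom_score_adj; lowering it normally requires root."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    Path(proc_root, str(pid), "oom_score_adj").write_text(f"{value}\n")


if __name__ == "__main__":
    import os
    if Path("/proc/self/oom_score_adj").exists():  # Linux only
        print(read_oom_score_adj(os.getpid()))
```

Child processes inherit the value, so anything set on a login shell carries over to what is started from it.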

----------

## Bigun

 *NeddySeagoon wrote:*   

> ...
> 
> The usual cause is a memory leak. However, memory leaks come from dynamically allocated RAM that is not deallocated.
> 
> In turn, that means you can usually watch swap fill up before the OOM manager kicks in.
> ...

 

Pleasure as always Neddy.

I'll be adding the option and recompiling.  I'll get back to you in the coming weeks with what happens.

----------

## Bigun

 *cboldt wrote:*   

> New toys!  I never looked into the oom killer, but it is configurable.  You can protect sshd if you want ...
> 
> http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
> 
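> In short, /proc/PID/oom_adj is available, and takes values from -16 (preserved) to +15 (vulnerable).  It even recognizes a value of -17, which exempts the process from the oom killer altogether, so a leaking protected app can take the whole machine down with it.  Newer kernels prefer /proc/PID/oom_score_adj, which runs from -1000 (never kill) to +1000.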

 

If this keeps up, I'll be making that change post-haste.  ;)

----------

## cboldt

```
CONFIG_COMPACTION:

  │  

  │ Compaction is the only memory management component to form  

  │ high order (larger physically contiguous) memory blocks

  │ reliably. The page allocator relies on compaction heavily and

  │ the lack of the feature can lead to unexpected OOM killer

  │ invocations for high order memory requests. You shouldn't

  │ disable this option unless there really is a strong reason for

  │ it and then we would be really interested to hear about that at

  │ linux-mm@kvack.org.

  │

  │ Symbol: COMPACTION [=y]  

  │ Type  : boolean

  │ Prompt: Allow for memory compaction

  │   Location:

  │     -> Processor type and features

  │   Defined at mm/Kconfig:259

  │   Depends on: MMU [=y]

  │   Selects: MIGRATION [=y]

  │   Selected by: TRANSPARENT_HUGEPAGE [=y] && HAVE_ARCH_TRANSPARENT_HUGEPAGE [=y]
```

I never would have caught that.  Neddy to the rescue ... again!

----------

## Bigun

 *cboldt wrote:*   

> 
> 
> ```
> CONFIG_COMPACTION:
> 
> ...

 

Honestly, as useful a feature as that is, you would think the Gentoo handbook would suggest turning it on.

----------

## cboldt

I suspect compaction is "naturally" turned on by other selections, or kernel default, but maybe not.  In my case, it is turned on by other configuration options.  It (compaction) has the -*- indicator next to it.  It can't be toggled directly.

The kernel has tons of "weird" options and pitfalls.  I chased "why does my system show a loadavg close to 1.0 with no CPU cycles?" for a few days, and found quite by accident that the loadavg calculation is greatly affected by high precision timers.  I bet there are literally hundreds of interactions like that, and yours with oom-killer.  It's tough enough to get the average computer user into a working system without scaring the crap out of them with so many issues and options.

----------

## Bigun

 *cboldt wrote:*   

> I suspect compaction is "naturally" turned on by other selections, or kernel default, but maybe not.  In my case, it is turned on by other configuration options.  It (compaction) has the -*- indicator next to it.  It can't be toggled directly.
> 
> The kernel has tons of "weird" options and pitfalls.  I chased "why does my system show a loadavg close to 1.0 with no CPU cycles?" for a few days, and found quite by accident that the loadavg calculation is greatly affected by high precision timers.  I bet there are literally hundreds of interactions like that, and yours with oom-killer.  It's tough enough to get the average computer user into a working system without scaring the crap out of them with so many issues and options.

 

My TRANSPARENT_HUGEPAGE wasn't on, so it was never turned on.  But if the Gentoo team was worried about overwhelming the user, then that ship sailed 12 pages into the handbook.  What's one more line?

----------

## NeddySeagoon

Bigun,

You have the choice of forcing COMPACTION on by choosing 

```
Selected by: TRANSPARENT_HUGEPAGE [=y] && HAVE_ARCH_TRANSPARENT_HUGEPAGE [=y]
```

or selecting it manually.

Actually, HAVE_ARCH_TRANSPARENT_HUGEPAGE will be a hidden symbol forced by your choice of ARCH. The default setting for ARCH is defined in the environment. The only time you would change it is if you were cross compiling a kernel.

----------

## cboldt

The kernel default is CONFIG_COMPACTION=y (just tested with `make defconfig`)

Some kernel `make` options turn compaction off.  `make tinyconfig` results in "# CONFIG_COMPACTION is not set"

I don't know what genkernel does, or what the "Gentoo helpers" toggle does vs. COMPACTION

I think Bigun has a good point.  If COMPACTION isn't the most likely outcome due to other causes (like genkernel, or starting with the default kernel, or "on" via the Gentoo switch, etc.), then a mention in the kernel config part of the handbook ought to be created.
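Checking what a given kernel actually ended up with is easy to script. The sketch below assumes the running kernel exposes its configuration at `/proc/config.gz` (that file only exists when the kernel was built with `CONFIG_IKCONFIG_PROC`); `kconfig_state` and `read_running_config` are hypothetical helper names, not part of any kernel tooling.

```python
import gzip
import re


def kconfig_state(config_text, symbol):
    """Return the value of CONFIG_<symbol> ('y', 'm', ...), the string
    'not set', or None if the symbol does not appear at all."""
    name = re.escape(f"CONFIG_{symbol}")
    for line in config_text.splitlines():
        line = line.strip()
        match = re.fullmatch(rf"{name}=(.*)", line)
        if match:
            return match.group(1)
        if re.fullmatch(rf"# {name} is not set", line):
            return "not set"
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config; needs CONFIG_IKCONFIG_PROC."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


# Illustrative text in the same format as a kernel .config file.
SAMPLE_CONFIG = """\
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_COMPACTION is not set
"""

if __name__ == "__main__":
    for symbol in ("COMPACTION", "SLAB", "SLUB", "TRANSPARENT_HUGEPAGE"):
        print(symbol, "->", kconfig_state(SAMPLE_CONFIG, symbol))
```

On a real machine you would pass `read_running_config()` instead of `SAMPLE_CONFIG`, or read the `.config` from the kernel source tree if `/proc/config.gz` is not available.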

----------

## Jaglover

Turning COMPACTION on will not remove the cause of the original issue ...

----------

## Bigun

So it's happening again:

```
 # ~/ps_mem.py  -S

 Private  +   Shared  =  RAM used   Swap used   Program

100.0 KiB +  76.0 KiB = 176.0 KiB         0.0 KiB     0.0 KiB   init

188.0 KiB +  80.5 KiB = 268.5 KiB         0.0 KiB     0.0 KiB   rpc.idmapd

168.0 KiB + 110.5 KiB = 278.5 KiB         0.0 KiB     0.0 KiB   cron

196.0 KiB +  88.0 KiB = 284.0 KiB         0.0 KiB     0.0 KiB   rpcbind

492.0 KiB +  74.0 KiB = 566.0 KiB         0.0 KiB     0.0 KiB   rpc.mountd

376.0 KiB + 211.0 KiB = 587.0 KiB         0.0 KiB     0.0 KiB   su

592.0 KiB + 102.0 KiB = 694.0 KiB         0.0 KiB     0.0 KiB   rpc.statd

728.0 KiB +  70.0 KiB = 798.0 KiB         0.0 KiB     0.0 KiB   smartd

  1.1 MiB + 147.0 KiB =   1.3 MiB         0.0 KiB     0.0 KiB   systemd-udevd

784.0 KiB + 578.0 KiB =   1.3 MiB         0.0 KiB     0.0 KiB   agetty (6)

664.0 KiB +   1.3 MiB =   1.9 MiB         0.0 KiB     0.0 KiB   bash (2)

  1.4 MiB + 636.0 KiB =   2.0 MiB         0.0 KiB     0.0 KiB   syslog-ng (2)

  1.9 MiB +   2.4 MiB =   4.2 MiB         0.0 KiB     0.0 KiB   sshd (3)

-------------------------------------------------------------

                         14.3 MiB         0.0 KiB     0.0 KiB

=============================================================

```

```
top - 10:19:06 up 2 days, 40 min,  1 user,  load average: 0.34, 0.36, 0.36

Tasks: 158 total,   1 running, 157 sleeping,   0 stopped,   0 zombie

%Cpu(s):  3.4 us,  1.8 sy,  1.3 ni, 89.0 id,  4.4 wa,  0.0 hi,  0.1 si,  0.0 st

KiB Mem :  8230380 total,    47512 free,   718668 used,  7464200 buff/cache

KiB Swap:   979836 total,   979624 free,      212 used.  7215344 avail Mem

```

```
 # ps -A

  PID TTY          TIME CMD

    1 ?        00:00:01 init

    2 ?        00:00:00 kthreadd

    3 ?        00:00:18 ksoftirqd/0

    7 ?        00:00:14 rcu_sched

    8 ?        00:00:00 rcu_bh

    9 ?        00:00:00 migration/0

   10 ?        00:00:00 lru-add-drain

   11 ?        00:00:00 cpuhp/0

   12 ?        00:00:00 cpuhp/1

   13 ?        00:00:00 migration/1

   14 ?        00:00:00 ksoftirqd/1

   17 ?        00:00:00 kdevtmpfs

   18 ?        00:00:00 netns

  384 ?        00:00:00 oom_reaper

  385 ?        00:00:00 writeback

  387 ?        00:00:00 kcompactd0

  388 ?        00:00:00 crypto

  389 ?        00:00:00 bioset

  391 ?        00:00:00 kblockd

  566 ?        00:00:00 ata_sff

  586 ?        00:00:00 md

  595 ?        00:00:00 cfg80211

  689 ?        00:00:00 rpciod

  690 ?        00:00:00 xprtiod

  723 ?        00:05:36 kswapd0

  724 ?        00:00:00 vmstat

  803 ?        00:00:00 nfsiod

  812 ?        00:00:00 cifsiod

  822 ?        00:00:00 bioset

  830 ?        00:00:00 xfsalloc

  831 ?        00:00:00 xfs_mru_cache

  898 ?        00:00:00 acpi_thermal_pm

  916 ?        00:00:00 i915/signal:0

  917 ?        00:00:00 i915/signal:1

  918 ?        00:00:00 i915/signal:2

  935 ?        00:00:00 bioset

  936 ?        00:00:00 bioset

  937 ?        00:00:00 bioset

  938 ?        00:00:00 bioset

  939 ?        00:00:00 bioset

  940 ?        00:00:00 bioset

  941 ?        00:00:00 bioset

  942 ?        00:00:00 bioset

  943 ?        00:00:00 bioset

  944 ?        00:00:00 bioset

  945 ?        00:00:00 bioset

  946 ?        00:00:00 bioset

  947 ?        00:00:00 bioset

  948 ?        00:00:00 bioset

  949 ?        00:00:00 bioset

  950 ?        00:00:00 bioset

  985 ?        00:00:00 bioset

  988 ?        00:00:00 bioset

  991 ?        00:00:00 bioset

  994 ?        00:00:00 bioset

  997 ?        00:00:00 bioset

 1000 ?        00:00:00 bioset

 1003 ?        00:00:00 bioset

 1007 ?        00:00:00 bioset

 1009 ?        00:00:00 iscsi_eh

 1042 ?        00:00:00 scsi_eh_0

 1043 ?        00:00:00 scsi_tmf_0

 1046 ?        00:00:00 scsi_eh_1

 1047 ?        00:00:00 scsi_tmf_1

 1050 ?        00:00:00 scsi_eh_2

 1051 ?        00:00:00 scsi_tmf_2

 1052 ?        00:00:00 scsi_eh_3

 1055 ?        00:00:00 scsi_tmf_3

 1058 ?        00:00:00 scsi_eh_4

 1059 ?        00:00:00 scsi_tmf_4

 1062 ?        00:00:00 scsi_eh_5

 1063 ?        00:00:00 scsi_tmf_5

 1066 ?        00:00:00 scsi_eh_6

 1068 ?        00:00:00 scsi_tmf_6

 1071 ?        00:00:00 scsi_eh_7

 1073 ?        00:00:00 scsi_tmf_7

 1112 ?        00:00:09 kworker/1:0H

 1147 ?        00:00:00 raid5wq

 1188 ?        00:00:00 bioset

 1196 ?        00:00:00 bioset

 1205 ?        00:00:00 bioset

 1213 ?        00:00:00 bioset

 1233 ?        00:00:00 bioset

 1246 ?        00:00:00 bioset

 1258 ?        00:00:00 bioset

 1262 ?        00:00:00 bioset

 1266 ?        00:00:00 bioset

 1267 ?        00:01:40 md126_raid1

 1269 ?        00:00:00 bioset

 1273 ?        00:00:00 bioset

 1274 ?        00:00:01 md125_raid1

 1276 ?        00:00:00 bioset

 1280 ?        00:00:00 bioset

 1281 ?        00:00:00 md124_raid1

 1284 ?        00:00:00 xfs-buf/md126

 1285 ?        00:00:00 xfs-data/md126

 1286 ?        00:00:00 xfs-conv/md126

 1287 ?        00:00:00 xfs-cil/md126

 1288 ?        00:00:00 xfs-reclaim/md1

 1289 ?        00:00:00 xfs-log/md126

 1290 ?        00:00:00 xfs-eofblocks/m

 1291 ?        00:00:27 xfsaild/md126

 1695 ?        00:00:00 systemd-udevd

 1860 ?        00:00:00 bioset

 1870 ?        00:00:00 bioset

 1871 ?        00:55:37 md127_raid5

 2062 ?        00:00:00 xfs-buf/md127

 2063 ?        00:00:00 xfs-data/md127

 2064 ?        00:00:00 xfs-conv/md127

 2065 ?        00:00:00 xfs-cil/md127

 2066 ?        00:00:00 xfs-reclaim/md1

 2067 ?        00:00:00 xfs-log/md127

 2068 ?        00:00:00 xfs-eofblocks/m

 2069 ?        00:00:03 xfsaild/md127

 2771 ?        00:00:00 syslog-ng

 2772 ?        00:00:00 syslog-ng

 2829 ?        00:00:00 rpcbind

 2855 ?        00:00:00 rpc.statd

 2905 ?        00:00:00 rpc.idmapd

 2936 ?        00:00:00 rpc.mountd

 2940 ?        00:00:00 nfsd4_callbacks

 2941 ?        00:00:00 lockd

 2943 ?        00:00:00 nfsd

 2944 ?        00:00:00 nfsd

 2945 ?        00:00:00 nfsd

 2946 ?        00:00:00 nfsd

 2947 ?        00:00:00 nfsd

 2948 ?        00:00:00 nfsd

 2949 ?        00:00:00 nfsd

 2950 ?        00:00:00 nfsd

 3153 ?        00:00:00 smartd

 3183 ?        00:00:00 sshd

 3208 ?        00:00:00 cron

 3239 tty1     00:00:00 agetty

 3240 tty2     00:00:00 agetty

 3241 tty3     00:00:00 agetty

 3242 tty4     00:00:00 agetty

 3243 tty5     00:00:00 agetty

 3244 tty6     00:00:00 agetty

 3483 ?        00:00:45 cifsd

 8959 ?        00:00:10 kworker/0:2

14053 ?        00:00:04 kworker/1:2

14067 ?        00:00:00 kworker/1:1H

14357 ?        00:00:00 kworker/u4:1

14960 ?        00:00:00 kworker/u4:2

29191 ?        00:00:00 kworker/0:1

29838 ?        00:00:00 kworker/0:0H

29860 ?        00:00:00 kworker/1:1

29861 ?        00:00:00 kworker/0:2H

29895 ?        00:00:00 kworker/1:0

29897 ?        00:00:00 kworker/u4:0

29898 ?        00:00:00 kworker/0:1H

29905 ?        00:00:00 sshd

29911 ?        00:00:00 sshd

29916 pts/0    00:00:00 bash

29970 ?        00:00:00 kworker/0:0

29976 pts/0    00:00:00 su

29979 pts/0    00:00:00 bash

29982 pts/0    00:00:00 ps

```

```
 # tail -n 100 /var/log/messages

Apr  1 10:21:54 projector kernel: [ 3153]     0  3153     1098      128       7       3        0             0 smartd

Apr  1 10:21:54 projector kernel: [ 3183]     0  3183     1355      580       6       3        0         -1000 sshd

Apr  1 10:21:54 projector kernel: [ 3208]     0  3208      600      452       5       3        0             0 cron

Apr  1 10:21:54 projector kernel: [ 3239]     0  3239     1096      378       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3240]     0  3240     1096      351       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3241]     0  3241     1096      370       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3242]     0  3242     1096      359       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3243]     0  3243     1096      351       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3244]     0  3244     1096      349       5       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [29905]     0 29905     2086     1172       7       3        0             0 sshd

Apr  1 10:21:54 projector kernel: [29911]  1000 29911    10412     1017      11       3        0             0 sshd

Apr  1 10:21:54 projector kernel: [29916]  1000 29916      925      741       4       3        0             0 bash

Apr  1 10:21:54 projector kernel: [29976]  1000 29976      755      578       5       3        0             0 su

Apr  1 10:21:54 projector kernel: [29979]     0 29979      925      761       6       3        0             0 bash

Apr  1 10:21:54 projector kernel: [29984]     0 29984      563      184       5       3        0             0 dmesg

Apr  1 10:21:54 projector kernel: Out of memory: Kill process 29905 (sshd) score 0 or sacrifice child

Apr  1 10:21:54 projector kernel: Killed process 29911 (sshd) total-vm:41648kB, anon-rss:736kB, file-rss:3332kB, shmem-rss:0kB

Apr  1 10:21:54 projector kernel: dmesg invoked oom-killer: gfp_mask=0x24040c0(GFP_KERNEL|__GFP_COMP), nodemask=0, order=2, oom_score_adj=0

Apr  1 10:21:54 projector kernel: dmesg cpuset=/ mems_allowed=0

Apr  1 10:21:54 projector kernel: CPU: 0 PID: 29984 Comm: dmesg Not tainted 4.9.6-gentoo-r1 #2

Apr  1 10:21:54 projector kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H61M-DS2H, BIOS F5 04/02/2012

Apr  1 10:21:54 projector kernel:  d5e75ca4 c12f0831 d5e75d7c c18ff2c4 d5e75ccc c10e86d5 00000000 d4c95900

Apr  1 10:21:54 projector kernel:  00200206 d5e75ccc c12f49e2 f0f40900 c18ff2c4 00000000 d5e75cf0 c10b571b

Apr  1 10:21:54 projector kernel:  c10b556d 00000000 00000000 d5e75d7c d5e75d7c c19f2700 c19f29d4 d5e75d0c

Apr  1 10:21:54 projector kernel: Call Trace:

Apr  1 10:21:54 projector kernel:  [<c12f0831>] dump_stack+0x47/0x5b

Apr  1 10:21:54 projector kernel:  [<c10e86d5>] dump_header.isra.13+0x5d/0x168

Apr  1 10:21:54 projector kernel:  [<c12f49e2>] ? ___ratelimit+0xa1/0xab

Apr  1 10:21:54 projector kernel:  [<c10b571b>] oom_kill_process+0x66/0x2fe

Apr  1 10:21:54 projector kernel:  [<c10b556d>] ? oom_badness+0xc5/0xfc

Apr  1 10:21:54 projector kernel:  [<c10b5d08>] out_of_memory+0x254/0x28b

Apr  1 10:21:54 projector kernel:  [<c10b89d3>] __alloc_pages_nodemask+0x91f/0x9c7

Apr  1 10:21:54 projector kernel:  [<c12a9600>] ? avc_has_perm_noaudit+0x1/0x83

Apr  1 10:21:54 projector kernel:  [<c10c8a25>] kmalloc_order+0x16/0x28

Apr  1 10:21:54 projector kernel:  [<c106b16b>] devkmsg_open+0x39/0xc6

Apr  1 10:21:54 projector kernel:  [<c1345a5b>] memory_open+0x48/0x4c

Apr  1 10:21:54 projector kernel:  [<c10ee281>] chrdev_open+0x10c/0x12a

Apr  1 10:21:54 projector kernel:  [<c10e8e24>] do_dentry_open+0x193/0x272

Apr  1 10:21:54 projector kernel:  [<c10ee175>] ? cdev_put+0x1a/0x1a

Apr  1 10:21:54 projector kernel:  [<c10e9be2>] vfs_open+0x45/0x4e

Apr  1 10:21:54 projector kernel:  [<c10f634b>] path_openat+0xae7/0xcb2

Apr  1 10:21:54 projector kernel:  [<c10f6547>] do_filp_open+0x31/0x77

Apr  1 10:21:54 projector kernel:  [<c10ffc3b>] ? __alloc_fd+0x72/0x10f

Apr  1 10:21:54 projector kernel:  [<c10e9ee1>] do_sys_open+0x12d/0x1a6

Apr  1 10:21:54 projector kernel:  [<c10e9f72>] SyS_open+0x18/0x1a

Apr  1 10:21:54 projector kernel:  [<c1001023>] do_fast_syscall_32+0x8b/0xf6

Apr  1 10:21:54 projector kernel:  [<c16cf1ab>] sysenter_past_esp+0x40/0x6a

Apr  1 10:21:54 projector kernel: Mem-Info:

Apr  1 10:21:54 projector kernel: active_anon:354 inactive_anon:1532 isolated_anon:0\x0a active_file:773116 inactive_file:1064369 isolated_file:0\x0a unevictable:0 dirty:0 writeback:0 unstable:0\x0a slab_reclaimable:28383 slab_unreclaimable:13352\x0a mapped:1283 shmem:234 pagetables:134 bounce:0\x0a free:12096 free_pcp:528 free_cma:0

Apr  1 10:21:54 projector kernel: Node 0 active_anon:1416kB inactive_anon:6128kB active_file:3092464kB inactive_file:4257476kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:5132kB dirty:0kB writeback:0kB shmem:936kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no

Apr  1 10:21:54 projector kernel: DMA free:3212kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15360kB mlocked:0kB slab_reclaimable:92kB slab_unreclaimable:1648kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB

Apr  1 10:21:54 projector kernel: lowmem_reserve[]: 0 793 8021 8021

Apr  1 10:21:54 projector kernel: Normal free:5148kB min:3572kB low:4464kB high:5356kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:20kB unevictable:0kB writepending:0kB present:894968kB managed:813356kB mlocked:0kB slab_reclaimable:113440kB slab_unreclaimable:51760kB kernel_stack:1264kB pagetables:0kB bounce:0kB free_pcp:852kB local_pcp:144kB free_cma:0kB

Apr  1 10:21:54 projector kernel: lowmem_reserve[]: 0 0 57825 57825

Apr  1 10:21:54 projector kernel: HighMem free:40024kB min:512kB low:8644kB high:16776kB active_anon:1416kB inactive_anon:6128kB active_file:3092464kB inactive_file:4257456kB unevictable:0kB writepending:0kB present:7401664kB managed:7401664kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:536kB bounce:0kB free_pcp:1260kB local_pcp:648kB free_cma:0kB

Apr  1 10:21:54 projector kernel: lowmem_reserve[]: 0 0 0 0

Apr  1 10:21:54 projector kernel: DMA: 11*4kB (UE) 6*8kB (UE) 3*16kB (E) 6*32kB (UE) 1*64kB (U) 4*128kB (UE) 3*256kB (UE) 1*512kB (U) 1*1024kB (E) 0*2048kB 0*4096kB = 3212kB

Apr  1 10:21:54 projector kernel: Normal: 897*4kB (ME) 195*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 5148kB

Apr  1 10:21:54 projector kernel: HighMem: 2744*4kB (UM) 2351*8kB (UM) 69*16kB (UM) 19*32kB (UM) 4*64kB (UM) 3*128kB (UM) 3*256kB (UM) 2*512kB (U) 4*1024kB (M) 1*2048kB (M) 0*4096kB = 40072kB

Apr  1 10:21:54 projector kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB

Apr  1 10:21:54 projector kernel: 1837719 total pagecache pages

Apr  1 10:21:54 projector kernel: 0 pages in swap cache

Apr  1 10:21:54 projector kernel: Swap cache stats: add 462, delete 462, find 9/9

Apr  1 10:21:54 projector kernel: Free swap  = 979624kB

Apr  1 10:21:54 projector kernel: Total swap = 979836kB

Apr  1 10:21:54 projector kernel: 2078152 pages RAM

Apr  1 10:21:54 projector kernel: 1850416 pages HighMem/MovableOnly

Apr  1 10:21:54 projector kernel: 20557 pages reserved

Apr  1 10:21:54 projector kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name

Apr  1 10:21:54 projector kernel: [ 1695]     0  1695     2740      730       6       3        0         -1000 systemd-udevd

Apr  1 10:21:54 projector kernel: [ 2771]     0  2771     1703      103       7       3        0             0 syslog-ng

Apr  1 10:21:54 projector kernel: [ 2772]     0  2772     6470      731      10       3        0             0 syslog-ng

Apr  1 10:21:54 projector kernel: [ 2829]     0  2829      613       39       5       3        0             0 rpcbind

Apr  1 10:21:54 projector kernel: [ 2855]     0  2855      730      454       5       3        0             0 rpc.statd

Apr  1 10:21:54 projector kernel: [ 2905]     0  2905      692       45       5       3        0             0 rpc.idmapd

Apr  1 10:21:54 projector kernel: [ 2936]     0  2936      817      123       5       3        0             0 rpc.mountd

Apr  1 10:21:54 projector kernel: [ 3153]     0  3153     1098      128       7       3        0             0 smartd

Apr  1 10:21:54 projector kernel: [ 3183]     0  3183     1355      580       6       3        0         -1000 sshd

Apr  1 10:21:54 projector kernel: [ 3208]     0  3208      600      452       5       3        0             0 cron

Apr  1 10:21:54 projector kernel: [ 3239]     0  3239     1096      378       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3240]     0  3240     1096      351       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3241]     0  3241     1096      370       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3242]     0  3242     1096      359       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3243]     0  3243     1096      351       6       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [ 3244]     0  3244     1096      349       5       3        0             0 agetty

Apr  1 10:21:54 projector kernel: [29916]  1000 29916      925      741       4       3        0             0 bash

Apr  1 10:21:54 projector kernel: [29976]  1000 29976      755      578       5       3        0             0 su

Apr  1 10:21:54 projector kernel: [29979]     0 29979      925      761       6       3        0             0 bash

Apr  1 10:21:54 projector kernel: [29984]     0 29984      563      184       5       3        0             0 dmesg

Apr  1 10:21:54 projector kernel: Out of memory: Kill process 29916 (bash) score 0 or sacrifice child

Apr  1 10:21:54 projector kernel: Killed process 29976 (su) total-vm:3020kB, anon-rss:304kB, file-rss:2008kB, shmem-rss:0kB

Apr  1 10:21:54 projector kernel: oom_reaper: reaped process 29984 (dmesg), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Apr  1 10:22:21 projector sshd[29988]: SSH: Server;Ltype: Version;Remote: 192.168.0.104-15575;Protocol: 2.0;Client: PuTTY_Release_0.66

Apr  1 10:22:21 projector sshd[29988]: SSH: Server;Ltype: Kex;Remote: 192.168.0.104-15575;Enc: aes256-ctr;MAC: hmac-sha2-256;Comp: none [preauth]

Apr  1 10:22:26 projector sshd[29988]: Accepted keyboard-interactive/pam for bigun from 192.168.0.104 port 15575 ssh2

Apr  1 10:22:26 projector sshd[29988]: pam_unix(sshd:session): session opened for user bigun by (uid=0)

Apr  1 10:22:26 projector sshd[29994]: SSH: Server;Ltype: Kex;Remote: 192.168.0.104-15575;Enc: aes256-ctr;MAC: hmac-sha2-256;Comp: none

Apr  1 10:23:45 projector su[30003]: Successful su for root by bigun

Apr  1 10:23:45 projector su[30003]: + /dev/pts/0 bigun:root

Apr  1 10:23:45 projector su[30003]: pam_unix(su:session): session opened for user root by bigun(uid=1000)

```
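The report above says the failing request was `order=2` with `GFP_KERNEL`, and it also lists the free blocks per zone (`Normal: 897*4kB ... 0*16kB ...`). As far as I understand it, an order-2 request needs one free block of at least four contiguous 4 kB pages (16 kB), and on a 32-bit kernel a `GFP_KERNEL` allocation cannot be served from HighMem. The sketch below parses those free-list lines and checks whether a zone holds a block that is big enough. The helper names are hypothetical and the 4 kB page size is an assumption that matches the log.

```python
import re


def parse_free_list(line):
    """Parse a free-list line from an OOM report.

    Returns (zone_name, {block_size_kB: free_count}), or None if the
    line does not look like 'Zone: N*4kB (..) N*8kB (..) ...'.
    """
    zone = re.search(r"(\w+): \d+\*\d+kB", line)
    if zone is None:
        return None
    blocks = {int(size): int(count)
              for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}
    return zone.group(1), blocks


def can_satisfy(blocks, order, page_kb=4):
    """True if any free block is at least 2**order pages long."""
    needed_kb = page_kb << order
    return any(count > 0 for size, count in blocks.items() if size >= needed_kb)


# The two lines below are copied from the log above.
NORMAL = ("Apr  1 10:21:54 projector kernel: Normal: 897*4kB (ME) 195*8kB (UM) "
          "0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
          "0*4096kB = 5148kB")
HIGHMEM = ("Apr  1 10:21:54 projector kernel: HighMem: 2744*4kB (UM) 2351*8kB (UM) "
           "69*16kB (UM) 19*32kB (UM) 4*64kB (UM) 3*128kB (UM) 3*256kB (UM) "
           "2*512kB (U) 4*1024kB (M) 1*2048kB (M) 0*4096kB = 40072kB")

if __name__ == "__main__":
    for line in (NORMAL, HIGHMEM):
        zone, blocks = parse_free_list(line)
        print(f"{zone}: order-2 block available: {can_satisfy(blocks, 2)}")
```

For these two lines it reports no usable block in Normal and one in HighMem. The DMA zone does have a few 16 kB blocks, but the `lowmem_reserve` lines suggest the kernel was holding those back, so the picture is consistent with low memory being fragmented or exhausted while gigabytes of page cache sat in HighMem.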

----------

## cboldt

One purported fix (found at unix.stackexchange.com ... I think linked above)

```
sync && echo 1 > /proc/sys/vm/drop_caches
```

Here is another shot in the dark ...

```
root@hypoid-2 [3] 26 /root # zgrep SL[AU]B /proc/config.gz

CONFIG_SLAB=y

# CONFIG_SLUB is not set

# CONFIG_SLAB_FREELIST_RANDOM is not set

CONFIG_SLABINFO=y

# CONFIG_DEBUG_SLAB is not set
```

I'm not getting close to full RAM use, so there is no swapping going on here, and I don't know whether those settings "work" to make swap happen or not.  The mention of SLAB/SLUB is in https://lkml.org/lkml/2016/12/12/49, you may have to follow some of the links around.

Edit to add a couple more remarks and references.

The preference to resolve this is SLUB, not SLAB.  In other words, the settings on my system would not tend to resolve oom-killer acting before the kernel resorts to swap.

The "sync && echo 1 > /proc/sys/vm/drop_caches" has to be in a cronjob, not a one-shot.  See https://bugzilla.redhat.com/show_bug.cgi?id=1373339

----------

## tholin

 *Bigun wrote:*   

> So it's happening again:

 

You have plenty of available (easily freeable) RAM, so the OOM condition is probably a bug. You are not locking huge amounts of page cache, right?

There have been a lot of problems with memory management and early OOM with recent kernels. You use 4.9.6 which is an old 4.9 version. Try 4.9.20. Here are some patches for fixing OOM conditions that went in 4.9.7. No idea if that's your problem though.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=96e5cec10e7a75c931f8993633b3a5cedc99144e

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=ade7afe9dca6b13919f88abd38eefe32f22eaeb3

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=d1656c5aef4d72f03a7833d07a378c8f604b8307

In general I would advise against running any kernel that hasn't been "stable" for at least 6 months. Unless you really need some of the changes in 4.9, try downgrading to 4.4.59.

 *cboldt wrote:*   

> sync && echo 1 > /proc/sys/vm/drop_caches

 

That's a horribly ugly workaround. Better to find what causes the problem than to do things like that.

----------

## Roman_Gruber

The 4.4 branch is also buggy. I think it depends on the platform and the use case. I responded to another topic this week where the 4.4 branch was the culprit for the whole topic. I update my kernel on a weekly basis, and I stay on 4.9 because 4.10 is not supported by the binary nvidia drivers afaik.

----------

## Bigun

 *tholin wrote:*   

>  *Bigun wrote:*   
> 
> > So it's happening again:
> 
> ...You are not locking huge amounts of page cache, right?

 

I'm not sure what you mean by locking, but here is my current cache usage:

```
 $ free

              total        used        free      shared  buff/cache   available

Mem:        8230380      425608       21528         776     7783244     7508572

Swap:        979836           0      979836

```
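`free` only shows the overall picture. On a 32-bit kernel with highmem, `/proc/meminfo` also has `LowTotal` and `LowFree`, and those may be the more interesting numbers here. The sketch below is a hedged illustration: the helper names are made up, and the sample values are the ones from the `/proc/meminfo` dump in the first post of this thread.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def lowmem_summary(info):
    """Compare overall availability with what is left in low memory."""
    summary = {"available_pct": 100.0 * info["MemAvailable"] / info["MemTotal"]}
    # LowTotal/LowFree only exist on 32-bit kernels built with highmem.
    if info.get("LowTotal", 0) > 0:
        summary["lowfree_pct"] = 100.0 * info["LowFree"] / info["LowTotal"]
    return summary


# Values taken from the first post; use open("/proc/meminfo").read() on a live box.
FIRST_POST = """\
MemTotal:        8230396 kB
MemAvailable:    7574380 kB
LowTotal:         828732 kB
LowFree:            9228 kB
"""

if __name__ == "__main__":
    for name, value in lowmem_summary(parse_meminfo(FIRST_POST)).items():
        print(f"{name}: {value:.1f}")
```

With the first post's numbers this works out to roughly 92% of RAM available overall but only about 1% of low memory free, which would fit an OOM kill triggered by a kernel allocation even though userspace looks nearly idle.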

----------

## Bigun

Now this is interesting.

```
[ 1774.540540] kworker/dying (1072) used greatest stack depth: 5424 bytes left

[54758.518469] kworker/dying (3320) used greatest stack depth: 5416 bytes left

[66456.307781] kworker/dying (30573) used greatest stack depth: 5296 bytes left

```

----------

## Bigun

 *Roman_Gruber wrote:*   

> The 4.4 branch is also buggy. I think it depends on the platform and the use case. I responded to another topic this week where the 4.4 branch was the culprit for the whole topic. I update my kernel on a weekly basis, and I stay on 4.9 because 4.10 is not supported by the binary nvidia drivers afaik.

 

I usually update to a newer kernel when it gets marked as stable anyway, so I'm upgrading now.  We'll see if this works.

 *cboldt wrote:*   

> One purported fix (found at unix.stackexchange.com ... I think linked above)
> 
> ```
> sync && echo 1 > /proc/sys/vm/drop_caches
> ```
> ...

 

Going to try this next.  I want to pinpoint why it's happening.

----------

## NeddySeagoon

Bigun,

```
sync && echo 1 > /proc/sys/vm/drop_caches
```

If that works, you have a kernel bug.

The kernel is supposed to do that for itself by way of normal operation.

sync flushes dirty buffers to the disc.  The filesystem ensures that there is nothing more than a few seconds old in dirty buffers anyway.

drop_caches flushes clean buffers, which the kernel is supposed to do when it needs RAM.

However, drop_caches drops all of the clean cache at once, which will have a performance impact. The kernel mechanism only drops what's needed.

As has been said, it's a horrible hack.
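For reference only, this is roughly what the workaround amounts to when written out as a script instead of a shell one-liner. It is a sketch, not a recommendation: the function name and the `path` parameter are hypothetical (the parameter exists so it can be pointed at a scratch file), while `/proc/sys/vm/drop_caches` and the meaning of the values 1, 2 and 3 are the documented kernel interface. Writing to the real file needs root.

```python
import os


def drop_page_cache(path="/proc/sys/vm/drop_caches", mode=1):
    """Flush dirty data to disk, then ask the kernel to drop clean caches.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # same job as the `sync` command: write dirty buffers out first
    with open(path, "w") as handle:
        handle.write(f"{mode}\n")
```

Dirty pages cannot be dropped, which is why the sync comes first. If this makes the OOM kills go away, that supports the kernel-bug theory rather than being a fix in itself.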

----------

## Bigun

I'm starting to think it was a kernel bug.  After upgrading to 4.9.16, I've noticed caching doesn't completely consume my RAM, and the swap is actually being used.  I'm at four days of uptime; when I get to 14, I'll consider the issue resolved, but I'm starting to guess that was the issue.

----------

## Bigun

Consider it solved.  Upgrading to the new kernel version seems to have fixed the issue.

----------

## Carnildo

 *Bigun wrote:*   

> I'm starting to think it was a kernel bug.  After upgrading to 4.9.16, I've noticed caching doesn't completely consume my RAM, and the swap is actually being used.  I'm at four days of uptime; when I get to 14, I'll consider the issue resolved, but I'm starting to guess that was the issue.

 

It almost certainly was: https://bugzilla.kernel.org/show_bug.cgi?id=190351 and https://bugzilla.redhat.com/show_bug.cgi?id=1401012

----------

