# [solved] ACPI Kernel memory leak

## eccerr0r

I think I have a memory leak, and I assume it's in the kernel. Does anyone know how I can start debugging this?

Notice the uptime, the resident memory, and the total memory on my system: they don't come anywhere close to adding up. I only noticed that this machine is leaking badly because it starts swapping heavily when I submit a distcc job to it, or on the occasional day I start Firefox or something similar.

This is 'top' output, sorted by memory.

```
top - 14:39:43 up 72 days,  5:46,  2 users,  load average: 0.01, 0.07, 0.04
Tasks:  68 total,   2 running,  66 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.8% us,  2.6% sy,  0.0% ni, 94.4% id,  0.0% wa,  0.0% hi,  0.2% si
Mem:    516392k total,   482708k used,    33684k free,     9160k buffers
Swap:   789344k total,    23336k used,   766008k free,    16520k cached

  PID USER      NI  VIRT  RES  SHR WCHAN     S %CPU   TIME COMMAND
 5233 blc        0 14308 2108 1428 stext     R  3.2  20,43 gkrellm2
 5250 blc        0 14308 2108 1428 stext     S  0.0   1:41 gkrellm2
  830 root       0  6140 1932 1556 stext     S  0.0   0:00 sshd: blc [priv]
 5200 root       0  158m 1828  628 stext     S  0.6 381:39 X -br -dpi 75 :0 -aut
  842 blc        0  2784 1496 1184 wait      S  0.0   0:00 -bash
  835 blc        0  6272 1368  960 stext     S  0.0   0:00 sshd: blc@pts/3
  888 blc        0  2072 1072  832 stext     R  0.6   0:00 top
14342 blc        0  5736  868  576 stext     S  0.0   1:36 BitchX irc.inet.tele.
31632 distcc    15  2000  520  192 stext     S  0.0   0:01 /usr/bin/distccd --pi
30622 root       0  5408  476  352 stext     S  0.0   0:18 sendmail: accepting c
30625 smmsp      0  5416  440  328 pause     S  0.0   0:00 sendmail: Queue runne
29804 root       0  4680  424  324 stext     S  0.0   0:11 /usr/sbin/cupsd
30232 root       0  1672  364  244 stext     S  0.0   0:07 /usr/sbin/syslog-ng
31457 distcc    15  2000  352  188 stext     S  0.0   0:01 /usr/bin/distccd --pi
30089 nobody     0  3896  308  220 stext     S  0.0   0:04 proftpd: (accepting c
 7584 root       0  3336  272  168 stext     S  0.0   0:18 /usr/sbin/sshd
31519 distcc    15  2000  272  192 stext     S  0.0   0:01 /usr/bin/distccd --pi
30467 root       0  1460  268  216 stext     S  0.0   0:00 /usr/sbin/rwhod -b
29731 root       0  1688  260  204 stext     S  0.0   0:02 /usr/sbin/cron
30357 root       0  1624  236  192 stext     S  0.0   0:01 /usr/sbin/automount -
30381 root       0  1624  236  192 stext     S  0.0   0:01 /usr/sbin/automount -
    1 root       0  1440  212  192 stext     S  0.0   0:15 init [3]
14227 root       0  2372  200  196 wait      S  0.0   0:00 /bin/login --
 5183 blc        0  2576  200  196 wait      S  0.0   0:00 /bin/sh /usr/bin/star
17672 blc        0  2780  196  192 stext     S  0.0   0:00 -/bin/bash
 1117 blc        0  2908  196  192 wait      S  0.0   0:00 -bash
 5204 blc        0  4884  196  192 stext     S  0.0   0:50 fvwm2
30010 root       0  1428  172  168 stext     S  0.0   0:00 /usr/sbin/acpid -c /e
29477 nobody     0  1596  168  164 stext     S  0.0   0:00 /sbin/rpc.statd
29494 root       0  1744  168  164 stext     S  0.0   0:02 /usr/sbin/rpc.mountd
30545 root       0  2088  164  160 stext     S  0.0   0:00 /usr/sbin/xinetd -pid
14302 blc        0  2640  160   80 stext     S  0.0   0:51 SCREEN
29432 rpc        0  1668  156  152 stext     S  0.0   0:00 /sbin/portmap
30159 root       0  1640  152  120 stext     S  0.0   0:06 /usr/sbin/gpm -m /dev
 5199 blc        0  2400  148  144 wait      S  0.0   0:00 xinit /home/blc/.xini
30466 root       0  1460  112   88 stext     S  0.0   0:14 /usr/sbin/rwhod -b
29659 distcc    15  2000   20   16 wait      S  0.0   0:00 /usr/bin/distccd --pi
 7849 root       0  1592    4    0 stext     S  0.0   0:00 /sbin/agetty 38400 tt
 7850 root       0  1592    4    0 stext     S  0.0   0:00 /sbin/agetty 38400 tt
 7851 root       0  1596    4    0 stext     S  0.0   0:00 /sbin/agetty 38400 tt
 7852 root       0  1596    4    0 stext     S  0.0   0:00 /sbin/agetty 38400 tt
 7853 root       0  1596    4    0 stext     S  0.0   0:00 /sbin/agetty 38400 tt
11780 blc        0  1616    4    0 stext     S  0.0   0:00 heyu_relay 8 off
14303 blc        0  2784    4    0 wait      S  0.0   0:00 -/bin/bash
    2 root      19     0    0    0 ksoftirqd S  0.0   0:00 [ksoftirqd/0]
[...]
```
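One way to confirm that the numbers don't add up is to total the resident set sizes of all processes and compare that with what the kernel reports as used once buffers and cache are subtracted. A minimal sketch, assuming procps `ps` (for `-eo rss=`) and the 2.6-era `/proc/meminfo` field names; the function names are made up for this example:

```shell
#!/bin/bash
# Hypothetical helpers: compare what processes hold with what the kernel
# reports as used. Assumes procps `ps` and a 2.6-era /proc/meminfo layout.

# Sum the first column (RSS in kB) of whatever is piped in,
# e.g. the output of `ps -eo rss=`.
sum_rss_kb() {
    awk '{ total += $1 } END { print total + 0 }'
}

# "Used" memory in kB, excluding buffers and page cache:
# MemTotal - MemFree - Buffers - Cached (SwapCached is not matched).
used_kb() {
    awk '/^MemTotal:/ { t = $2 }
         /^MemFree:/  { f = $2 }
         /^Buffers:/  { b = $2 }
         /^Cached:/   { c = $2 }
         END { print t - f - b - c }' "${1:-/proc/meminfo}"
}

# Print both totals side by side.
compare_rss_to_used() {
    echo "sum of process RSS:      $(ps -eo rss= | sum_rss_kb) kB"
    echo "used - buffers - cached: $(used_kb "${1:-/proc/meminfo}") kB"
}
```

Because shared pages are counted once for every process that maps them, the RSS total overstates what processes really hold; if `compare_rss_to_used` still shows the second figure far above the first, the difference is memory that no process accounts for.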

This is 2.6.15-gentoo-r1, and the problem has persisted for the last couple of kernel versions. I will upgrade to 2.6.17 and see if it's any better, but I'm not sure how to accelerate the leak; if I have to wait 70 days to find out, that's no good either. This is a P4 i865PE with an ATI R9250SE running Xorg with the open-source DRI drivers. Both Xorg 6.8 and 7.0 do the same thing, and I run out of memory several months after a fresh boot.

```
$ lsmod
Module                  Size  Used by
snd_pcm_oss            42272  0
snd_bt87x              11208  0
nfs                    86316  0
nls_iso8859_1           4096  0
nls_cp437               5760  0
vfat                    9728  0
fat                    40732  1 vfat
sd_mod                 12176  0
usb_storage            65216  0
scsi_mod              113640  2 sd_mod,usb_storage
tuner                  38692  0
tvaudio                20508  0
bttv                  147920  0
video_buf              16388  1 bttv
firmware_class          7680  1 bttv
i2c_algo_bit            8200  1 bttv
v4l2_common             4992  1 bttv
btcx_risc               3976  1 bttv
tveeprom               13072  1 bttv
videodev                7168  1 bttv
radeon                 97696  1
snd_mixer_oss          14464  1 snd_pcm_oss
snd_intel8x0           25884  0
snd_ac97_codec         80160  1 snd_intel8x0
snd_ac97_bus            2048  1 snd_ac97_codec
snd_pcm                69896  4 snd_pcm_oss,snd_bt87x,snd_intel8x0,snd_ac97_codec
snd_timer              19076  1 snd_pcm
snd                    42980  7 snd_pcm_oss,snd_bt87x,snd_mixer_oss,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
snd_page_alloc          8072  3 snd_bt87x,snd_intel8x0,snd_pcm
soundcore               7264  1 snd
autofs                 12160  2
nfsd                   82792  13
exportfs                4736  1 nfsd
lockd                  54024  3 nfs,nfsd
sunrpc                123964  9 nfs,nfsd,lockd
belkin_sa               8068  1
usbserial              25576  3 belkin_sa
parport_pc             33732  1
lp                      9160  0
parport                29128  2 parport_pc,lp
sk98lin               144224  0
rtc                    10036  0
ext3                   96388  1
jbd                    47892  1 ext3
ati_remote              9608  0
w83781d                27300  0
hwmon_vid               2304  1 w83781d
hwmon                   2452  1 w83781d
i2c_isa                 3584  1 w83781d
i2c_core               15888  7 tuner,tvaudio,bttv,i2c_algo_bit,tveeprom,w83781d,i2c_isa
```

I'm including the module list in case any of these drivers have known leaks.

```
$ cat /proc/meminfo
MemTotal:       516392 kB
MemFree:         31376 kB
Buffers:          9328 kB
Cached:          17732 kB
SwapCached:       4684 kB
Active:          20004 kB
Inactive:        13016 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       516392 kB
LowFree:         31376 kB
SwapTotal:      789344 kB
SwapFree:       766012 kB
Dirty:               0 kB
Writeback:           0 kB
Mapped:           9864 kB
Slab:           436700 kB
CommitLimit:   1047540 kB
Committed_AS:    42132 kB
PageTables:        708 kB
VmallocTotal:   507896 kB
VmallocUsed:      8008 kB
VmallocChunk:   499440 kB
```

The only other possibility I can think of is that code segments of old libraries stay tied up in memory after upgrades, but 'lsof' doesn't show very many of them.
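Besides `lsof`, libraries that were replaced by an upgrade but are still mapped by running processes show up in `/proc/<pid>/maps` with a `(deleted)` suffix. A small sketch that lists them; the function name is made up, and it assumes the usual six-column maps format with paths that contain no spaces:

```shell
#!/bin/bash
# Hypothetical helper: list distinct files that are still mapped by some
# process but have been deleted (or replaced) on disk.
# Each maps line is: address perms offset dev inode pathname [(deleted)]
deleted_mappings() {
    # 2>/dev/null hides errors from processes that exit mid-scan.
    grep -h ' (deleted)$' "$@" 2>/dev/null |
        awk '{ print $6 }' |
        sort -u
}
```

For example, `deleted_mappings /proc/[0-9]*/maps`, run as root so that every process's maps file is readable. An empty result would support ruling out stale library mappings.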

Any ideas?

---------------------------------------

SOLVED

---------------------------------------

FIX SUMMARY:

Kernel 2.6.17 and older have a leak in the AML (ACPI) interpreter that shows up when it interprets poorly written DSDTs in certain motherboard BIOSes. Upgrading to the 2.6.18 kernel release is necessary to fix the leak.

*Last edited by eccerr0r on Sun Nov 05, 2006 11:08 pm; edited 2 times in total*

----------

## CPUFreak91

Is your machine running slow at all?

You may want to check this article out (http://wiki.linuxquestions.org/wiki/FAQ_-_Linux_problems). It helped me with my "memory problems" (in quotes because they were never really problems to begin with, though the article did help with my swapping problem).

----------

## eccerr0r

It's actually running as fast as it always does; it just hits swap sooner and slows down as it runs out of memory (and I get OOM kills as well).

I tried terminating all processes, then filled the RAM with junk mallocs (writing zeros into them), then killed that program. In theory this should evict all the buffers and cache, so that all memory drops back into 'free' for the time being. It did clear out the buffers and cache, but the memory never came back.

This memory is "wired", for lack of a better term: it never swaps out, and I can't provoke the kernel into pushing it to swap. 'top' doesn't show anything consuming or locking that much RAM (which seems to be on the order of 300MB). I have no clue what's using it. I ruled out tmpfs (its pages should be able to go to swap), and I don't have a ramfs drive.

I rebooted the box with 2.6.17-gentoo-r4, crossing my fingers and hoping it won't debilitate itself again after a few months.

----------

## eccerr0r

Still leaking. I cut the test short after two months to add SATA support, but the used-memory number just keeps growing over time. I check it after quitting all apps and restarting X11, taking the used figure minus buffers and cached. On a fresh reboot and X startup I get around 50MB used. Just before rebooting, I quit all apps and restarted X, and it reported 140MB used (!). None of this apparently kernel-used memory can be forced to swap either. I wrote a short userspace program that is basically one huge memory leak (it allocates and clears memory, throws away the pointer, and repeats ad infinitum until memory runs out), but none of the memory that disappears ever makes it back into the free pool, and the program starts hitting swap sooner and sooner with each week of uptime. It takes around three months of uptime for the machine to become unusable; over those two months I lost almost 100MB of RAM (which seems a little better than before, though).
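The test program itself isn't posted. As a rough stand-in that needs no compiler, the sketch below leans on a GNU `tail` behaviour (an assumption worth checking on your system): given a stream with no newlines, `tail` has to keep the whole stream buffered to find the last lines, so `head -c N /dev/zero | tail` makes one process allocate and fill about N bytes and release them when it exits. The function name is made up:

```shell
#!/bin/bash
# Hypothetical helper: make one short-lived process hold about $1 bytes of
# zero-filled memory, pushing the kernel to reclaim buffers and cache.
# GNU tail keeps the entire newline-free input buffered while it looks for
# the last lines; that memory is freed when tail exits.
# Prints the number of bytes that passed through (should equal $1).
eat_memory() {
    if [ -z "$1" ]; then
        echo "usage: eat_memory BYTES" >&2
        return 1
    fi
    head -c "$1" /dev/zero | tail | wc -c | tr -d ' '
}
```

On the 512MB box above, something like `eat_memory $((400 * 1024 * 1024))` followed by `free` would show whether the memory comes back; pick a size close to physical RAM, and expect the machine to swap while it runs.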

I'm now testing gentoo-sources-2.6.17-r8 with basically the same .config (recompiled with SATA support), but I have no confidence this will fix the problem.

The rate of consumption on this machine is just hideous. My other machines seem to leak memory too, but it's more in the noise, or at least I don't notice it as much. This one just happens to need all its memory for caching.

----------

## bollucks

Notice the huge "Slab" figure: that's memory used for kernel data structures. Check the output of /proc/slabinfo to see where it all goes; that should point you to the suspect kernel component.
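To put a number on "huge", the Slab line can be compared with MemTotal. A minimal sketch, assuming the 2.6 `/proc/meminfo` field names; the function name is illustrative. With the figures from the first post (Slab 436700 kB of 516392 kB) it reports 84%.

```shell
#!/bin/bash
# Hypothetical helper: print the Slab size and its share of MemTotal.
slab_share() {
    awk '/^MemTotal:/ { total = $2 }
         /^Slab:/     { slab = $2 }
         END {
             if (total == 0) exit 1   # not a meminfo-style file
             printf "Slab: %d kB of %d kB (%d%%)\n", slab, total, int(100 * slab / total)
         }' "${1:-/proc/meminfo}"
}
```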

----------

## eccerr0r

Thanks for the lead. I never knew what 'slab' meant, but I finally looked it up. Very useful information.

I'll post again next month, or earlier if I can trigger the leak using this new information. Unfortunately, I had to reboot to get SATA working.

My largest suspects are currently:

- BTTV, tuner, v4l2
- radeon with agpgart

and maybe

- nfs/automount
- reiserfs
- oss-alsa? Probably not...
- More? Are there any other drivers I use that are 'roads less travelled'?

Apparently ruled out:

- normal caching
- library upgrades
- software mallocs
- software mallocs triggering kernel bugs (tested these with a small "voracious" program)

----------

## bollucks

By the way, if you can trigger this problem on a vanilla kernel, the kernel developers should know about it, since it's a real kernel issue.

----------

## syg00

I was going to suggest keeping an eye on smaps, but that assumes the "leak" is attributable to a process (or to memory shared between processes).

Watching /proc/slabinfo, as suggested above, might be a good idea.
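For the smaps route, the per-mapping `Rss:` lines in `/proc/<pid>/smaps` can be totalled per process and logged over time; a process whose total keeps climbing would be the suspect. A sketch, assuming a kernel new enough to have smaps (it appeared during the 2.6 series) and the usual `Rss: N kB` line format; the function names are made up:

```shell
#!/bin/bash
# Hypothetical helper: total resident memory (kB) over all mappings listed
# in one smaps file, e.g. /proc/1234/smaps.
smaps_rss_kb() {
    awk '/^Rss:/ { total += $2 } END { print total + 0 }' "$1" 2>/dev/null
}

# Print "<kB> <pid>" for every readable process, largest first.
all_smaps_rss() {
    local f pid
    for f in /proc/[0-9]*/smaps; do
        pid=${f#/proc/}
        pid=${pid%/smaps}
        [ -r "$f" ] && echo "$(smaps_rss_kb "$f") $pid"
    done | sort -rn
}
```

As noted above, this only helps if the memory belongs to some process; kernel slab memory never shows up here.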

----------

## eccerr0r

I think I may have found the culprit:

I ran my memory-eating program and couldn't recover around 40MB of memory after one week of uptime.

So I checked /proc/slabinfo, and this entry stood out:

```
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
acpi_operand      1038012 1038036     40   92    1 : tunables  120   60    0 : slabdata  11283  11283      0
```
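The slabdata columns give a direct way to size a cache in whole pages: `num_slabs × pagesperslab × page size`. Assuming 4KB pages (typical on x86), the acpi_operand line works out to 11283 × 1 × 4096 = 46,215,168 bytes, about 44MB, which fits the roughly 40MB that couldn't be recovered. A sketch for a named cache; the function name is made up, and the field positions assume the slabinfo 2.x layout shown in the header:

```shell
#!/bin/bash
# Hypothetical helper: bytes held by one slab cache, counted in whole pages:
# num_slabs (field 15) * pagesperslab (field 6) * page size.
# Usage: slab_cache_bytes NAME [SLABINFO_FILE] [PAGE_SIZE]
slab_cache_bytes() {
    local name=$1 file=${2:-/proc/slabinfo} pagesize=${3:-$(getconf PAGESIZE)}
    awk -v name="$name" -v ps="$pagesize" \
        '$1 == name { printf "%.0f\n", $15 * $6 * ps; found = 1 }
         END { if (!found) exit 1 }' "$file"
}
```

Run it as `slab_cache_bytes acpi_operand` (reading /proc/slabinfo usually needs root). The result is slightly larger than `num_objs × objsize` because it includes per-slab overhead and unused space in each page.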

I tried killing acpid, hoping it would release some handles, but it didn't help.

I hope I don't have to disable ACPI support just to see whether that works around it.

Apparently there were a couple of recent ACPI changes addressing a memory leak; I hope they fix this problem and get integrated into the kernel soon!

--> http://bugzilla.kernel.org/show_bug.cgi?id=6514 <--

----------

## evoweiss

Hi,

I think I am having the same problem: notably, my swap is being used when it shouldn't be. At present swap is in use, but there's still free memory available.

Nothing has stood out so far, but if you have suggestions on what to post, I'd be happy to do so.

Best,

Alex

----------

## eccerr0r

To rule out temporary usage such as buffers and cache, I first made sure to allocate all my RAM with userspace mallocs.

Here is something that may be useful for printing the top slab offenders (it multiplies each cache's object count by its object size):

```
# bytes per slab cache (num_objs * objsize), largest first
grep -Ev '^(#|slabinfo|size-)' /proc/slabinfo | awk '{print $3*$4" "$1;}' | sort -rn | head
```

After eating all my RAM and freeing it, I get:

```
66780960 acpi_operand
1885632 radix_tree_node
1372308 dentry_cache
1132880 reiser_inode_cache
462624 inode_cache
266112 sysfs_dir_cache
249600 filp
241664 pgd
216832 vm_area_struct
190944 buffer_head
```

Yes, the ACPI subsystem is eating 64MB of RAM right now. I tried logging it, and it looks like it's eating another page every *minute* (i.e., I lose approximately 4KB/minute, or about 5.6MB/day). I guess reiserfs is next on the scrutiny list, but it's nowhere near as bad as the ACPI leak.
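Per-minute logging like this could look something like the sketch below: sample one cache's size (num_objs × objsize, the same measure as the one-liner) at a fixed interval and print the change since the previous sample. The function names are made up; it assumes the slabinfo 2.x field order and read access to /proc/slabinfo (usually root).

```shell
#!/bin/bash
# Hypothetical helper: bytes in one cache as num_objs * objsize (fields 3, 4).
slab_obj_bytes() {
    awk -v name="$1" '$1 == name { printf "%.0f\n", $3 * $4 }' "${2:-/proc/slabinfo}"
}

# Log "<unix time> <bytes> <change since previous sample>" for cache $1,
# taking $2 samples $3 seconds apart from slabinfo file $4.
log_slab_growth() {
    local name=$1 count=$2 interval=$3 file=${4:-/proc/slabinfo}
    local i now prev=""
    for ((i = 0; i < count; i++)); do
        now=$(slab_obj_bytes "$name" "$file")
        now=${now:-0}
        # The first sample has no predecessor, so its change is 0.
        echo "$(date +%s) $now $(( now - ${prev:-$now} ))"
        prev=$now
        if (( i + 1 < count )); then sleep "$interval"; fi
    done
}
```

For example, `log_slab_growth acpi_operand 60 60` covers an hour. In this measure each new slab page shows up as a jump of objperslab × objsize bytes (92 × 40 = 3680 for the acpi_operand line posted earlier).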

----------

## evoweiss

Thanks. Mine looks different, with no sign of ACPI among the slab offenders. Does this give anybody insight into what's happening on my system? Given the symptoms, I still think my problem is related somehow.

```
92980704 ext3_inode_cache
21643512 dentry_cache
2291328 buffer_head
2105880 radix_tree_node
334080 filp
329472 inode_cache
278784 vm_area_struct
225280 pgd
162540 shmem_inode_cache
139968 proc_inode_cache
```

Best,

Alex

----------

## eccerr0r

Those don't look terribly wrong; I'd start looking elsewhere for leaks. One thing I'm not sure you did, though, is fill your RAM to purge all excess cache pages from memory. The inode caches tend to grow quite a bit as you use your disk, but they shrink when memory is under pressure (I can get my reiser_inode_cache to 100MB or so just by running 'du' on my hard drives).

----------

## evoweiss

This may sound like a stupid question, but since I've never looked for leaks before, how do I go about it? Any recommendations?


----------

## syg00

I generally take the simple approach: run top in batch mode and write the output to a file, then look for any obvious candidates. The best idea is to have a config in place that sorts on the field you are interested in (set it up interactively and save it with 'W'); that saves post-processing. It's all in the manpage. Something like the following should give you the idea.

```
# 10 snapshots of top's first 25 lines, 30 seconds apart, one file each
for i in {1..10}; do top -b -n 1 | head -n 25 > "out.$(date +%F_%H:%M:%S)"; sleep 30; done
```

I use a script so I can make the count and delay variable.
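A script along those lines, with the count and delay as arguments, might look like this. It is a sketch, not the actual script: the function name and argument order are made up, and the command to snapshot is passed in so the same loop can capture `top -b -n 1` or, say, `cat /proc/slabinfo`.

```shell
#!/bin/bash
# Hypothetical helper: run a command COUNT times, DELAY seconds apart, saving
# the first LINES lines of each run to a timestamped file under DIR.
# Usage: snapshot COUNT DELAY LINES DIR command [args...]
snapshot() {
    local count=$1 delay=$2 lines=$3 dir=$4 i out
    shift 4
    mkdir -p "$dir" || return 1
    for ((i = 1; i <= count; i++)); do
        # Suffix with the run number so runs within the same second
        # don't overwrite each other.
        out="$dir/out.$(date +%F_%H:%M:%S).$i"
        "$@" | head -n "$lines" > "$out"
        if (( i < count )); then sleep "$delay"; fi
    done
}
```

For example, `snapshot 10 30 25 ./top-log top -b -n 1` reproduces the loop above with the output collected in one directory.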

----------

## eccerr0r

More updates

Vanilla 2.6.18.2 seems broken :-/ The kernel modules seem to have been generated incorrectly.

I tried updating my firmware (Abit IS7) to version 24; it still leaks.

Disabling ACPI obviously makes the problem go away.

This is the exploit I found that exacerbates the issue:

```
while true; do cat /proc/acpi/thermal_zone/THRM/temperature ; done
```

This does NOT affect my other ACPI boxes (Dell 600M), only my Abit IS7 motherboard with Award BIOS.
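Because the unbounded loop can eventually take the machine down, a bounded variant that reads the file a fixed number of times and reports how much the acpi_operand cache grew may be safer for checking a new kernel. A sketch, assuming the THRM path from the loop above (thermal zone names differ between boards) and the slabinfo 2.x field order; the function name is made up:

```shell
#!/bin/bash
# Hypothetical helper: read FILE N times, then print how many acpi_operand
# objects (active_objs, field 2) were added in the meantime.
# Usage: acpi_leak_check N [FILE] [SLABINFO]
acpi_leak_check() {
    local n=$1
    local file=${2:-/proc/acpi/thermal_zone/THRM/temperature}
    local slab=${3:-/proc/slabinfo}
    local before after i
    before=$(awk '$1 == "acpi_operand" { print $2 }' "$slab")
    for ((i = 0; i < n; i++)); do
        cat "$file" > /dev/null
    done
    after=$(awk '$1 == "acpi_operand" { print $2 }' "$slab")
    echo $(( ${after:-0} - ${before:-0} ))
}
```

Something like `acpi_leak_check 10000`, run as root, would be expected to print a steadily growing number on an affected kernel and stay near zero once the interpreter fix is in.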

In summary, it looks like this is indeed a duplicate of http://bugzilla.kernel.org/show_bug.cgi?id=6514

Gentoo's default Linux kernel package should include this patch ASAP; I don't think I've gotten a new kernel from portage for a while now. With bad firmware, this can be exploited from userland to DoS the machine. Fortunately my machine is not a 'public' machine, so I don't have to worry about someone exploiting it maliciously.

Closed, except that gentoo-sources needs a patch :)

----------

