# comm: kswapd0 Tainted: GF / Unable to handle kernel paging

## domdorn

Hello!

For a week or two I have had serious problems with a Gentoo server.

This is what /var/log/kernel/current says:

```

May 17 01:49:11 [kernel] Unable to handle kernel paging request at ffffffff76ffffff RIP: 

May 17 01:49:11 [kernel]  [<ffffffff802c86b9>] ext3_clear_inode+0x29/0xb0

May 17 01:49:11 [kernel] PGD 203027 PUD 0 

May 17 01:49:11 [kernel] CPU 1 

May 17 01:49:11 [kernel] Modules linked in: iptable_mangle iptable_nat ip_nat ip_conntrack ipt_LOG xt_tcpudp iptable_filter ip_tables x_tables w83627hf hwmon_vid eeprom i2c_isa i2c_nforce2 sata_vsc sata_promise sbp2 ohci1394 ieee1394 ohci_hcd uhci_hcd usb_storage usbhid ehci_hcd

May 17 01:49:11 [kernel] Pid: 218, comm: kswapd0 Tainted: GF     2.6.18-gentoo-r6 #1

May 17 01:49:11 [kernel] RIP: 0010:[<ffffffff802c86b9>]  [<ffffffff802c86b9>] ext3_clear_inode+0x29/0xb0

May 17 01:49:11 [kernel] RSP: 0018:ffff81012fb71d20  EFLAGS: 00010286

May 17 01:49:11 [kernel] RAX: ffffffff802c8690 RBX: ffff81005172fd50 RCX: 0000000000000000

May 17 01:49:11 [kernel] RDX: ffff81012fb71d90 RSI: 00000000000000d0 RDI: ffffffff76ffffff

May 17 01:49:11 [kernel] RBP: 0000000000000000 R08: 0000000000019277 R09: 00000000000fb8ab

May 17 01:49:11 [kernel] R10: 28f5c28f5c28f5c3 R11: 0000000000000000 R12: ffff81012fb71d90

May 17 01:49:11 [kernel] R13: 0000000000000000 R14: ffff81012fb71d90 R15: 0000000000000012

May 17 01:49:11 [kernel] FS:  0000000044458940(0000) GS:ffff81012fc3a6c0(0000) knlGS:0000000000000000

May 17 01:49:11 [kernel] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b

May 17 01:49:11 [kernel] CR2: ffffffff76ffffff CR3: 000000012e71f000 CR4: 00000000000006e0

May 17 01:49:11 [kernel] Process kswapd0 (pid: 218, threadinfo ffff81012fb70000, task ffff81012f80cfc0)

May 17 01:49:11 [kernel] Stack:  ffff81005172fd50 ffff81005172fd50 ffff81005172fd50 ffffffff8029125e

May 17 01:49:11 [kernel]  ffff81005172fd60 ffffffff8029201e ffff810004a64870 ffff8100295f8160

May 17 01:49:11 [kernel]  ffff8100295f8150 0000000000000080 0000000000000080 ffffffff802922d4

May 17 01:49:11 [kernel] Call Trace:

May 17 01:49:11 [kernel]  [<ffffffff8029125e>] clear_inode+0x8e/0xd0

May 17 01:49:11 [kernel]  [<ffffffff8029201e>] dispose_list+0x5e/0x100

May 17 01:49:11 [kernel]  [<ffffffff802922d4>] shrink_icache_memory+0x214/0x270

May 17 01:49:11 [kernel]  [<ffffffff8025dbbb>] shrink_slab+0x10b/0x190

May 17 01:49:11 [kernel]  [<ffffffff8025f1d2>] kswapd+0x372/0x480

May 17 01:49:11 [kernel]  [<ffffffff80245870>] autoremove_wake_function+0x0/0x30

May 17 01:49:11 [kernel]  [<ffffffff80245280>] keventd_create_kthread+0x0/0x80

May 17 01:49:11 [kernel]  [<ffffffff8025ee60>] kswapd+0x0/0x480

May 17 01:49:11 [kernel]  [<ffffffff80245280>] keventd_create_kthread+0x0/0x80

May 17 01:49:11 [kernel]  [<ffffffff80245239>] kthread+0xd9/0x120

May 17 01:49:11 [kernel]  [<ffffffff8020aaa0>] child_rip+0xa/0x12

May 17 01:49:11 [kernel]  [<ffffffff80245280>] keventd_create_kthread+0x0/0x80

May 17 01:49:11 [kernel]  [<ffffffff80245160>] kthread+0x0/0x120

May 17 01:49:11 [kernel]  [<ffffffff8020aa96>] child_rip+0x0/0x12

May 17 01:49:11 [kernel] Code: f0 ff 0f 0f 94 c0 84 c0 74 05 e8 88 ae fa ff 48 c7 43 b8 ff 

May 17 01:49:11 [kernel] RIP  [<ffffffff802c86b9>] ext3_clear_inode+0x29/0xb0

May 17 01:49:11 [kernel]  RSP <ffff81012fb71d20>

```

The load average is about 350, and ps -ef hangs after displaying about 1000 processes or so...

I know what triggers this, but IMO the kernel should not fail because of it.

At 01:49 one of my backup scripts starts a mysqldump of a database of about 2.3 GB, writing the file to a software RAID 5. Something strange happens there: when I change into the directory where the dump is placed and do an ls, the process hangs and is unkillable, like ps -ef and every other process on the system.
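Processes that hang and ignore kill are typically in uninterruptible sleep (state D). The sketch below lists them by reading only /proc/PID/status, on the assumption (not confirmed in this thread) that ps -ef blocks while reading a stuck process's command line and that the status file stays readable. The function name `list_blocked` and its optional proc-root argument are made up for illustration.

```shell
#!/bin/sh
# list_blocked [PROCROOT]
# Prints "PID NAME" for every process whose State field is D
# (uninterruptible sleep). PROCROOT defaults to /proc; the argument
# exists only so the function can be tried against a copy.
list_blocked() {
    procroot=${1:-/proc}
    for status in "$procroot"/[0-9]*/status; do
        # Skip entries that vanished or cannot be read.
        [ -r "$status" ] || continue
        pid=${status%/status}
        pid=${pid##*/}
        # Only the first word of Name: is kept, which is enough here.
        awk -v pid="$pid" '
            /^Name:/  { name = $2 }
            /^State:/ { state = $2 }
            END       { if (state == "D") print pid, name }
        ' "$status" 2>/dev/null
    done
}
```

Calling `list_blocked` with no argument scans the live /proc. A long list of php and ls entries would suggest they are all waiting on the same stuck kernel resource rather than on memory.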

The behaviour of the system is the same as described in this thread

https://forums.gentoo.org/viewtopic-t-500540-highlight-unable+handle+kernel+paging+request.html

Because I feared that the system would hang during a normal shutdown, and the server is in a remote location, I did a

```

sync

reboot -f

```

Which solved the problem temporarily... but I'm sure it will occur again.

If someone has a clue or needs more information, please post!

Thx!

Dominik

----------

## erik258

It almost sounds like you're out of system resources, in particular memory.  Generally, unless you've set it up differently on your software, out-of-memory results in death to the program trying to allocate more, which on a busy server is likely to be a thread for apache or named or something, so it might just drop a connection instead of breaking a service.  

Anyway, a load of 350 seems quite high; with 1000+ processes running, the server is probably so bogged down trying to switch contexts while conserving resources that it's completely unresponsive.  Is this expected by you?

----------

## domdorn

 *erik258 wrote:*   

> It almost sounds like you're out of system resources, in particular memory.  Generally, unless you've set it up differently on your software, out-of-memory results in death to the program trying to allocate more, which on a busy server is likely to be a thread for apache or named or something, so it might just drop a connection instead of breaking a service.  
> 
> Anyway, a load of 350 seems quite high; with 1000+ processes running, the server is probably so bogged down trying to switch contexts while conserving resources that it's completely unresponsive.  Is this expected by you?

 

Well, that's the problem. I have a cron job doing some regular tasks for the page the server is serving (creating cached output for some page elements).

At the moment the mysqldump starts, it seems the server creates some kind of lock... Normally this does not happen; the mysqldump works as expected and does not lock anything.

The cron job then tries to run the cache-create processes, and because of the lock they wait for it to be released, which never happens.

The weird thing is that the posted output of /var/log/kernel/current is exactly 11 seconds after the backup process started and at this moment, there should be no problem with out of memory or something like this and if there was one, I would be glad if the processes would just die, but they don't! I can't even kill them!

I will now look into setting up /etc/limits, maybe this helps...

if there's anything I can provide, please let me know

Dominik

(Sorry for the bad english, it's quite a while ago since I last had to _write_ something in english)

----------

## erik258

 *Quote:*   

> (Sorry for the bad english, it's quite a while ago since I last had to _write_ something in english)

 

Not at all!  Sorry my country's crappy educational system didn't teach me German like it should have when I was 6 ! ;)  Anyway, you're doing fine.  

I am a little confused though.  When you say

 *Quote:*   

> The weird thing is that the posted output of /var/log/kernel/current is exactly 11 seconds after the backup process started and at this moment, there should be no problem with out of memory or something like this and if there was one, I would be glad if the processes would just die, but they don't! I can't even kill them! 

 

do you mean after you rebooted?  How long does it take to run the cron job and how often is it running?

----------

## domdorn

 *erik258 wrote:*   

>  *Quote:*   (Sorry for the bad english, it's quite a while ago since I last had to _write_ something in english) 
> 
> Not at all!  Sorry my country's crappy educational system didn't teach me German like it should have when I was 6 !   Anyway, you're doing fine.  
> 
> I am a little confused though.  When you say
> ...

 

Well, no... before the reboot. OK, it was like this: tonight I came home at around 4 o'clock and thought about looking at the webpage. The webpage was not responsive.

Luckily I had an ssh session open; I looked at it and it still worked. Then I opened a second ssh session to see if that still worked, and it did.

I then ran uptime and saw that the load was about 320 or so. Then I ran ps -ef and got about 1000 lines, the last 500 being one of my PHP scripts that generate the cached content. Either there were more processes or something else happened, but ps -ef did not give control back to bash. I tried to kill it from the other terminal I had open; kill did not give an error message, but the process also didn't die. Fearing I would lose complete control of the machine, I created another ssh session, su'ed to root and looked at the log files, where the message posted above was shown. I then tried to kill the PHP processes by doing

killall php 

As this didn't work, I then tried

killall -u lighttpd

which also did not change anything. Then I searched the forum for the bug I posted and saw that someone wrote that the system even hangs during a normal shutdown, which I can confirm, because I tried this two weeks ago and then had to ask the operator of the server room to do a hard reset. Then I tried "top", which showed the processes without hanging like ps -ef did. Somewhere I even saw this (currently in my /var/log/critical directory):

```

current                  log-2007-04-28-04:04:01  log-2007-05-16-23:05:11

log-2007-04-20-09:07:56  log-2007-05-11-23:05:40  log-2007-05-17-00:05:02

newW0rld critical # cat log-2007-05-17-00\:05\:02 

May 17 01:49:11 [kernel] Unable to handle kernel paging request at ffffffff76ffffff RIP: 

May 17 01:49:11 [kernel]  [<ffffffff802c86b9>] ext3_clear_inode+0x29/0xb0

May 17 01:49:11 [kernel] RIP  [<ffffffff802c86b9>] ext3_clear_inode+0x29/0xb0

```

I then created another ssh session, changed to the directory where the backup should be placed, did an ls and bang, the "ls" process locked up like the ps -ef did. I tried to kill it from the other terminal, but with no luck. I then just thought "OK, the webpage is not accessible either way, try a reboot", but luckily I had read that the normal shutdown method did not work on the other user's computer because the normal shutdown process waits for the processes to die. So I just did a

sync

to sync the disks and then did a

reboot -f

forcing a reboot without waiting for processes and disks to die/sync/whatever.

Luckily this worked, but I'm not sure when this will happen again...

```

newW0rld critical # crontab -l -u root

# DO NOT EDIT THIS FILE - edit the master and reinstall.

# (/tmp/crontab.XXXXFDukgt installed on Sat Apr 28 14:52:30 2007)

# (Cron version V5.0 -- $Id: crontab.c,v 1.12 2004/01/23 18:56:42 vixie Exp $)

1 1 * * * ( cd /data/httplog/lighttpd/; mv access.log `date +\%Y\%m\%d\%H\%M`_access.log; killall lighttpd; /etc/init.d/lighttpd restart;)

15 1 * * * ( webalizer -c /data/httpanalyze/conf/newW0rld.conf /data/httplog/lighttpd/`date +\%Y\%m\%d`0101_access.log )

#15 1 * * * ( /usr/local/awffull/bin/awffull -c /data/httpanalyze/conf/newW0rld.conf /data/httplog/lighttpd/`date +\%Y\%m\%d`0101_access.log )

1 2 * * * ( cd /data/httplog/lighttpd/; nice bzip2 `date +\%Y\%m\%d`0101_access.log; )

49 1 * * * ( /data/httpbackup/lyrixbackup.sh ) 

^^^^^^^ notice the time 01:49

1 6 * * * ( emerge --sync --quiet; emerge -f world )

```

```

newW0rld critical # cat /data/httpbackup/lyrixbackup.sh 

#! /bin/bash

echo "Set Date"

DATE=`date +%Y-%m-%d`;

YEAR=`date +%Y`;

MONTH=`date +%m`;

DAY=`date +%d`;

BACKUPBASEDIR='/data/httpbackup'

BACKUPPREFIX="lyrix_backup_"

OWNER='lighttpd'

MYSQLHOST='localhost'

MYSQLUSER='lyrix_backup'

MYSQLPASS='somepassword'

MYSQLSOCK='/var/run/mysqld/mysqld.sock'

MYSQLDB="lyriks"

MYSQLDUMP='mysqldump'

TAR='tar'

HTDOCS='/var/www/www.lyrix.at/htdocs'

LIGHTTPDCONF='/etc/lighttpd/lighttpd.conf'

PHPINI='/usr/local/php5-cgi/lib/php.ini'

MYCNF='/etc/mysql/my.cnf'

NICE='nice -n 15'

echo "Date ist $DATE"

echo "Erstelle Backupverzeichnis von $DATE"

mkdir ${BACKUPBASEDIR}/${DATE}

echo "In das Backupverzeichnis wechseln"

cd ${BACKUPBASEDIR}/${DATE}

echo "Erstelle mysqldump von $MYSQLDB " #{{{

#--flush-logs \

$NICE ${MYSQLDUMP} \

--add-drop-table \

--add-locks \

--allow_keywords \

--max_allowed_packet=16M \

--allow-keywords \

--disable-keys \

--hex-blob \

--quote-names \

--complete-insert \

--create-options \

--set-charset=utf8 \

--disable-keys \

--databases ${MYSQLDB} \

--user=${MYSQLUSER} --password=${MYSQLPASS} --socket=${MYSQLSOCK} \

--result-file=${BACKUPPREFIX}DBDUMP_${MYSQLDB}_${DATE}.sql

#}}}

echo "Changedir to ${HTDOCS}"

PWD=`pwd`

cd ${HTDOCS}

# doing other things... 

```

```

newW0rld critical # crontab -l -u lighttpd

# DO NOT EDIT THIS FILE - edit the master and reinstall.

# (/tmp/crontab.XXXXxXa48p installed on Tue May 15 20:45:39 2007)

# (Cron version V5.0 -- $Id: crontab.c,v 1.12 2004/01/23 18:56:42 vixie Exp $)

#MAILTO=""

SHELL=/bin/bash

MAILTO=my_emailn@gmail.com

##### HEADER STATS

3,5,7,9,11,13,15,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,0 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php header_stats.php )

0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php stats_stayonlineusers.php ) 

32,34,36,38,40,42,44,46,48,50,52,54,56,58 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php stats_stayonlineusers.php ) 

MAILTO=""

### RRD-Stats lighttpd

0-59/6 * * * * (sh /var/www/rrdtool_generate.sh )

###### TEXTE

5,15,30,45,59 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php text_autors.php )  

5,15,30,45,59 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php text_artists.php )  

6 1 * * * ( cd /var/www/mydomain/htdocs/tools/; /usr/local/php5-cgi/bin/php -q generate_songtextlist_real.php | grep lyrix_texte > generate_songtextlist.php )

40 1 * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php text_show.php )  

33 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php text_createUserCharts.php )  

#3,17,33,48 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php text_autor_list.php )  

####### SPRACHEN

18,31 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php exportLang_de.php )  

18,31 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php exportLang_en.php )

###### LISTS

30,59 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php list_topuser.php )

3,5,7,9,11,13,15,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,0 * * * * ( cd /var/www/mydomain/htdocs/ly/mkcache/; /usr/local/php5-cgi/bin/php export_lyttexttype.php )

```

So you can see, there are a lot of entries creating cached content, and doing that every few minutes. Normally those processes last 10-15 seconds at most, but with the lock on the database or filesystem or whatever happens there, they just don't die, which would also explain why there are so many processes in ps -ef.

Ah, I nearly forgot to mention: I also did a free -m and saw normal stats, just about 40 MB of swap space used. The output was nearly identical to this one, which I just created:

```

newW0rld critical # free -m

             total       used       free     shared    buffers     cached

Mem:          3954       3901         52          0        163       3240

-/+ buffers/cache:        497       3457

Swap:         1960          2       1958

```

I think it has something to do with the RAID or with ext3, but I'm not sure... I would even suspect the RAM, but unfortunately the next time I can visit the server room is at the beginning of July.

thanks for your help!

dominik

----------

## erik258

I only wish I could help more!  In the meantime though, I wonder whether your script should really run more than once at a time if it's doing the same thing every time.  I wonder whether there could be some strange race condition, or other hard-to-spot bug, that could result in strange behavior like this at the kernel/fs level.
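One way to keep a cron job from piling up on itself is a small lock wrapper. This is only a sketch under assumptions: the function name `run_exclusive`, the exit code 75 and any lock path are made up for illustration, and it relies on nothing more than `mkdir` being atomic. Where `flock` from util-linux is installed, `flock -n LOCKFILE COMMAND` does a similar job.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR does not exist yet; otherwise skips the
# run and returns 75. The names and the exit code are illustrative.
run_exclusive() {
    lockdir=$1
    shift
    # mkdir is atomic: of several simultaneous callers, exactly one succeeds.
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        # Release the lock once the command has finished.
        rmdir "$lockdir"
        return "$status"
    fi
    echo "skipped: $lockdir is held by another run" >&2
    return 75
}
```

A cron entry could then call something like `run_exclusive /tmp/header_stats.lock /usr/local/php5-cgi/bin/php header_stats.php` (hypothetical lock path). If a run hangs, the lock stays in place and later runs are skipped instead of stacking up by the hundreds. The lock also stays behind if the job is killed, so it then has to be removed by hand.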

----------

