# Kernel OOPS on rm

## vm666

I deleted a file on a NAS and lost control of the machine: I could no longer connect through SSH and my current SSH session froze, but the machine still answered pings and the hardware watchdog did not fire.

I had an SSH session open on another machine and was able to get a dmesg. After that, neither reboot nor poweroff worked, so I had to switch the machine off and on.

```
[258937.535809] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[258937.535862] IP: [<ffffffff8116318f>] ext4_ext_remove_space+0x750/0xa0a
[258937.535903] PGD 0
[258937.535919] Oops: 0000 [#1] SMP
[258937.535942] CPU 3
[258937.535954] Modules linked in: nfs auth_rpcgss nfsd lockd nfs_acl sunrpc w83627hf hwmon_vid ipv6 dm_crypt dm_mod loop crc32c uhci_hcd ehci_hcd usbcore coretemp r8169 i2c_i801 iTCO_wdt mii usb_common
[258937.536097]
[258937.536116] Pid: 27061, comm: rm Not tainted 3.4.9-gentoo #3 Tranquil PC T series/D510MO
[258937.536172] RIP: 0010:[<ffffffff8116318f>]  [<ffffffff8116318f>] ext4_ext_remove_space+0x750/0xa0a
[258937.536229] RSP: 0018:ffff88007301dcb8  EFLAGS: 00010246
[258937.536262] RAX: 0000000000000000 RBX: ffff8801298f70f0 RCX: 0000000000000001
[258937.536307] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 000000007459c3c0
[258937.536360] RBP: ffff88007301dd98 R08: 000000007459c3c0 R09: ffffffff811613f9
[258937.536475] R10: ffffffff8104214e R11: ffff88004ec75ab8 R12: ffff880103529000
[258937.536673] R13: ffff8801298f70c0 R14: ffff88004ec75ab8 R15: 0000000000000000
[258937.536849] FS:  00007f612115d700(0000) GS:ffff88012fd80000(0000) knlGS:0000000000000000
[258937.537025] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[258937.537121] CR2: 0000000000000028 CR3: 0000000100540000 CR4: 00000000000007e0
[258937.537295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[258937.537468] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[258937.537643] Process rm (pid: 27061, threadinfo ffff88007301c000, task ffff88012924dac0)
[258937.537817] Stack:
[258937.537900]  ffff88007301dd18 ffffffff811500e7 ffff88007301dd48 ffff880000000000
[258937.538086]  ffff880109e47138 0000000047e79200 ffff8801fffffff5 00000001ffffffff
[258937.538270]  0000004000c01000 0000000000000040 000000004ec75a88 ffff880129211000
[258937.538454] Call Trace:
[258937.538545]  [<ffffffff811500e7>] ? ext4_mark_iloc_dirty+0x4a9/0x507
[258937.538648]  [<ffffffff81164afa>] ext4_ext_truncate+0xe8/0x169
[258937.538748]  [<ffffffff8114e4cf>] ext4_truncate+0x64/0x73
[258937.538846]  [<ffffffff811519f2>] ext4_evict_inode+0x1bc/0x2a0
[258937.538947]  [<ffffffff810e873f>] evict+0xa4/0x15e
[258937.539043]  [<ffffffff810e89af>] iput+0x1b6/0x1be
[258937.539140]  [<ffffffff810e0292>] do_unlinkat+0x11d/0x175
[258937.539240]  [<ffffffff810d848c>] ? sys_newfstatat+0x2a/0x36
[258937.539339]  [<ffffffff810e0cb5>] sys_unlinkat+0x24/0x26
[258937.539438]  [<ffffffff8136bc62>] system_call_fastpath+0x16/0x1b
[258937.539534] Code: e1 ff ff 48 63 5d bc 48 6b db 30 48 03 5d b0 e9 ff 00 00 00 48 63 55 bc 48 6b da 30 48 03 5d b0 48 83 7b 20 00 75 0c 48 8b 43 28 <48> 8b 40 28 48 89 43 20 48 8b 43 18 48 85 c0 75 20 48 8b 43 20
[258937.539985] RIP  [<ffffffff8116318f>] ext4_ext_remove_space+0x750/0xa0a
[258937.540087]  RSP <ffff88007301dcb8>
[258937.540174] CR2: 0000000000000028
[258937.540532] ---[ end trace 8c27ef8f255b7fec ]---

chaudron ~ # uname -a
Linux chaudron 3.4.9-gentoo #3 SMP Sat Sep 1 13:21:06 CEST 2012 x86_64 Intel(R) Atom(TM) CPU D510 @ 1.66GHz GenuineIntel GNU/Linux
chaudron ~ #
```

I set kernel.panic_on_oops=1 in sysctl.conf and it panicked when I tried to erase another file in the same group. The machine froze and did not reboot automatically, even though I have kernel.panic=3.
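
For reference, the two settings mentioned above would look roughly like this in /etc/sysctl.conf. This is only a sketch of the configuration as described in this post, not a copy of the actual file; the comments are added for illustration.

```
# /etc/sysctl.conf (fragment)
# Turn any oops into a full kernel panic.
kernel.panic_on_oops = 1
# Reboot automatically 3 seconds after a panic (0 would mean never reboot).
kernel.panic = 3
```

Running sysctl -p as root reloads the file, and sysctl kernel.panic_on_oops kernel.panic prints the values currently in effect, which is a quick way to confirm the settings were actually applied before the crash.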

Did anybody see something like that?

----------

## eccerr0r

Could you try fscking the disk?  Just want to make sure there's no bad data on the disk that would cause the ext4fs module to puke like that.

There's a possibility the oops->panic and panic->reboot options changed for 3.4.9, but I'm not sure... does "kernel.oops=panic" work?

----------

## Hu

That is not an Oops.  That is just a BUG.  I suggest you upgrade to a newer kernel.  There was a known regression relating to ext4 that is fixed in 3.4.10.
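
A quick way to see on which side of 3.4.10 a running kernel falls is a version comparison like the sketch below. It assumes GNU sort (for the -V version sort), and version_ge is a hypothetical helper name, not an existing tool. It only compares version numbers: it says nothing about whether a particular 3.5.x or distribution-patched kernel actually carries the ext4 fix.

```shell
# version_ge A B: succeed when version A >= version B.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the local suffix ("3.4.9-gentoo" -> "3.4.9") before comparing.
running="$(uname -r)"
running="${running%%-*}"

if version_ge "$running" "3.4.10"; then
    echo "kernel $running: at or past 3.4.10"
else
    echo "kernel $running: older than 3.4.10"
fi
```

On the 3.4.9-gentoo kernel from the first post this takes the "older than 3.4.10" branch.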

----------

## PaulBredbury

Yeah, see changelog:

 *Quote:*   

> ext4: fix kernel BUG on large-scale rm -rf commands

 

----------

## vm666

 *Quote:*   

> ext4: fix kernel BUG on large-scale rm -rf commands

 

This is probably worse than that: I got the bug by just removing one big file (a 4.5 GB DVD image).

And when I rebooted, for whatever reason, the kernel was unable to mount / (which is on an auto-detected RAID5 array, but I guess this is irrelevant).

I got a message from ext3 saying that the root device could not be mounted because of an unsupported feature (so far this is normal), and then it froze instead of mounting it with EXT4.

I recompiled a kernel with just EXT4 and it froze without any message when it tried to mount /.

I unmasked 3.5.3-gentoo and it works again.

----------

