# Kernel issue 4.8.15 with xen

## MasterPrenium

Hello Guys,

I'm having some trouble on a new system I'm setting up.

dmesg errors:

```
Dec 21 13:49:23 Node_1 kernel: [  413.220915] ------------[ cut here ]------------

Dec 21 13:49:23 Node_1 kernel: [  413.221072] kernel BUG at drivers/md/raid5.c:527!

Dec 21 13:49:23 Node_1 kernel: [  413.221293] invalid opcode: 0000 [#1] SMP

Dec 21 13:49:23 Node_1 kernel: [  413.221381] Modules linked in: x86_pkg_temp_thermal coretemp crc32c_intel aesni_intel aes_x86_64 ablk_helper mei_me mei mpt3sas

Dec 21 13:49:23 Node_1 kernel: [  413.221735] CPU: 2 PID: 5598 Comm: btrfs-transacti Not tainted 4.8.15-gentoo #3

Dec 21 13:49:23 Node_1 kernel: [  413.221901] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016

Dec 21 13:49:23 Node_1 kernel: [  413.222073] task: ffff880267851a00 task.stack: ffff8802520e8000

Dec 21 13:49:23 Node_1 kernel: [  413.222202] RIP: e030:[<ffffffff819c5cac>]  [<ffffffff819c5cac>] raid5_get_active_stripe+0x5cc/0x670

Dec 21 13:49:23 Node_1 kernel: [  413.222421] RSP: e02b:ffff8802520eb8a0  EFLAGS: 00010086

Dec 21 13:49:23 Node_1 kernel: [  413.222546] RAX: ffff8802520eb8e0 RBX: ffff880265d0d800 RCX: ffff880265d0d9c0

Dec 21 13:49:23 Node_1 kernel: [  413.222708] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880263d244e8

Dec 21 13:49:23 Node_1 kernel: [  413.222848] RBP: ffff8802520eb930 R08: ffff88026438f1c0 R09: 0000000000000000

Dec 21 13:49:23 Node_1 kernel: [  413.223010] R10: ffff880265d0d9c8 R11: 0000000000000000 R12: ffff880265d0d800

Dec 21 13:49:23 Node_1 kernel: [  413.223149] R13: ffff880263d244e8 R14: ffff88016d3b8400 R15: ffff880263d244e8

Dec 21 13:49:23 Node_1 kernel: [  413.223324] FS:  0000000000000000(0000) GS:ffff880270c80000(0000) knlGS:ffff880270c80000

Dec 21 13:49:23 Node_1 kernel: [  413.223496] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033

Dec 21 13:49:23 Node_1 kernel: [  413.223623] CR2: 00007fb58b66c318 CR3: 0000000257aeb000 CR4: 0000000000042660

Dec 21 13:49:23 Node_1 kernel: [  413.223786] Stack:

Dec 21 13:49:23 Node_1 kernel: [  413.223834]  ffff88016d39ba80 00000000ffffffff 0000000000000000 ffff880265d0d9c8

Dec 21 13:49:23 Node_1 kernel: [  413.224018]  ffff8802520eb8e0 0000000000000000 000000000ee23800 ffff880265d0d808

Dec 21 13:49:23 Node_1 kernel: [  413.224225]  0000000000000001 ffff8802520eb930 ffff88016d3b8498 0000000000000000

Dec 21 13:49:23 Node_1 kernel: [  413.224431] Call Trace:

Dec 21 13:49:23 Node_1 kernel: [  413.224490]  [<ffffffff819c5ec7>] raid5_make_request+0x177/0xdb0

Dec 21 13:49:23 Node_1 kernel: [  413.224648]  [<ffffffff810b6c70>] ? wait_woken+0x80/0x80

Dec 21 13:49:23 Node_1 kernel: [  413.224749]  [<ffffffff819cf562>] md_make_request+0xe2/0x220

Dec 21 13:49:23 Node_1 kernel: [  413.224878]  [<ffffffff8147457b>] generic_make_request+0xcb/0x1a0

Dec 21 13:49:23 Node_1 kernel: [  413.225034]  [<ffffffff814746b9>] submit_bio+0x69/0x120

Dec 21 13:49:23 Node_1 kernel: [  413.225162]  [<ffffffff813be50e>] btrfs_map_bio+0xfe/0x340

Dec 21 13:49:23 Node_1 kernel: [  413.225266]  [<ffffffff81392d98>] btrfs_submit_bio_hook+0xb8/0x180

Dec 21 13:49:23 Node_1 kernel: [  413.225424]  [<ffffffff813ad796>] submit_one_bio+0x66/0xa0

Dec 21 13:49:23 Node_1 kernel: [  413.225551]  [<ffffffff813ad952>] flush_epd_write_bio+0x42/0x60

Dec 21 13:49:23 Node_1 kernel: [  413.225682]  [<ffffffff813b38a3>] extent_writepages+0x53/0x60

Dec 21 13:49:23 Node_1 kernel: [  413.225811]  [<ffffffff81394ef0>] ? btrfs_set_bit_hook+0x270/0x270

Dec 21 13:49:23 Node_1 kernel: [  413.225944]  [<ffffffff813ae64a>] ? free_extent_state+0x3a/0xa0

Dec 21 13:49:23 Node_1 kernel: [  413.226075]  [<ffffffff81391b03>] btrfs_writepages+0x23/0x30

Dec 21 13:49:23 Node_1 kernel: [  413.226205]  [<ffffffff8115aa39>] do_writepages+0x19/0x30

Dec 21 13:49:23 Node_1 kernel: [  413.226332]  [<ffffffff8114ebac>] __filemap_fdatawrite_range+0x6c/0x90

Dec 21 13:49:23 Node_1 kernel: [  413.226467]  [<ffffffff8114ec6e>] filemap_fdatawrite_range+0xe/0x10

Dec 21 13:49:23 Node_1 kernel: [  413.226627]  [<ffffffff813a789b>] btrfs_fdatawrite_range+0x1b/0x50

Dec 21 13:49:23 Node_1 kernel: [  413.226762]  [<ffffffff813d8ae2>] __btrfs_write_out_cache.isra.26+0x432/0x480

Dec 21 13:49:23 Node_1 kernel: [  413.226928]  [<ffffffff813d91a3>] btrfs_write_out_cache+0x93/0x130

Dec 21 13:49:23 Node_1 kernel: [  413.227062]  [<ffffffff8137e101>] btrfs_start_dirty_block_groups+0x211/0x430

Dec 21 13:49:23 Node_1 kernel: [  413.227203]  [<ffffffff8138fc5a>] btrfs_commit_transaction+0x15a/0xa40

Dec 21 13:49:23 Node_1 kernel: [  413.227362]  [<ffffffff813905d1>] ? start_transaction+0x91/0x4d0

Dec 21 13:49:23 Node_1 kernel: [  413.227496]  [<ffffffff8138a760>] transaction_kthread+0x1f0/0x220

Dec 21 13:49:23 Node_1 kernel: [  413.227630]  [<ffffffff8138a570>] ? btrfs_cleanup_transaction+0x5f0/0x5f0

Dec 21 13:49:23 Node_1 kernel: [  413.227792]  [<ffffffff81098cb4>] kthread+0xc4/0xe0

Dec 21 13:49:23 Node_1 kernel: [  413.232373]  [<ffffffff8102b855>] ? __switch_to+0x355/0x7a0

Dec 21 13:49:23 Node_1 kernel: [  413.236942]  [<ffffffff81c8d0bf>] ret_from_fork+0x1f/0x40

Dec 21 13:49:23 Node_1 kernel: [  413.241540]  [<ffffffff81098bf0>] ? kthread_park+0x50/0x50

Dec 21 13:49:23 Node_1 kernel: [  413.245941] Code: 0f 85 3f fd ff ff 0f 0b f3 90 8b 43 70 a8 01 75 f7 89 45 98 e9 33 fe ff ff f0 ff 83 48 02 00 00 e9 63 fd ff ff 0f 0b 0f 0b 0f 0b <0f> 0b 49 8b 84 24 a0 02 00 00 a8 10 0f 85 d4 fb ff ff f0 41 80 

Dec 21 13:49:23 Node_1 kernel: [  413.255592] RIP  [<ffffffff819c5cac>] raid5_get_active_stripe+0x5cc/0x670

Dec 21 13:49:23 Node_1 kernel: [  413.260122]  RSP <ffff8802520eb8a0>

Dec 21 13:49:23 Node_1 kernel: [  413.264518] ---[ end trace 523662f52765a413 ]---

Dec 21 13:49:23 Node_1 kernel: [  417.769176] BUG: unable to handle kernel NULL pointer dereference at           (null)

Dec 21 13:49:23 Node_1 kernel: [  417.769211] IP: [<ffffffff810b6656>] __wake_up_common+0x26/0x80

Dec 21 13:49:23 Node_1 kernel: [  417.769228] PGD 257277067 PUD 257163067 PMD 0 

Dec 21 13:49:23 Node_1 kernel: [  417.769271] Oops: 0000 [#2] SMP

Dec 21 13:49:23 Node_1 kernel: [  417.769283] Modules linked in: x86_pkg_temp_thermal coretemp crc32c_intel aesni_intel aes_x86_64 ablk_helper mei_me mei mpt3sas

Dec 21 13:49:23 Node_1 kernel: [  417.769314] CPU: 2 PID: 5598 Comm: btrfs-transacti Tainted: G      D         4.8.15-gentoo #3

Dec 21 13:49:23 Node_1 kernel: [  417.769394] Hardware name: Supermicro Super Server/X10SDV-4C-7TP4F, BIOS 1.0b 11/21/2016

Dec 21 13:49:23 Node_1 kernel: [  417.769412] task: ffff880267851a00 task.stack: ffff8802520e8000

Dec 21 13:49:23 Node_1 kernel: [  417.769412] RIP: e030:[<ffffffff810b6656>]  [<ffffffff810b6656>] __wake_up_common+0x26/0x80

Dec 21 13:49:23 Node_1 kernel: [  417.769412] RSP: e02b:ffff8802520ebe48  EFLAGS: 00010086

Dec 21 13:49:23 Node_1 kernel: [  417.769437] RAX: 0000000000000200 RBX: ffff8802520ebf18 RCX: 0000000000000000

Dec 21 13:49:23 Node_1 kernel: [  417.769439] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff8802520ebf18

Dec 21 13:49:23 Node_1 kernel: [  417.769440] RBP: ffff8802520ebe80 R08: 0000000000000000 R09: 0000000000000000

Dec 21 13:49:23 Node_1 kernel: [  417.769445] R10: 0000000000000008 R11: 0000000000000000 R12: ffff8802520ebf20

Dec 21 13:49:23 Node_1 kernel: [  417.769450] R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000003

Dec 21 13:49:23 Node_1 kernel: [  417.769505] FS:  0000000000000000(0000) GS:ffff880270c80000(0000) knlGS:ffff880270c80000

Dec 21 13:49:23 Node_1 kernel: [  417.769510] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033

Dec 21 13:49:23 Node_1 kernel: [  417.769543] CR2: 0000000000000000 CR3: 0000000257aeb000 CR4: 0000000000042660

Dec 21 13:49:23 Node_1 kernel: [  417.769627] Stack:

Dec 21 13:49:23 Node_1 kernel: [  417.769647]  0000000181022fc6 0000000000000000 ffff8802520ebf18 ffff8802520ebf10

Dec 21 13:49:23 Node_1 kernel: [  417.769652]  0000000000000200 0000000000000000 ffff8802520eb7f8 ffff8802520ebe90

Dec 21 13:49:23 Node_1 kernel: [  417.769652]  ffffffff810b66be ffff8802520ebeb8 ffffffff810b7172 ffff8802678520d8

Dec 21 13:49:23 Node_1 kernel: [  417.769652] Call Trace:

Dec 21 13:49:23 Node_1 kernel: [  417.769671]  [<ffffffff810b66be>] __wake_up_locked+0xe/0x10

Dec 21 13:49:23 Node_1 kernel: [  417.769679]  [<ffffffff810b7172>] complete+0x32/0x50

Dec 21 13:49:23 Node_1 kernel: [  417.769727]  [<ffffffff81078dc0>] mm_release+0xc0/0x160

Dec 21 13:49:23 Node_1 kernel: [  417.769757]  [<ffffffff8107ddc9>] do_exit+0x139/0xb80

Dec 21 13:49:23 Node_1 kernel: [  417.769846]  [<ffffffff81c8f227>] rewind_stack_do_exit+0x17/0x20

Dec 21 13:49:23 Node_1 kernel: [  417.769853]  [<ffffffff81098bf0>] ? kthread_park+0x50/0x50

Dec 21 13:49:23 Node_1 kernel: [  417.770077] Code: 00 00 00 00 00 55 48 89 e5 41 57 41 89 f7 41 56 41 89 ce 41 55 41 54 4c 8d 67 08 53 48 83 ec 10 89 55 cc 48 8b 57 08 4c 89 45 d0 <48> 8b 32 48 8d 42 e8 49 39 d4 4c 8d 6e e8 75 05 eb 38 49 89 d5 

Dec 21 13:49:23 Node_1 kernel: [  417.770111] RIP  [<ffffffff810b6656>] __wake_up_common+0x26/0x80

Dec 21 13:49:23 Node_1 kernel: [  417.770198]  RSP <ffff8802520ebe48>

Dec 21 13:49:23 Node_1 kernel: [  417.770202] CR2: 0000000000000000

Dec 21 13:49:23 Node_1 kernel: [  417.770210] ---[ end trace 523662f52765a414 ]---

Dec 21 13:49:23 Node_1 kernel: [  417.789818] Fixing recursive fault but reboot is needed!
```
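For anyone triaging a similar trace, here is a minimal Python sketch that pulls the `kernel BUG at` location and the faulting symbol out of a log like the one above. The function name `summarize_oops` and the regular expressions are my own illustration, not part of any kernel tooling, and they assume the 4.8-era oops layout shown in this post.

```python
import re

# Patterns are assumptions based on the log format in this post.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Matches the trailing summary line "RIP  [<addr>] symbol+0xoff/0xsize",
# not the earlier "RIP: e030:[<addr>] ..." register dump line.
RIP_RE = re.compile(
    r"RIP\s+\[<[0-9a-f]+>\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_oops(log_text):
    """Return (source_file, line_number, symbol) for the first BUG found.

    Each element is None when the corresponding line is missing.
    """
    bug = BUG_RE.search(log_text)
    rip = RIP_RE.search(log_text)
    source_file = bug.group("file") if bug else None
    line_number = int(bug.group("line")) if bug else None
    symbol = rip.group("symbol") if rip else None
    return source_file, line_number, symbol


# Two lines copied from the log above.
SAMPLE = (
    "Dec 21 13:49:23 Node_1 kernel: [  413.221072] "
    "kernel BUG at drivers/md/raid5.c:527!\n"
    "Dec 21 13:49:23 Node_1 kernel: [  413.255592] "
    "RIP  [<ffffffff819c5cac>] raid5_get_active_stripe+0x5cc/0x670\n"
)

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

The file and line number are usually the quickest things to search for in the kernel changelog or on the mailing lists.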

Kernel config (gentoo-sources 4.8.15, same issue with or without the experimental patch): http://pastebin.com/p0EcHjbu

Xen version:

app-emulation/xen-4.6.4-r3

app-emulation/xen-tools-4.6.4-r4

(same issue with xen 4.8)

This happens when I put heavy I/O on the RAID 5 stack.

I have to reset the system to get it working again.

Here is the configuration:

- 3 hard drives in a software RAID 5 array created with mdadm

- On top of it, DRBD for replication to another node (active/passive cluster)

- On top of it, a Btrfs filesystem with a few subvolumes

- On top of it, Xen VMs running.

Is this a kernel bug? Any idea how to fix it?

Best regards

[Moderator edit: changed [quote] tags to [code] tags to preserve output layout. -Hu]

----------

## bandreabis

UP?

----------

## MasterPrenium

It was a race condition.

Fixed in 4.9.15 ;)
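If you want to check whether a box is still on an affected kernel, a rough Python sketch follows. The `FIXED_IN` value is taken only from the statement above and has not been checked against the changelogs; the helper names `parse_kernel_version` and `has_fix` are hypothetical. A plain version comparison also says nothing about other stable branches or distro backports, so treat the result as a hint.

```python
import platform
import re

# Taken from the post above; an assumption, not verified against changelogs.
FIXED_IN = (4, 9, 15)


def parse_kernel_version(release):
    """Turn a release string such as '4.8.15-gentoo' into (4, 8, 15)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '4.9') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def has_fix(release, fixed_in=FIXED_IN):
    """Return True when the release is at or above the reported fix version."""
    return parse_kernel_version(release) >= fixed_in


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, "has fix:", has_fix(running))
    except ValueError as err:
        print(err)
```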

----------

