# Xen Dom0 crashes regularly since kernel 2.6.18-xen-r12

## rfolkerts

Hi,

In December we updated our Gentoo Xen Dom0 machine; among the updates was the latest Xen, 3.3.0.

After this update we booted the "old" 2.6.21 Xen kernel. It did (and does) boot fine, but after running for a few minutes the system loses its network connection: there is no message in /var/log/messages or dmesg, but neither the hypervisor nor its DomUs can be reached (ping, ssh). We were also unable to compile 2.6.21, as there seems to be a problem with the installed header files.

No problem: as 2.6.21 was "deprecated", we chose 2.6.18 (r12), which compiled and booted perfectly.

Unfortunately, since then this machine has crashed every few days (see the stack trace below).

We tried updating world with a more recent GCC (from 3.4.6 to i686-pc-linux-gnu-4.1.2), following the Gentoo documentation, but that didn't help.

Next we searched the net for the "swiotlb_map_sg" crash point and found several references. The hints there were to add a "swiotlb=n" kernel parameter.

From what we understand, this parameter controls the size of a table (the software IO TLB) that the kernel uses as a DMA bounce buffer.

However, we didn't find a "rule" explaining what value to use under which conditions.

So, we added "swiotlb=512".

(A Novell site suggested setting it to 2; in some forums and mailing lists people reported setting it as high as 4096.)
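The suggested values range from 2 to 4096, which is easier to compare once the number is converted into a buffer size. Below is a minimal sketch under two assumptions that are not stated anywhere in this thread: on native Linux of that era `swiotlb=<n>` counts 2 KiB slabs (rounded up to a multiple of 128 slabs), while the Xen-patched 2.6.18 kernels read `n` as megabytes. Both should be checked against the source of the kernel actually running; `swiotlb_bytes` is a hypothetical helper, not kernel code.

```python
# Hypothetical helper: what buffer size does "swiotlb=<n>" request?
# Assumptions (not from the thread): native Linux counts 2 KiB slabs and
# rounds up to IO_TLB_SEGSIZE; Xen-patched 2.6.18 treats n as megabytes.
IO_TLB_SHIFT = 11      # one slab = 2**11 bytes = 2 KiB
IO_TLB_SEGSIZE = 128   # slab count is rounded up to a multiple of this


def swiotlb_bytes(n: int, xen_semantics: bool) -> int:
    """Return the bounce-buffer size in bytes for a swiotlb=<n> boot value."""
    if xen_semantics:
        nslabs = n << (20 - IO_TLB_SHIFT)  # megabytes -> number of slabs
    else:
        nslabs = n                         # value is already a slab count
    # Round up to a whole number of segments.
    nslabs = -(-nslabs // IO_TLB_SEGSIZE) * IO_TLB_SEGSIZE
    return nslabs << IO_TLB_SHIFT


if __name__ == "__main__":
    for n in (2, 512, 4096):
        native_kib = swiotlb_bytes(n, xen_semantics=False) // 1024
        xen_mib = swiotlb_bytes(n, xen_semantics=True) // (1 << 20)
        print(f"swiotlb={n}: native {native_kib} KiB, Xen {xen_mib} MiB")
```

If those assumptions hold, "swiotlb=512" would mean about 1 MiB on a native kernel but 512 MiB on the Xen kernel, so advice written for one kind of kernel may not carry over to the other.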

However, the problem still occurs.

Now, before we blindly set "swiotlb" to some unrealistic value, does anyone have a hint about what might be going on? The system ran rock-solid with kernel 2.6.21, so I hope to get it reasonably stable again...

The system is a dual-Xeon server with 16 GB RAM and an Intel mainboard, (unfortunately) still running x86 with PAE (we haven't changed that since we started with Xen 3.0 several years ago; it definitely should be moved to x64, but before that step it should simply run stably again). The disks are connected via a 3Ware 9550SX controller. The system hosts 14 DomUs running x86/PAE Linux. It also runs an NFS server, which the DomUs mount to share data.

We didn't change anything else about this setup: this very system, just with an older "world", ran perfectly with kernel 2.6.21.

Any hint would be really great!

Cheers,

_ralf_

```
Oops: 0000 [#1]
SMP
Modules linked in: uhci_hcd ehci_hcd usbcore e1000
CPU:    0
EIP:    0061:[<c0109f45>]    Not tainted VLI
EFLAGS: 00010002   (2.6.18-xen-r12 #5)
EIP is at range_straddles_page_boundary+0x30/0xee
eax: c04e0000   ebx: eb126000   ecx: 000eb126   edx: 000eb126
esi: 00000000   edi: 00002000   ebp: 00000003   esp: ecf25a9c
ds: 007b   es: 007b   ss: 0069
Process nfsd (pid: 3610, ti=ecf24000 task=ec09f550 task.ti=ecf24000)
Stack: 00000030 e27e9ec0 eb126000 00000000 0ff26000 00000003 c022ee00 00001000
       00000000 00002000 00000000 00000002 e27e9ec0 ed744048 00000000 00000000
       00000000 ed744048 00000002 da1c71c0 ed2ee880 c02cc553 00000000 00000000
Call Trace:
 [<c022ee00>] swiotlb_map_sg+0x13c/0x26c
 [<c02cc553>] twa_scsiop_execute_scsi+0x3a5/0x6e1
 [<c02c0a8f>] scsi_done+0x0/0x16
 [<c02cc8f7>] twa_scsi_queue+0x68/0xe3
 [<c02c0ec4>] scsi_dispatch_cmd+0x130/0x210
 [<c02c4e42>] scsi_request_fn+0x183/0x346
 [<c021d15b>] __generic_unplug_device+0x1f/0x25
 [<c021e100>] __make_request+0xee/0x370
 [<c014014b>] mempool_alloc+0x1f/0xcb
 [<c021c555>] generic_make_request+0xea/0x156
 [<c0164d45>] bio_clone+0x28/0x2d
 [<c02e99b3>] __map_bio+0x2e/0x73
 [<c02ea357>] __split_bio+0x284/0x358
 [<c02c0939>] scsi_finish_command+0x3c/0x40
 [<c02ea5f1>] dm_request+0xbc/0xf2
 [<c021c555>] generic_make_request+0xea/0x156
 [<c0298f35>] evtchn_do_upcall+0xc7/0x1e2
 [<c015c33c>] kmem_cache_alloc+0xb4/0xba
 [<c021e57d>] submit_bio+0x6b/0x109
 [<c016400c>] bio_alloc_bioset+0x78/0x134
 [<c0160c11>] submit_bh+0xc0/0x10d
 [<c0162425>] __block_write_full_page+0x1b0/0x328
 [<c019ab2e>] ext3_get_block+0x0/0xcb
 [<c0162859>] block_write_full_page+0xf8/0x100
 [<c019ab2e>] ext3_get_block+0x0/0xcb
 [<c019c37c>] ext3_ordered_writepage+0xe5/0x1ad
 [<c019921d>] bget_one+0x0/0x7
 [<c0147827>] dec_zone_page_state+0x30/0x5f
 [<c017ff7c>] mpage_writepages+0x149/0x3a2
 [<c019c297>] ext3_ordered_writepage+0x0/0x1ad
 [<c0142b30>] do_writepages+0x35/0x37
 [<c013df99>] __filemap_fdatawrite_range+0x66/0x72
 [<c013e1cb>] filemap_fdatawrite+0x23/0x27
 [<c01dfb1a>] nfsd_sync+0x3e/0x96
 [<c01e0284>] nfsd_open+0xe4/0x132
 [<c01e043e>] nfsd_commit+0x93/0xa7
 [<c01e6ce1>] nfsd3_proc_commit+0xde/0xf7
 [<c01dc6b2>] nfsd_dispatch+0x82/0x1b9
 [<c0366e0a>] _spin_lock_bh+0x8/0x18
 [<c0356e31>] svc_process+0x3de/0x6ba
 [<c0366e0a>] _spin_lock_bh+0x8/0x18
 [<c0359732>] svc_recv+0x3d7/0x4ad
 [<c01dcc42>] nfsd+0x19e/0x32c
 [<c01dcaa4>] nfsd+0x0/0x32c
 [<c0102ac5>] kernel_thread_helper+0x5/0xb
Code: ec 08 89 c3 25 ff 0f 00 00 8d 3c 08 81 ff 00 10 00 00 77 0a 31 c0 83 c4 08
 5b 5e 5f 5d c3 89 d9 0f ac d1 0c 89 ca a1 20 96 47 c0 <0f> a3 08 19 c0 85 c0 75
 e0 0f b6 05 22 49 43 c0 88 44 24 07 89
EIP: [<c0109f45>] range_straddles_page_boundary+0x30/0xee SS:ESP 0069:ecf25a9c
```

----------

## trikolon

did you try this kernel too: http://code.google.com/p/gentoo-xen-kernel/downloads/list

----------

## rfolkerts

*trikolon wrote:*

> did you try this kernel too: http://code.google.com/p/gentoo-xen-kernel/downloads/list

Hi,

Wow, no! I was not aware of that project; I had only looked for more up-to-date Xen kernels in Portage (and checked the Kernel-Log on Heise Open Source).

I'll check it out and give it a try! Will post my experience ;)

Thanks!

_ralf_

----------

## rfolkerts

Hi,

just a short update:

I had a look at the gentoo-xen-kernel project on Google Code but was a bit reluctant to give it a try.

However, I remembered that during the update to Xen 3.3 I had removed the "dom0_mem" line from Xen's GRUB config.

So I added that line back, using the "old" value, and the machine has not crashed since: without that parameter it used to crash at least once a week, but it has now kept running for a few weeks.

The parameter was (and now is again) set to: dom0_mem=262144

Without that parameter, Dom0 had about 1.8 GB of RAM available.

I'm just writing this here in case someone else runs into the same problem!

Cheers,

_ralf_

----------

## linuxtuxhellsinki

I used to have some problems with "dynamic" memory in dom0 and an e1000 NIC, but they went away with static memory allocation. With xen-3.* versions you can also write dom0_mem=256M, and it's easy to increase dom0's memory with xm if needed.
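The two spellings should request the same amount of memory. A minimal sketch, assuming Xen reads a bare dom0_mem number as kilobytes and accepts K/M/G suffixes (a simplified subset; `xen_size_bytes` is a hypothetical helper, not Xen's own parser), shows that 262144 and 256M both come out as 256 MiB:

```python
# Hypothetical parser for Xen-style size options such as dom0_mem.
# Assumptions: optional K/M/G suffix (case-insensitive in this sketch) and a
# bare number read as kilobytes. Real Xen accepts more forms than this.
_SHIFT = {"K": 10, "M": 20, "G": 30}


def xen_size_bytes(value: str) -> int:
    """Convert a size string such as '262144' or '256M' to bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in _SHIFT:
        return int(value[:-1]) << _SHIFT[suffix]
    return int(value) << 10  # no suffix: kilobytes


if __name__ == "__main__":
    old = xen_size_bytes("262144")  # value from the original GRUB entry
    new = xen_size_bytes("256M")    # the suggested spelling
    print(old == new, old >> 20, "MiB")
```

The suffixed form is simply easier to read and less error-prone than counting kilobytes by hand.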

----------

## rfolkerts

Hi,

thanks for the reply!

Well, I vaguely remembered the "m" suffix but was too lazy to look it up (and since the machine kept crashing about once a week, and I would not have bet that this "solution" would help at all, I just quickly put the old entry back in). Nevertheless, thanks for pointing that out!

Cheers,

_ralf_

(Much more relaxed now that the hypervisor is running rock-solid again.)

----------

