# BUG: Bad page map in process xz

## Simdol

Hello,

I've been seeing this message in my journalctl every single second (literally), and I wanted to know what causes it and what I should do to resolve it. Would it be fine to ignore this message? Or is it something I should be concerned about? Apart from this message appearing in my journalctl, I don't see any apparent issue with my installation. Before anything else, I should note that I am currently using pf-sources as my kernel sources, and the issue persists with the latest stable gentoo-sources too. The same kernel runs fine on my laptop without any notable issues, but my desktop seems not to like Linux in general.

```
Mar 28 11:34:54  kernel: BUG: Bad page map in process xz  pte:26db71025 pmd:113434067
Mar 28 11:34:55  kernel: page:ffffea0009b6dc40 count:62 mapcount:-197 mapping:ffff8801ccc36ad0 index:0x15e
Mar 28 11:34:55  kernel: flags: 0x20000000000086c(referenced|uptodate|lru|active|private)
Mar 28 11:34:55  kernel: page dumped because: bad pte
Mar 28 11:34:55  kernel: addr:00007fdca96de000 vm_flags:00000075 anon_vma:          (null) mapping:ffff8801ccc36ad0 index:15e
Mar 28 11:34:55  kernel: file:libc-2.21.so fault:filemap_fault mmap:btrfs_file_mmap readpage:btrfs_readpage
Mar 28 11:34:55  kernel: CPU: 0 PID: 19769 Comm: xz Tainted: P    B      O    4.4.0-pf6 #15
Mar 28 11:34:55  kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X99-Gaming 5, BIOS F20 01/12/2016
Mar 28 11:34:55  kernel:  0000000000000000 ffff88011b0cfc70 ffffffff81689b78 00007fdca96de000
Mar 28 11:34:55  kernel:  ffff880254b132e0 ffff88011b0cfcc0 ffffffff81215fad 00000001e09f8025
Mar 28 11:34:55  kernel:  00003ffffffff000 00000001236f7067 00007fdca9711000 00007fdca96de000
Mar 28 11:34:55  kernel: Call Trace:
Mar 28 11:34:55  kernel:  [<ffffffff81689b78>] dump_stack+0x4d/0x65
Mar 28 11:34:55  kernel:  [<ffffffff81215fad>] print_bad_pte+0x1bd/0x280
Mar 28 11:34:55  kernel:  [<ffffffff81217ca5>] unmap_single_vma+0x735/0x790
Mar 28 11:34:55  kernel:  [<ffffffff81218525>] unmap_vmas+0x45/0xa0
Mar 28 11:34:55  kernel:  [<ffffffff81220825>] exit_mmap+0xc5/0x180
Mar 28 11:34:55  kernel:  [<ffffffff81123288>] mmput+0x38/0xd0
Mar 28 11:34:55  kernel:  [<ffffffff81127761>] do_exit+0x2f1/0xb60
Mar 28 11:34:55  kernel:  [<ffffffff8113f776>] ? task_work_run+0x76/0x90
Mar 28 11:34:55  kernel:  [<ffffffff81128dc0>] do_group_exit+0x40/0xa0
Mar 28 11:34:55  kernel:  [<ffffffff81128e2f>] SyS_exit_group+0xf/0x10
Mar 28 11:34:55  kernel:  [<ffffffff81c84857>] entry_SYSCALL_64_fastpath+0x12/0x6a
systemd[1]: Looping too fast. Throttling execution a little.
systemd[1]: Looping too fast. Throttling execution a little.
systemd[1]: Looping too fast. Throttling execution a little.
```
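The raw fields of this report can be cross-checked by hand. The sketch below decodes the `pte:` value, the `flags:` word and the `page:` pointer from the log above. It rests on assumptions that are not stated in this thread: the x86-64 4 KiB PTE bit layout, the page-flag bit order of `include/linux/page-flags.h` around kernel 4.4, and a non-randomized vmemmap at `0xffffea0000000000` with a 64-byte `struct page`. Under those assumptions the PTE's frame number and the `page:` address agree (both 0x26db71), and the low flag bits reproduce the `(referenced|uptodate|lru|active|private)` annotation, so the mapping itself points at the expected libc page; the visible anomaly is the page's bookkeeping, namely the negative `mapcount:-197`.

```python
# Sketch: decode fields from the "Bad page map" report.
# Assumptions (not from the thread): x86-64 4 KiB PTE layout, the 4.4-era
# order of page flags in include/linux/page-flags.h, and a non-randomized
# vmemmap at 0xffffea0000000000 with sizeof(struct page) == 64.

PTE_BITS = {0: "present", 1: "rw", 2: "user", 3: "pwt", 4: "pcd",
            5: "accessed", 6: "dirty", 7: "pat", 8: "global", 63: "nx"}

# Lowest 12 page-flag bits; the high bits of the word encode zone/node.
PAGE_FLAGS = ["locked", "error", "referenced", "uptodate", "dirty", "lru",
              "active", "slab", "owner_priv_1", "arch_1", "reserved", "private"]

PFN_MASK = 0x000F_FFFF_FFFF_F000      # physical address bits 12..51
VMEMMAP_BASE = 0xFFFF_EA00_0000_0000  # assumed vmemmap start
STRUCT_PAGE_SIZE = 64                 # assumed sizeof(struct page)


def decode_pte(pte):
    """Return (page frame number, set of flag names) for a 4 KiB PTE."""
    pfn = (pte & PFN_MASK) >> 12
    names = {name for bit, name in PTE_BITS.items() if pte & (1 << bit)}
    return pfn, names


def decode_page_flags(flags):
    """Return the names of the set low page-flag bits, in bit order."""
    return [name for bit, name in enumerate(PAGE_FLAGS) if flags & (1 << bit)]


def pfn_from_struct_page(addr):
    """Recover the PFN from a struct page pointer into the vmemmap array."""
    return (addr - VMEMMAP_BASE) // STRUCT_PAGE_SIZE


if __name__ == "__main__":
    pfn, pte_flags = decode_pte(0x26DB71025)                  # "pte:" field
    print("pte   -> pfn", hex(pfn), sorted(pte_flags))
    print("flags ->", decode_page_flags(0x20000000000086C))   # "flags:" field
    print("page: -> pfn", hex(pfn_from_struct_page(0xFFFFEA0009B6DC40)))
```

The PTE has no `rw` bit, which fits the read-only, executable libc mapping (`vm_flags:00000075`).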

I am running the stable branch of Gentoo (without the '~amd64' keyword), except for a few packages that need that keyword (genkernel) because of systemd. The weird thing is that once I re-emerge libc, the error message seems to disappear after a while. Here is some information that may be useful:

Hardware Specification

```
Intel X99 platform (Gigabyte X99 Gaming 5)
Intel Core i7 Haswell-E 5820k
NVIDIA GTX 970
Micron Crucial DDR4 2133 MHz
```

make.conf

```
CHOST="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=haswell -mmmx -mno-3dnow"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j13"
INPUT_DEVICES="evdev" #synaptics"
VIDEO_CARDS="nvidia"
GRUB_PLATFORMS="efi-64"
USE="-qt4 -kde X dbus gtk gnome vdpau vaapi xvmc samba"
PORTDIR="/usr/portage"
DISTDIR="${PORTDIR}/distfiles"
PKGDIR="${PORTDIR}/packages"
```

Kernel Config: http://pastebin.com/BDiurVRb

Thank you.

----------

## eccerr0r

Indeed, this is not normal. It looks like hardware, but it could be a software-hardware interaction.

Is it the same kernel as on your laptop, as in the same binary, or just the same source?

I guess this came up recently too: did you update the firmware on the desktop? Are you overclocking your desktop?

----------

## Simdol

 *eccerr0r wrote:*   

> Indeed, this is not normal. It looks like hardware, but it could be a software-hardware interaction.
> 
> Is it the same kernel as on your laptop, as in the same binary, or just the same source?
> 
> I guess this came up recently too: did you update the firmware on the desktop? Are you overclocking your desktop?

 

I recently switched over to Gentoo from Arch Linux, and I never had an issue like this on Arch Linux. I compiled both kernels from scratch, using the configuration posted on pastebin above. My desktop's motherboard runs the latest stable BIOS, downloaded directly from the motherboard vendor, Gigabyte. Indeed, I have been overclocking my desktop, and I am sure it is stable at this point, as I've been using this overclocked configuration for decades (even on Arch Linux) now. If you think the overclocked settings may be causing this issue, please let me know and I will test with the default BIOS settings.

----------

## eccerr0r

If another distribution works, perhaps its packages were compiled with different CFLAGS, possibly less cycle-efficient than Gentoo's, or with a different compiler or compiler version.

But most likely you're hitting the limits of your CPU. There's no way for me to tell, though; you have to test it experimentally. Ideally bug reports would have a flag that makes it easy to weed out overclockers, as it's too easy to confuse a hardware problem with a software problem.

Yes, time will tell whether a system is stable, but how could you have been running your Haswell for decades? It hasn't been out for that long. My Sandy Bridge is only about 5 years old. Not to mention that the wires in a CPU wear out, so the maximum stable speed goes down as the part ages.

\|/ Yes, I have an unlocked K chip too, though I've stopped overclocking, as I ended up choosing longevity and stability over speed (the CPU gets too warm when overclocked). I should go fix my sig, though the chip still runs fine at 4.1 GHz with stock voltage and the stock heatsink/fan...

----------

## Simdol

 *eccerr0r wrote:*   

> If another distribution works, perhaps its packages were compiled with different CFLAGS, possibly less cycle-efficient than Gentoo's, or with a different compiler or compiler version.
> 
> But most likely you're hitting the limits of your CPU. There's no way for me to tell, though; you have to test it experimentally. Ideally bug reports would have a flag that makes it easy to weed out overclockers, as it's too easy to confuse a hardware problem with a software problem.
> 
> Yes, time will tell whether a system is stable, but how could you have been running your Haswell for decades? It hasn't been out for that long. My Sandy Bridge is almost 5 years old... Not to mention that the wires in a CPU wear out, so the maximum stable speed goes down as the part ages.

 

Apologies for my poor wording. I've had this overclocked configuration since December 2014 and have not changed it, as it seemed stable after 2 days of stress testing. I am not sure whether my CPU has degraded over time, but I don't see how it could degrade quickly enough for the system to fail, given that the CPU temperature at max load was reasonable and the vcore voltage supplied to the CPU wasn't excessive. My other guess is that poorly chosen CFLAGS caused this issue. Could you take a look at them?

My current CFLAGS are '-O2 -pipe -march=native'. They used to be '-O2 -pipe -march=haswell -mmmx -mno-3dnow', but since those failed to compile numerous packages, including systemd with 32-bit support, I had to change them recently. I've followed the Safe CFLAGS guide, but I am unsure what to change at this point.

Here is the output of 'diff march.s native.s' after following the Safe CFLAGS guide (https://wiki.gentoo.org/wiki/Safe_CFLAGS):

```
18,19c18,19
< # -m128bit-long-double -m64 -m80387 -maes -malign-stringops -mavx -mavx2
< # -mbmi -mbmi2 -mcx16 -mf16c -mfancy-math-387 -mfma -mfp-ret-in-387
---
> # -m128bit-long-double -m64 -m80387 -mabm -maes -malign-stringops -mavx
> # -mavx2 -mbmi -mbmi2 -mcx16 -mf16c -mfancy-math-387 -mfma -mfp-ret-in-387
```
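Because `diff` compares whole lines, the rewrapped comment header makes `-mavx` and `-mavx2` look changed even though only the line wrapping moved. Comparing the enabled options as sets avoids that. The sketch below assumes the `.s` files were produced with gcc's `-fverbose-asm`, whose header in GCC releases of that era lists options after an `# options enabled:` comment (as the `# -m...` lines above suggest); the function names are made up for illustration. On the two hunks shown, it reports `-mabm` as the only difference, present only on the `-march=native` side, so in these lines `-march=haswell` enables nothing that `-march=native` leaves out.

```python
import sys


def enabled_options(asm_text):
    """Collect the options listed under '# options enabled:' in a -fverbose-asm file."""
    opts = set()
    in_block = False
    for line in asm_text.splitlines():
        if not line.startswith("#"):
            if in_block:
                break  # the comment header ends where the real assembly begins
            continue
        body = line.lstrip("#").strip()
        if body.startswith("options enabled:"):
            in_block = True
            body = body[len("options enabled:"):]
        if in_block:
            # Keep only option-like tokens (e.g. -mavx2, -fpic).
            opts.update(tok for tok in body.split() if tok.startswith("-"))
    return opts


def compare(a_text, b_text):
    """Return (options only in a, options only in b), each sorted."""
    a, b = enabled_options(a_text), enabled_options(b_text)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        only_a, only_b = compare(fa.read(), fb.read())
    print("only in", sys.argv[1] + ":", " ".join(only_a) or "(none)")
    print("only in", sys.argv[2] + ":", " ".join(only_b) or "(none)")
```

Usage would be along the lines of `python3 compare_opts.py march.s native.s`, where `compare_opts.py` is a hypothetical file name for the script.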

Thank you,

----------

## eccerr0r

Find out which gcc version Arch uses and see if you can use the same one, since Arch works.

You may also have to recompile your kernel with the same gcc as Arch.

You may also need to try generic x86_64 (-march=x86-64) to see if it has any effect.

----------

## Simdol

 *eccerr0r wrote:*   

> Find out which gcc version Arch uses and see if you can use the same one, since Arch works.
> 
> You may also have to recompile your kernel with the same gcc as Arch.
> 
> You may also need to try generic x86_64 (-march=x86-64) to see if it has any effect.

 

Thank you very much. After recompiling most of the base system with '-march=native', the recurring message in 'dmesg' and 'journalctl' seems to be gone! It seems apparent now that -march=haswell somehow doesn't suit my CPU (Haswell-E 5820k), either by missing an instruction or by enabling an instruction the CPU can't use. If you have the same issue with a 5820k, try compiling your packages with CFLAGS of '-O2 -pipe -march=native', as that did the trick for me.

Thank you.

----------

## eccerr0r

I still wonder if it works without overclocking.

And whether anyone has a Haswell working with -march=haswell. If nobody does, gcc needs to fix it, but the errors seem too random to be a gcc bug rather than a hardware issue.

----------

