# Gentoo sources are killing me

## curmudgeon

Initial post here:

https://forums.gentoo.org/viewtopic-t-743945.html

Sorry for the new thread, but I am on dialup (temporarily, I hope), and it is painful to download the entire .config file each time I read the thread.

Some additional information:

This is an IBM T23 laptop.

All kernels (gentoo-sources) up to and including 2.6.23 work flawlessly on the machine.

I have never tried any 2.6.24 kernel.

I cannot get any 2.6.25 or 2.6.26 kernel (gentoo-sources) to boot at all (kernel panic during startup).

Every 2.6.27 kernel (gentoo-sources) I have tried gives me the following (always about twenty minutes after startup):

```
Mar 10 11:10:01 system BUG: unable to handle kernel paging request at ff59f000
Mar 10 11:10:01 system IP: [<ff59f000>]
Mar 10 11:10:01 system *pde = 00000000
Mar 10 11:10:01 system Oops: 0000 [#1]
Mar 10 11:10:01 system
Mar 10 11:10:01 system Pid: 5, comm: events/0 Not tainted (2.6.27-gentoo-r8 #1)
Mar 10 11:10:01 system EIP: 0060:[<ff59f000>] EFLAGS: 00010286 CPU: 0
Mar 10 11:10:01 system EIP is at 0xff59f000
Mar 10 11:10:01 system EAX: 0000005e EBX: 0000005e ECX: ef859ee0 EDX: 0000000c
Mar 10 11:10:01 system ESI: ffffffff EDI: 00000000 EBP: e6f2f000 ESP: ef859e8c
```

Note the above comes from a non-modular kernel (modular kernels give the same error).

Out of desperation (and since I already had the sources), I tried the vanilla 2.6.27 kernel. It worked fine.

Any suggestions for tracking this down or overcoming it?

----------

## poly_poly-man

diff between configs?

----------

## minor_prophets

Have you tried using genkernel on the same failed sources?

Just a thought.  Definitely not a solution, though.

----------

## curmudgeon

 *poly_poly-man wrote:*   

> diff between configs?

 

I assume you mean between the .config used for gentoo-sources (2.6.27-r8) and the generic 2.6.27?

Basically none, except for the extra items present in the gentoo sources.

So we have:

```
$ diff /usr/src/linux-2.6.27-gentoo-r8/.config /usr/src/linux-2.6.27/.config
3,4c3,4
< # Linux kernel version: 2.6.27-gentoo-r8
< # Tue Mar 10 10:15:42 2009
---
> # Linux kernel version: 2.6.27
> # Tue Mar 10 13:59:54 2009
243d242
< CONFIG_X86_RESERVE_LOW_64K=y
806d804
< # CONFIG_BLK_DEV_DM_BBR is not set
1333d1330
< # CONFIG_FB_CON_DECOR is not set
1625d1621
< # CONFIG_SQUASHFS is not set
```
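
The header lines (version string and timestamp) will always differ between two builds. A rough sketch, assuming a POSIX shell with `sed`, `sort`, `diff`, and `mktemp`, that compares only the option lines of two `.config` files. The function names `config_option_lines` and `config_option_diff` are made up for illustration:

```shell
# Hypothetical helper: print only the option lines of a kernel .config,
# i.e. "CONFIG_FOO=..." and "# CONFIG_FOO is not set", sorted bytewise.
# Other comments (such as the version/timestamp header) are dropped.
config_option_lines() {
    sed -n \
        -e '/^CONFIG_[A-Za-z0-9_]*=/p' \
        -e '/^# CONFIG_[A-Za-z0-9_]* is not set$/p' \
        "$1" | LC_ALL=C sort
}

# Hypothetical helper: diff the option lines of two .config files.
# Returns diff's status: 0 if the options match, 1 if they differ.
config_option_diff() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    config_option_lines "$1" > "$tmp_a"
    config_option_lines "$2" > "$tmp_b"
    status=0
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}
```

Called as `config_option_diff /usr/src/linux-2.6.27-gentoo-r8/.config /usr/src/linux-2.6.27/.config`, it should list only the option lines that differ. Because the lines are sorted first, the line numbers in its output do not correspond to the original files.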

 *minor_prophets wrote:*   

> Have you tried using genkernel on the same failed sources?

 

I hadn't considered it. I have been building kernels for thirteen years, and I know the hardware on the machine. Plus I prefer running non-modular kernels when possible. I don't really see what it could add.

----------

## energyman76b

try latest sources - not gentoo-sources but vanilla - and turn on the bios-memory-corruption check/workaround.

----------

## cwr

I'm running a T23, but not a recent-enough kernel to show the problem.  I have an awful feeling that it's not the kernel, but the hardware that's failing and the "buggy" kernels simply hit the wrong piece of memory.  The only time I've had kernel oops with my T23 they were on a memory boundary; it happened two or three times over a couple of months, at which point I reseated the memory and the problem hasn't returned.

Reseating the memory, and running memtest86 overnight (tho' memtest has never found anything for me) might be a starting point.

Good luck - Will

EDIT: In fact I'm running a 24 kernel; I thought it was earlier. I've had no problems with it, but unfortunately I don't plan to upgrade any time soon.

Last edited by cwr on Thu Mar 12, 2009 3:03 pm; edited 1 time in total

----------

## curmudgeon

 *energyman76b wrote:*   

> try latest sources - not gentoo-sources but vanilla - and turn on the bios-memory-corruption check/workaround.

 

Do you mean latest stable? I actually tried vanilla 2.6.27.12 (since I wrote this post), and the kernel crashes whenever I try to start kde (or xorg).

```
Mar 11 15:21:24 system mtrr: base(0xe4000000) is not aligned on a size(0x5000000) boundary
Mar 11 15:21:24 system invalid opcode: 0000 [#1]
Mar 11 15:21:24 system
Mar 11 15:21:24 system Pid: 2430, comm: X Not tainted (2.6.27.12 #1)
Mar 11 15:21:24 system EIP: 0060:[<00000053>] EFLAGS: 00013246 CPU: 0
Mar 11 15:21:24 system EIP is at 0x53
Mar 11 15:21:24 system EAX: ef8c5800 EBX: 00000053 ECX: eed9c580 EDX: 00000000
Mar 11 15:21:24 system ESI: 00000000 EDI: 00000076 EBP: ef8c5800 ESP: eee79f50
Mar 11 15:21:24 system DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Mar 11 15:21:24 system Process X (pid: 2430, ti=eee78000 task=eedca360 task.ti=eee78000)
Mar 11 15:21:24 system Stack: c02276dc eed9c580 00000002 ef8c5824 eed74300 c022753b 00000006 eee78000
Mar 11 15:21:24 system c0153d81 0900a1e8 eed74300 0900a1e8 c0153f6e 00000006 efb73f00 fffffff7
Mar 11 15:21:24 system eed74300 fffffff7 00000006 eee78000 c0153fad 0900a1e8 00000000 00000006
Mar 11 15:21:24 system Call Trace:
Mar 11 15:21:24 system [<c02276dc>] drm_ioctl+0x1a1/0x229
Mar 11 15:21:24 system [<c022753b>] drm_ioctl+0x0/0x229
Mar 11 15:21:24 system [<c0153d81>] vfs_ioctl+0x39/0x48
Mar 11 15:21:24 system [<c0153f6e>] do_vfs_ioctl+0x1de/0x1f1
Mar 11 15:21:24 system [<c0153fad>] sys_ioctl+0x2c/0x43
Mar 11 15:21:24 system [<c01029d9>] sysenter_do_call+0x12/0x25
Mar 11 15:21:24 system [<c0246400>] mdio_ctrl+0x5f/0xf7
Mar 11 15:21:24 system kdm: :0[2432]: IO Error in XOpenDisplay
Mar 11 15:21:24 system =======================
Mar 11 15:21:24 system Code:  Bad EIP value.
Mar 11 15:21:24 system EIP: [<00000053>] 0x53 SS:ESP 0068:eee79f50
Mar 11 15:21:24 system ---[ end trace 022b2545cee56b14 ]---
Mar 11 15:21:24 system kdm[2427]: Display :0 cannot be opened
Mar 11 15:21:24 system kdm[2427]: Unable to fire up local display :0; disabling.
Mar 11 15:21:26 system [drm:drm_release] *ERROR* Device busy: 1 0
```
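
As an aside on the first line of that log: as far as I understand it, an MTRR range needs a size that is a power of two and a base that is a multiple of that size. A small sketch of that arithmetic in POSIX shell follows. The function name `mtrr_range_ok` is hypothetical, and the check says nothing about whether the message is related to the oops:

```shell
# Hypothetical helper: succeed (status 0) if SIZE is a power of two and
# BASE is a multiple of SIZE. Both arguments may be given in hex (0x...).
mtrr_range_ok() {
    base=$(( $1 ))
    size=$(( $2 ))
    [ "$size" -gt 0 ] || return 1
    # A power of two has exactly one bit set, so size & (size - 1) is 0.
    [ $(( size & (size - 1) )) -eq 0 ] || return 1
    # The base must sit on a multiple of the size.
    [ $(( base % size )) -eq 0 ]
}
```

With the values from the log, `mtrr_range_ok 0xe4000000 0x5000000` fails on both counts: 0x5000000 (80 MB) is not a power of two, and 0xe4000000 is not a multiple of it.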

Where is this BIOS corruption check set?

In the help for "Processor type and features" / "Reserve low 64K of RAM on AMI/Phoenix BIOSen", I see:

"If you have doubts about the BIOS (e.g. suspend/resume does not work or there's kernel crashes after certain hardware hotplug events) and it's not AMI or Phoenix, then you might want to enable X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical corruption patterns."

However:

```
linux-2.6.27.12 # grep -r X86_CHECK_BIOS_CORRUPTION .
./arch/x86/Kconfig:      X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical
linux-2.6.27.12 #
```
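
That grep only hits the help text of another option. One way to tell a mention from a definition is to look for a `config SYMBOL` line in the Kconfig files. A minimal sketch, assuming a POSIX shell with `find` and `grep`; the function name `kconfig_defines` is made up for illustration:

```shell
# Hypothetical helper: print the Kconfig files under TREE that define SYMBOL,
# i.e. contain a line of the form "config SYMBOL" or "menuconfig SYMBOL".
# Returns 0 if at least one definition is found, 1 otherwise.
kconfig_defines() {
    symbol="$1"
    tree="$2"
    # "|| true" keeps a no-match result from aborting scripts run with set -e.
    matches=$(find "$tree" -type f -name 'Kconfig*' -exec \
        grep -l -E "^[[:space:]]*(menu)?config[[:space:]]+${symbol}[[:space:]]*\$" {} + || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
    else
        return 1
    fi
}
```

Run from the top of the tree as `kconfig_defines X86_CHECK_BIOS_CORRUPTION .`; no output and a non-zero status would suggest that this tree mentions the option in help text but does not define it.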

----------

## curmudgeon

 *cwr wrote:*   

> I'm running a T23, but not a recent-enough kernel to show the problem.  I have
> 
> an awful feeling that it's not the kernel, but the hardware that's failing and the
> 
> "buggy" kernels simply hit the wrong piece of memory.  The only time I've had
> ...

 

Very interesting. You are about the fourth person who has suggested memtest86. I have run it several times (for hours at a time), including just a couple of days ago, and it never found anything. I guess it can't hurt to reseat the memory. I will try it.

----------

## curmudgeon

No luck with the memory. The machine has two DIMMs: a 256 MB IBM piece (Samsung label on the back) and a 512 MB Kingston piece. I tried running each one separately, swapping slots, and everything else I could think of. Nothing helped.

----------

## energyman76b

Since your crash also happens with vanilla sources and your kernel is not tainted:

go to LKML

----------

