# kernel upgrade disaster: panic on root mount [solved]

## jesnow

I just upgraded from 3.5.7 to 3.7.9 when it went stable. The 3.7.9 kernel panics when it tries to mount root (ext4). Strange, but OK, I'll just use the old kernel until I figure out what's going on. But now the old kernel panics too. A still older kernel (3.2.5) does mount root, but it can't deal with the new udev and doesn't make it to a login.

I'm mystified. At first I thought maybe LILO had overwritten the partition table (that was before 3.2.5 succeeded in mounting root). I booted with a Sabayon USB stick and can see everything, so I haven't lost data, but now I don't know what to fix.

Help me out here. I don't even know where to start! 

Cheers, 

jon.

Last edited by jesnow on Wed Feb 27, 2013 6:49 pm; edited 2 times in total

----------

## Hu

Pick a problem to solve first, and provide enough information that we can help you solve it.  Maybe you want to abandon the old kernels, make 3.7.9 work, and never look back.  Maybe you want to make 3.5.7 work again, and fix 3.7.9 afterward.  Regardless, please provide at least the last 25 lines of output before the panic, and specify which problem you have selected.

----------

## jesnow

If I had a lot of troubleshooting info, I would probably have solved it myself. I need to develop some hypotheses and build a decision tree, and without going to the extra step of capturing the output from a panicking kernel (set the console to the serial port I haven't got?), I'm kind of stuck. 

I typed in a few lines from the panic output -- here they are. 

The top line on the screen starts out:

```
Call Trace:
[<c1434e79>] ? panic+0x7b/0x159
[<c1601c50>] ? mount_block_root+0x1df/0x1ef
[<c1601dba>] ? mount_root+0x7d/0xd3
```

It goes on like that for 6 more lines, then

```
panic occurred, switching back to text console
----------------[ cut here ]--------------------
WARNING: at arch/x86/kernel/apic/ipi.c:113 default_send_IPI_mask_logical+0x55/0xad
Hardware name: VGN-TZ90S
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.7.9-gentoo #1
Call Trace:
```

Then more addresses.  It looks to me like it panicked *again* while trying to get to a text console. 
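Even the three typed-in frames narrow things down. As a minimal sketch (the regular expression and the `frame_symbols` helper are illustrative names, not a standard tool), pulling the symbol names out of the first call trace above shows that `panic` was reached from the root-mount path, which fits a kernel that cannot find or mount its root device:

```python
import re

# The three frames typed in from the first call trace above.
TRACE = """\
Call Trace:
[<c1434e79>] ? panic+0x7b/0x159
[<c1601c50>] ? mount_block_root+0x1df/0x1ef
[<c1601dba>] ? mount_root+0x7d/0xd3
"""

# Matches "[<address>] ? symbol+0xoffset/0xsize"; the "?" marker is optional.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_symbols(text):
    """Return the function names in the trace, innermost frame first."""
    return FRAME.findall(text)


print(frame_symbols(TRACE))  # ['panic', 'mount_block_root', 'mount_root']
```

The frames marked with `?` are ones the kernel is not certain about, so this is a hint about where to look, not proof.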

So far I have verified that both the partition table and file system are sound, and supported by both kernels. That leaves a bad link in LILO's internal table for the location of / as the only possibility I can think of to test.  

Not sure how to do that or remediate it, or if there's another possibility.

----------

## baaann

No answers, just some ideas (which may be well wide of the mark).

I think Hu asked which kernel you wish to go with because some config option locations changed in >=3.7 (certainly for video, but maybe for other devices too?).

Have you enabled devtmpfs in your config? Udev requires that.
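Whether an option such as devtmpfs is enabled can be read straight from the kernel `.config`. Here is a minimal sketch, assuming the usual `CONFIG_X=y` / `# CONFIG_X is not set` layout. The `config_option` helper is a made-up name, and the sample text is invented for illustration; it is not the poster's actual config:

```python
import re

# Hypothetical excerpt of a kernel .config, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set
# CONFIG_IDE is not set
CONFIG_EXT4_FS=y
"""


def config_option(config_text, name):
    """Return the option's value ('y', 'm', ...) or None if it is not set."""
    # Anchoring on "NAME=" avoids matching longer names with the same prefix.
    match = re.search(r"^%s=(.*)$" % re.escape(name), config_text, re.MULTILINE)
    return match.group(1) if match else None


for option in ("CONFIG_DEVTMPFS", "CONFIG_DEVTMPFS_MOUNT", "CONFIG_IDE", "CONFIG_EXT4_FS"):
    print(option, config_option(SAMPLE_CONFIG, option))
```

To check a real kernel, read the `.config` from the kernel source tree into a string and pass that in instead of the sample.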

Given that your system was booting prior to upgrading the kernel, maybe post your config and device details (you should be able to get these with a livecd) so that the experts (which doesn't include me :) ) can check it out.

----------

## Hu

 *jesnow wrote:*   

> I need to develop some hypotheses and build a decision tree, and without going to the extra step of capturing the output from a panicking kernel (set the console to the serial port I haven't got?), I'm kind of stuck.

 Do you have no way at all to capture the contents of the screen?  No camera, either in a phone or freestanding?  No netconsole support?

baaann is correct.  You need to pick which kernel you want to fix first.  The 3.5 kernel is an easy choice, because we know it worked before an unspecified change to the system.  The 3.7 kernel is a harder choice since we do not know if it would start working when you fix 3.5 or if there is something else wrong with it.

----------

## jesnow

Thanks for explaining. I think fixing the 3.5 kernel makes sense, as that one worked. I'm getting it chrooted right now.

UPDATE: I chrooted it and cleaned the ~20 old kernels out of lilo.conf, on the off chance that an overstuffed MBR was responsible. 

The Sabayon distro I used has 3.7.0, and what used to be /dev/hda (the original SSD on the TZ90) is now /dev/sda. I hope that change won't cause trouble down the road. Anyway, LILO ran fine, but that didn't solve the problem.

----------

## jesnow

OK here are the files:

379 panic screenshot: http://www.fileswap.com/dl/tFeZpndKIq/

379 config file: http://www.fileswap.com/dl/PtDgMqyF/

357 panic screenshot: http://www.fileswap.com/dl/WxkR7IYQOH/

357 config file: http://www.fileswap.com/dl/Avi3C56MAJ/

I have it chrooted and I can compile a kernel, but still don't know what the problem is.

----------

## jburns

Is your 357 kernel trying to mount the correct disk?

----------

## jesnow

 *jburns wrote:*   

> Is your 357 kernel trying to mount the correct disk?

 

That's a very good question. 

All the symptoms seem to point that way, don't they? I did notice that the disk names had changed when I edited lilo.conf just now. It used to be that all disks were /dev/hdx, except for SCSI, which were /dev/sdx. Now (since the new udev?) all disks are /dev/sdx, and the *IDE* SSD that came built into the TZ-90 is no longer /dev/hda but /dev/sda, meaning that the SATA disk I'm booting from was /dev/sda and is now /dev/sdb.

That's a hypothesis to test when I get to work. BUT, how would lilo have found the kernel image just now when I ran it? But that's definitely a line to follow. I don't think kernel configuration is the issue.
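On the lilo question: the `image=` path only has to resolve on the mounted filesystem when lilo is run, while `root=` names the device the booted kernel will try to mount, so the two can disagree. A minimal sketch for listing which root each image gets, assuming the common lilo.conf layout of global options followed by `image=` sections. The `lilo_roots` helper and the sample file are hypothetical, not the actual lilo.conf from this machine:

```python
# Hypothetical lilo.conf, for illustration only.
SAMPLE_LILO_CONF = """\
boot=/dev/sdb
root=/dev/sda1        # global default

image=/boot/kernel-3.7.9-gentoo
    label=gentoo-3.7.9
    read-only

image=/boot/kernel-3.5.7-gentoo
    label=gentoo-3.5.7
    root=/dev/sdb1    # per-image override
    read-only
"""


def lilo_roots(conf_text):
    """Map each image= path to its root=, falling back to the global root=."""
    global_root = None
    image = None       # image= section currently being read, if any
    in_global = True   # True until the first image=/other= section starts
    roots = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = [part.strip() for part in line.split("=", 1)]
        if key in ("image", "other"):
            in_global = False
            image = value if key == "image" else None
            if image is not None:
                roots[image] = global_root
        elif key == "root":
            if in_global:
                global_root = value
            elif image is not None:
                roots[image] = value
    return roots


for path, root in lilo_roots(SAMPLE_LILO_CONF).items():
    print(path, "->", root)
```

Any image whose root still points at the old name for the boot disk is a candidate for the panic.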

----------

## jesnow

Success. 

The problem was in fact that the device naming had changed. 

I seem to recall disabling CONFIG_IDE as instructed by somebody. I presume the SATA driver owns the IDE code now and gives IDE devices /dev/sdx device files, so that changed the device naming. Duh. It was so mysterious before and so obvious now.

The internal IDE ssd became /dev/sda and the SATA SSD became /dev/sdb -- I had always pointed root at /dev/sda1 and only needed to change it to /dev/sdb1 for both kernels to work. 
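For anyone hitting the same renaming, one way to see which physical disk sits behind each sdX name is to look at the persistent symlinks udev creates under `/dev/disk/by-id`. A minimal sketch, assuming that directory exists on the running system (it returns an empty mapping if not); the `disk_names` helper is a made-up name:

```python
import os


def disk_names(by_id_dir="/dev/disk/by-id"):
    """Map each kernel device name (e.g. 'sda') to its persistent id links."""
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        if not os.path.islink(path):
            continue
        # Each link points at the current kernel name, such as ../../sdb1.
        target = os.path.basename(os.path.realpath(path))
        mapping.setdefault(target, []).append(entry)
    return mapping


for device, ids in sorted(disk_names().items()):
    print(device, ids)
```

The ids usually include the drive model and serial number, which makes it easier to tell the internal IDE SSD from the SATA SSD regardless of which one the kernel calls sda.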

Thanks jburns for coming up with the key question to ask! 

I'm back in business!

Jon.

----------

