# Blind kernel upgrade, need a bit of advice/help

## randalla

One of our servers is currently running kernel 2.6.20-gentoo-r8, and I'm trying to get it upgraded. Unfortunately, I don't have easy physical access to it; I only have an SSH connection.

udev says I need to be running something above 2.6.25, so the kernel has to be upgraded. I configured 2.6.31-gentoo-r6 and thought I had gone through all the configuration necessary for this machine (I didn't do the initial build, but there's nothing overly special about it). Unfortunately, the new kernel seems to be panicking and not coming up. Thankfully, my bootloader is smart enough to fall back to the default after a panic, so I still have SSH access to the machine, though without the new kernel.

Anyway, I'd like to know what is panicking the new kernel so that I can fix it in the build. As I'm blind without any visual interface to the machine, it's difficult for me to guess. Is there a way to have it log the errors to the file system, or even to do some type of remote debugging while the kernel is coming up?

Alternatively, I'd be willing to post my (novice) kernel .config, as well as information on the machine.

Thanks for any time you might have to spare.

Adam.

----------

## Hu

You can redirect output over a serial console (ttyS0) or netconsole.  It is very likely that the panic occurs before the filesystem is writable, so there is no way to record anything of value there.
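For the netconsole route, the kernel needs `CONFIG_NETCONSOLE` and a boot parameter of the form `netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`; the serial equivalent is something like `console=ttyS0,115200`. The other half is a machine on the same network that listens for the UDP packets. Below is a minimal sketch of such a listener in Python. The function names are made up for illustration, and port 6666 is assumed because it is netconsole's usual default target port; adjust it to whatever the boot parameter specifies.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket that netconsole packets can be sent to.

    6666 is assumed here as the target port; pass 0 to let the OS pick one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=5.0):
    """Read up to `count` datagrams and return them as (sender_ip, text) pairs.

    Stops early if nothing arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, (sender_ip, _sender_port) = sock.recvfrom(4096)
        except socket.timeout:
            break
        # Kernel messages are plain text; replace any undecodable bytes.
        messages.append((sender_ip, data.decode("utf-8", errors="replace")))
    return messages
```

To use it, call `open_listener()` on the receiving machine before rebooting the server, then loop over `receive_messages(listener, 1, timeout=60.0)` and print or append each message to a file. Since netconsole only starts sending once the network driver is up, a panic that happens very early in boot may still go unseen.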

----------

## Mike Hunt

Could you test the new kernel .config on a similar machine to which you do have physical access?

----------

## randalla

 *Mike Hunt wrote:*   

> Could you test the new kernel .config on a similar machine to which you do have physical access?

 

Yeah, that's probably what I'm going to end up having to do. I've rebuilt the kernel and painstakingly compared the `make menuconfig` settings between a working 2.6.30 and this new 2.6.31. Since the 2.6.30 works like a charm, it makes me wonder what on earth is happening in the 2.6.31. There are differences in the hardware, but I didn't think they were enough to cause this.

However, I have a sneaking suspicion that lspci may be lying to me and that I have a different MegaRAID card installed that supports SAS. The Dell service tag says that the machine has SAS drives in it, yet lspci is claiming UltraSCSI320. I rebuilt the kernel and am testing it out now.

More digging necessary it seems.
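Comparing two configurations by eye in `make menuconfig` is easy to get wrong, so a small script that diffs the two `.config` files can help. The following is only a sketch: the function names are hypothetical, and it assumes the usual `.config` line formats (`CONFIG_FOO=value` for set options and `# CONFIG_FOO is not set` for disabled ones).

```python
import re

# The two line shapes assumed to appear in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ.

    An option missing from one file is reported with None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Reading the working 2.6.30 `.config` and the new 2.6.31 `.config` into strings and passing them to `diff_configs` lists every option whose value differs. Some differences are expected between kernel versions because options get added and renamed, so the storage and filesystem drivers are the entries worth checking first.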

----------

## randalla


Well, the last build didn't work. I'm pretty sure that it's hitting the drives, now that I think about it. The first time I rebooted with the new kernel, it took about 530 seconds to come back. Subsequent reboots have taken about 250 seconds. I'm thinking that the first one triggered an e2fsck on the root drive, since it hadn't been run in a few months (it's a server, after all).

----------

## randalla


Ugh, it's getting late. Just realized that the service tag I have been looking at is for a different server than the one I'm working on :(

----------

## randalla


Whee, finally. I got it up with 2.6.31-r6. Turns out I was using the wrong MegaRAID driver: I was stupid and had selected the LSI Logic Legacy MegaRAID Driver instead of the LSI Logic New Generation RAID Device Drivers. Once I switched to the right one, it worked as expected. Well, one thing is for sure: I'm more than impressed with Linux's ability to completely and totally flame out and still be recoverable.

I think it's time to go home.

Adam.

----------

