# Computer won't boot with newer kernels [solved]

## Timmer

I'm resurrecting my old laptop for my kid to do remote school with, so it's been about 9 months since it was last updated.  I got everything working again, except the kernel.  It was running 5.4.42 when I started.  I upgraded to 5.8.0-r1, and then later to 5.8.1, but it doesn't boot when using those.  Grub says that it's loading the kernel, and then loading the initramfs.  But then it just hangs there.

I tried the 5.7.8 kernel as well, and have the same problem.  But the 5.4 series (all the way up to 5.4.58) works just fine. Obviously, I haven't tried 5.5 or 5.6, since they're not in the tree anymore.  I tried removing my initramfs, but it still hangs (and 5.4.58 still boots), so I'm fairly certain the issue is something to do with the kernel itself.

Is there anything I can do to get some more output to get some sort of idea?  I tried adding the debug flag to my kernel line in grub, but it's not getting that far, so while I have additional output on the 5.4 kernel, I'm still getting nothing on the 5.8 kernel.

Last edited by Timmer on Mon Sep 14, 2020 1:34 am; edited 1 time in total

----------

## Banana

How do you upgrade and build the kernel?

I usually do it manually and start by copying the .config file over from the old kernel dir to the new one. Then I run make olddefconfig and proceed with the build process.

----------

## Tony0945

Try this script.  Note that it uses make oldconfig instead of make olddefconfig.

NPROC calculation assumes two threads per core. If your CPU is old, you might just want to set it to a fixed number.

```
#!/bin/bash
# Build and install the kernel selected in /usr/src/linux.
# Optional argument: a config file to start from; with no argument the
# running kernel's built-in config is used.

# Test if running on a bare VT
if [ "$TERM" = linux ]; then
    TERM=xterm     # so make menuconfig will display correctly
fi

#NPROC=$( nproc )           # use the count that nproc reports as-is
NPROC=$(( 2 * $(nproc) ))   # twice what nproc reports

cd /usr/src/linux || { echo "Did you forget 'eselect kernel set' ?" >&2; exit 255; }

if [ "$1" != "" ]; then
    ( cp "$1" .config && echo "Config is $1" ) || exit 255
else
    echo "Using present kernel's built-in config"
    # Needs CONFIG_IKCONFIG_PROC=y in the running kernel
    zcat /proc/config.gz > .config || exit 255
fi

# At this point we should compare kernel versions and make oldconfig if the base version has updated
make oldconfig
make menuconfig

make -j"${NPROC}" || { echo "make -j${NPROC} failed"; exit 1; }
make -j"${NPROC}" modules_install || { echo "make modules_install failed"; exit 2; }
make -j"${NPROC}" install && echo "Don't forget to update boot loader menu"

# Out-of-kernel modules
echo "Building out of kernel modules"
emerge @module-rebuild
echo "Done"
```

----------

## Timmer

I copied the 5.4 config from /boot to the new kernel directory, and then did make menuconfig to get defaults for new values.  And then make -j9, make install, make modules_install.  Ultimately, the process is similar to Tony's script, but I'll give that a shot here and post back.

I also tried with just a fresh kernel, copying over a config from my other computer which is working (though I didn't have much hope for that one, since the hardware is different).

----------

## Tony0945

The script does depend on having these kernel options set. 

```
 $ zgrep CONFIG_IKCONFIG /proc/config.gz

CONFIG_IKCONFIG=y

CONFIG_IKCONFIG_PROC=y
```

They are set if you have /proc/config.gz which is a good thing to have in its own right.

Whatever you do, don't edit .config with a text editor.
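
If /proc/config.gz is not there, a script could fall back to a config saved in /boot. This is only a minimal sketch: the function name `current_config` is made up for illustration, and the fallback path /boot/config-$(uname -r) is an assumption about how the config was installed, so check what your /boot actually contains.

```bash
#!/bin/bash
# current_config [PROC_GZ] [BOOT_CFG]
# Print the running kernel's config on stdout.
# Hypothetical helper: both default paths are assumptions and may not exist
# on a given system.
current_config() {
    local proc_cfg="${1:-/proc/config.gz}"
    local boot_cfg="${2:-/boot/config-$(uname -r)}"

    if [ -r "$proc_cfg" ]; then
        # Only present when the kernel has CONFIG_IKCONFIG_PROC=y
        zcat "$proc_cfg"
    elif [ -r "$boot_cfg" ]; then
        cat "$boot_cfg"
    else
        echo "no kernel config found" >&2
        return 1
    fi
}
```

Something like `current_config > /usr/src/linux/.config` would then take the place of the bare zcat line, followed by make oldconfig as before.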

----------

## Timmer

Yeah, I was just discovering that, but now lunch is over and it's time to get back to work.  So I'll adjust my 5.4 kernel this evening, and then try 5.8 again then.

----------

## Banana

Any changes in the BIOS which could make new options needed in the kernel?

----------

## Timmer

Okay, so after adding ikconfig to my 5.4 kernel, that script worked like a dream to build a new one.  But after doing so, the system still hung in the same place. So it didn't do anything to help solve the problem.

@Banana, nope, I haven't made any changes to the BIOS on this machine in a long time - probably since ~2015

Any other ideas, or next steps?

----------

## Hu

Are there any interesting differences between the Kconfig of the working and non-working kernels?  Could you try an intervening kernel version, to narrow down which major version first showed the problem?
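
One way to look for those differences is to list every option line in the new .config that does not appear verbatim in the old one. The kernel tree ships scripts/diffconfig for a fuller comparison; the function below is only a rough sketch in plain shell, and the name `config_changes` and the paths in the usage line are made up for illustration.

```bash
#!/bin/bash
# config_changes OLD NEW
# Print the option lines of NEW that are not present, character for character,
# in OLD: options that are new, or whose value changed.
# Hypothetical helper, not part of any kernel tooling.
config_changes() {
    local old="$1" new="$2"
    # Keep "CONFIG_FOO=value" and "# CONFIG_FOO is not set" lines from NEW,
    # then drop every line that also occurs as a whole line in OLD.
    grep -E '^(# )?CONFIG_' "$new" | grep -vxF -f "$old"
}
```

For example, `config_changes /usr/src/linux-5.4.58-gentoo/.config /usr/src/linux-5.8.1-gentoo/.config` (directory names assumed) would give a short list of candidates to toggle one at a time.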

----------

## Tony0945

Since it's needed for school (distance learning I assume), I'd just stay on stable kernel 5.4.48 which is the latest stable.  I removed gentoo-sources from ACCEPT_KEYWORDS. I got tired of building and testing every week.

You do have me curious so I might unmask the tip of the tree and try it out of curiosity.  

Did you take all the defaults make oldconfig offered?

EDIT: 5.8 is available (shrug): https://packages.gentoo.org/packages/sys-kernel/gentoo-sources

I'd stick as stable as possible on a machine needed for daily work or school.

----------

## Timmer

Yeah, for now I've just masked anything over 5.5.  I did take all the defaults that make oldconfig offered.  There were a lot of changes, as you might expect, but nothing particularly stuck out to me.

@Hu, I did try the available intervening kernels, but all that showed was that the problem was somewhere between 5.4 and 5.7 (since 5.5 and 5.6 aren't in portage anymore, I can't try them)

----------

## NeddySeagoon

Timmer,

Is it not booting or is it booting fine with a blank console?

Can you ssh into the system ?

Your problem report says that the new kernel does not write anything to the console.

That's not the same as not booting.

Please post your 

```
lspci -nnk
```

output and put your new kernel .config onto a pastebin site.

You can get vanilla-sources from kernel.org if you want to try old kernels.

----------

## Timmer

Grub comes up. I select the new kernel.  The screen says

```

Loading kernel

Loading initial ramdisk

```

And then nothing happens.  It just hangs there forever.

vs the old kernel:

Grub comes up.  I select the old kernel. The screen says

```

Loading kernel

Loading initial ramdisk

```

And then the screen flashes, and it shows a whole bunch of kernel debug info, and then the openrc output, and then sddm comes up (ie, it boots normally)

I will post lspci output and kernel configs tonight.

Last edited by Timmer on Fri Aug 21, 2020 12:09 am; edited 1 time in total

----------

## Hu

 *Timmer wrote:*   

> @Hu, I did try the available intervening kernels, but all that showed was that the problem was somewhere between 5.4 and 5.7 (since 5.5 and 5.6 aren't in portage anymore, I can't try them)

 I understood that you had tried 5.7.  I wanted you to try 5.5 and 5.6, which you can try despite not being in Portage, although it is a bit more trouble.  You would need to retrieve their ebuilds from history, or get the sources from kernel.org as Neddy hinted.
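
For the kernel.org route, the tarballs are grouped by major version under https://cdn.kernel.org/pub/linux/kernel/. A small sketch that builds the download URL from a version number; the function name `kernel_tarball_url` is invented here, and the commented usage lines are an example rather than a tested recipe.

```bash
#!/bin/bash
# kernel_tarball_url VERSION
# Print the kernel.org URL of the vanilla source tarball for VERSION,
# e.g. 5.5.9. Hypothetical helper; it only formats a string and does not
# check that the file exists on the server.
kernel_tarball_url() {
    local version="$1"
    local major="${version%%.*}"   # "5.5.9" -> "5"
    printf 'https://cdn.kernel.org/pub/linux/kernel/v%s.x/linux-%s.tar.xz\n' \
        "$major" "$version"
}

# Example usage (not run here):
#   curl -LO "$(kernel_tarball_url 5.5.9)"
#   tar -xf linux-5.5.9.tar.xz -C /usr/src
```

The unpacked tree can then be built with the same .config workflow as a Portage-installed one.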

----------

## Banana

dunno if it's related:

- Does grub need to be rebuilt with the new kernel?

- Are the filesystem types active in the new kernel (vfat, ext4...)?

- Does some setting change the device ids?

----------

## Timmer

```
$ lspci -nk

00:00.0 0600: 8086:0d04 (rev 08)

        Subsystem: 1558:7410

00:02.0 0300: 8086:0d26 (rev 08)

        Subsystem: 1558:7410

        Kernel driver in use: i915

00:03.0 0403: 8086:0d0c (rev 08)

        Subsystem: 1558:7410

        Kernel driver in use: snd_hda_intel

00:14.0 0c03: 8086:8c31 (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: xhci_hcd

00:16.0 0780: 8086:8c3a (rev 04)

        Subsystem: 1558:7410

00:19.0 0200: 8086:153b (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: e1000e

00:1a.0 0c03: 8086:8c2d (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: ehci-pci

00:1b.0 0403: 8086:8c20 (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: snd_hda_intel

00:1c.0 0604: 8086:8c10 (rev d4)

        Kernel driver in use: pcieport

00:1c.1 0604: 8086:8c12 (rev d4)

        Kernel driver in use: pcieport

00:1c.3 0604: 8086:8c16 (rev d4)

        Kernel driver in use: pcieport

00:1d.0 0c03: 8086:8c26 (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: ehci-pci

00:1f.0 0601: 8086:8c4b (rev 04)

        Subsystem: 1558:7410

00:1f.2 0106: 8086:8c03 (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: ahci

00:1f.3 0c05: 8086:8c22 (rev 04)

        Subsystem: 1558:7410

        Kernel driver in use: i801_smbus

02:00.0 ff00: 10ec:5229 (rev 01)

        Subsystem: 1558:7410

03:00.0 0280: 8086:0887 (rev c4)

        Subsystem: 8086:4062

        Kernel driver in use: iwlwifi

```

config file:

https://gist.github.com/tclimis/78378c485d45b9e29ffc8dc7c17555f2

(did not know about the -k option on lspci.  That's pretty great)  Working on getting 5.5 and 5.6 kernels now.  I'll post back once I have something to share.

----------

## Timmer

So what I've found so far is that 5.5.9 has this problem, and 5.5.0-5.5.8 won't even build.  There are only 15 new options moving to 5.5 from 5.4 though, so I'll try playing with them tomorrow and see if I get anywhere.

Thanks everyone for the help so far.

----------

## NeddySeagoon

Timmer,

In an effort to get a console for debug turn on

```
# CONFIG_FB_SIMPLE is not set
```

At the moment your console should start with efifb then switch to inteldrmfb.

You will see that in dmesg.

I have a system where CONFIG_FB_EFI=y produces a blank screen on any kernel later than 4.14, no matter what else is set.

It boots normally, I get a serial console, everything works except the console.

Check that when your system works, you get the efifb to inteldrmfb switch.

If that's there, try turning off CONFIG_FB_EFI.
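
To pull just the framebuffer messages out of the log on the working kernel, something like the sketch below may help. The function name `fb_messages` is made up, and the grep pattern is a guess at the wording of the messages, so widen it if nothing shows up.

```bash
#!/bin/bash
# fb_messages [LOGFILE]
# Print the kernel log lines that mention a framebuffer. Reads LOGFILE when
# one is given, otherwise the output of dmesg.
# Hypothetical helper; the pattern is an assumption about message wording.
fb_messages() {
    if [ -n "${1:-}" ]; then
        cat "$1"
    else
        dmesg
    fi | grep -iE 'frame ?buffer|efifb|drmfb'
}
```

Saving `dmesg > working.log` on the good kernel and running `fb_messages working.log` should show which driver owns fb0 first and which one takes over.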

When you boot a non working kernel, then switch back to a working kernel, does the system do journal recovery?

It will only be in the first boot after the switch.

Journal recovery indicates that your filesystem was mounted and you forced an unclean shutdown when the console was blank.

```
$ lspci -nk
```

you have an 'n' missing. :)

----------

## krinn

 *Tony0945 wrote:*   

> Try this script.  Note that it uses make oldconfig instead of make olddefconfig.

 

You should try put the reminder to run your script after mount /boot

----------

## Tony0945

 *krinn wrote:*   

>  *Tony0945 wrote:*   Try this script.  Note that it uses make oldconfig instead of make olddefconfig. 
> 
> You should try put the reminder to run your script after mount /boot

 

??? Please expand on this. Usually I run it after emerging a kernel update (and selecting it with eselect).

----------

## Timmer

 *NeddySeagoon wrote:*   

> 
> 
> you have an 'n' missing. 

 

So I do!

```
$ lspci -nnk

00:00.0 Host bridge [0600]: Intel Corporation Crystal Well DRAM Controller [8086:0d04] (rev 08)

        Subsystem: CLEVO/KAPOK Computer Crystal Well DRAM Controller [1558:7410]

00:02.0 VGA compatible controller [0300]: Intel Corporation Crystal Well Integrated Graphics Controller [8086:0d26] (rev 08)

        Subsystem: CLEVO/KAPOK Computer Crystal Well Integrated Graphics Controller [1558:7410]

        Kernel driver in use: i915

00:03.0 Audio device [0403]: Intel Corporation Crystal Well HD Audio Controller [8086:0d0c] (rev 08)

        Subsystem: CLEVO/KAPOK Computer Crystal Well HD Audio Controller [1558:7410]

        Kernel driver in use: snd_hda_intel

00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB xHCI [1558:7410]

        Kernel driver in use: xhci_hcd

00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family MEI Controller [1558:7410]

00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-V [8086:153b] (rev 04)

        Subsystem: CLEVO/KAPOK Computer Ethernet Connection I217-V [1558:7410]

        Kernel driver in use: e1000e

00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB EHCI [1558:7410]

        Kernel driver in use: ehci-pci

00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086:8c20] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset High Definition Audio Controller [1558:7410]

        Kernel driver in use: snd_hda_intel

00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8086:8c10] (rev d4)

        Kernel driver in use: pcieport

00:1c.1 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #2 [8086:8c12] (rev d4)

        Kernel driver in use: pcieport

00:1c.3 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 [8086:8c16] (rev d4)

        Kernel driver in use: pcieport

00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family USB EHCI [1558:7410]

        Kernel driver in use: ehci-pci

00:1f.0 ISA bridge [0601]: Intel Corporation HM87 Express LPC Controller [8086:8c4b] (rev 04)

        Subsystem: CLEVO/KAPOK Computer HM87 Express LPC Controller [1558:7410]

00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c03] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [1558:7410]

        Kernel driver in use: ahci

00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 04)

        Subsystem: CLEVO/KAPOK Computer 8 Series/C220 Series Chipset Family SMBus Controller [1558:7410]

        Kernel driver in use: i801_smbus

02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader [10ec:5229] (rev 01)

        Subsystem: CLEVO/KAPOK Computer RTS5229 PCI Express Card Reader [1558:7410]

03:00.0 Network controller [0280]: Intel Corporation Centrino Wireless-N 2230 [8086:0887] (rev c4)

        Subsystem: Intel Corporation Centrino Wireless-N 2230 BGN [8086:4062]

        Kernel driver in use: iwlwifi

```

----------

## Timmer

 *NeddySeagoon wrote:*   

> In an effort to get a console for debug turn on
> ...

 

No dice.  It still hangs at "initial ramdisk...", so I think it's not getting to the point of being able to display anything.

 *NeddySeagoon wrote:*   

> 
> 
> At the moment your console should start with efifb then switch to inteldrmfb.
> 
> You will see that in dmesg.
> ...

 

This? (this is from the kernel that does boot)

```
[    2.543309] Console: switching to colour frame buffer device 240x67

[    2.686196] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
```

 *NeddySeagoon wrote:*   

> 
> 
> When you boot a non working kernel, then switch back to a working kernel, does the system do journal recovery?
> 
> 

 

No

----------

## NeddySeagoon

Timmer,

As the journal recovery doesn't happen, your filesystems are clean.

That means that the boot failed before root was mounted.

It's not much but we can rule out booting properly with no console.

----------

## NeddySeagoon

Timmer,

To boot, the kernel needs to be able to read your partition table

```
CONFIG_MSDOS_PARTITION=y

CONFIG_EFI_PARTITION=y
```

Good.

Read your HDD

```
CONFIG_BLOCK=y

# CONFIG_IDE is not set

CONFIG_BLK_DEV_SD=y

CONFIG_ATA=y

CONFIG_SATA_AHCI=y
```

Read your root filesystem

```
CONFIG_EXT4_FS=y

CONFIG_EXT4_USE_FOR_EXT2=y

CONFIG_EXT4_FS_POSIX_ACL=y

CONFIG_EXT4_FS_SECURITY=y
```

I didn't ask if you are using ext4 but that's all that's set. A missing filesystem driver gets you a panic, not a blank screen.

Turn off 

```
CONFIG_NTFS_FS=y
```

and use ntfs-3g. The kernel NTFS is harmless now and not very useful.

```
CONFIG_PROC_FS=y

CONFIG_SYSFS=y

CONFIG_TMPFS=y

CONFIG_TMPFS_POSIX_ACL=y

CONFIG_TMPFS_XATTR=y

CONFIG_DEVTMPFS=y

CONFIG_DEVTMPFS_MOUNT=y
```

are all good to have.

The graphics subsystem needs

```
CONFIG_MTRR=y

CONFIG_X86_PAT=y
```

Does 

```
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
```

apply to you?

The AMD equivalent prevents some AMD systems from booting.

Is it on in a working kernel?

```
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
```

needs ECC RAM.

You have 

```
CONFIG_MATOM=y
```

 Are you sure it's an Atom?

Building a kernel with the wrong instruction set for your CPU always ends badly.

Try a generic 

```
# CONFIG_GENERIC_CPU is not set
```

 build.

What does 

```
gcc -march=native -v -E - < /dev/null 2>&1 | grep cc1 | perl -pe 's/ -mno-\S+//g; s/^.* - //g;'
```

 say about your CPU?

That's a one-liner that makes gcc tell what -march=native means on the CPU it's run on.

----------

## krinn

 *Tony0945 wrote:*   

> ??? Please expand on this. Usually I run it after emerging a kernel update (and selecting it with eselect).

 

your script will run make install, which will copy the kernel... to /boot

if /boot is on its own partition and is not mounted before the script runs, it's the typical case where the user installs into the wrong directory and ends up booting the old kernel

it's not a problem for you if your /boot isn't on its own partition, but some users keep /boot on a partition

----------

## Tony0945

 *krinn wrote:*   

> it's not a problem for you if your /boot isn't on its own partition, but some users keep /boot on a partition

 

Ah! Yes, that is my setup on all my machines (maybe not the old K6). But for generic use you are absolutely correct. It should either warn at the beginning ("Warning! /boot is not mounted. Continue?"), or maybe fail, or automount /boot just before make. I prefer the former.  Really, a separate /boot is archaic on any BIOS newer than 10 or 12 years.

It was a very good comment. I just didn't understand it.

----------

## ct85711

The key part to remember, /boot is NOT a partition/drive, /boot is only a directory.  You can have /boot redirected to look somewhere else, but in the end /boot is still just a directory.  So you can't really see if /boot is mounted or not, as it is ALWAYS there, as it is still just a directory.

----------

## NeddySeagoon

ct85711,

A script  can check /etc/fstab then check to see if /boot is empty.

A few paranoid users may have a separate boot and not list it in fstab.

It need not even be called boot but that's what make install expects.

Every time a better mousetrap is invented, mice get cleverer. :)

----------

## Tony0945

 *NeddySeagoon wrote:*   

> A script  can check /etc/fstab then check to see if /boot is empty.

 

```
tony@MSI ~ $ mount|grep boot

/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)

tony@MSI ~ $ grep boot /etc/fstab

#/dev/sda1   /boot/efi    vfat      relatime   1 2

LABEL=CT500MX_EFI   /boot/efi    vfat   relatime   1 2

```

 Somebody could use that to determine if /boot is in fstab and if so, is it mounted. Maybe using awk?

Just a thought. Back when I did have a separate /boot (ext2), /etc/fstab mounted it "auto".  Someone who needs to check and is adept at shell will maybe write one and post it here.
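
As a starting point, here is a rough awk-based sketch of that check. The function names `boot_in_fstab` and `check_boot` are made up, and it only looks for a mount point of exactly /boot, so a setup like mine with only /boot/efi in fstab is treated as "no separate /boot". It relies on `mountpoint -q` from util-linux to test whether something is mounted there.

```bash
#!/bin/bash
# boot_in_fstab [FSTAB]
# Succeed if an uncommented fstab entry has /boot as its mount point.
# Hypothetical helper; FSTAB defaults to /etc/fstab.
boot_in_fstab() {
    awk '$1 !~ /^#/ && $2 == "/boot" { found = 1 }
         END { exit !found }' "${1:-/etc/fstab}"
}

# check_boot [FSTAB]
# Fail with a warning when fstab lists /boot but nothing is mounted there.
check_boot() {
    if boot_in_fstab "$@" && ! mountpoint -q /boot; then
        echo "Warning! /boot is in fstab but not mounted." >&2
        return 1
    fi
}
```

Calling `check_boot || exit 3` near the top of the build script (the exit code is arbitrary) would stop it before make install writes into an unmounted /boot. It does not help the paranoid users Neddy mentions who keep /boot out of fstab altogether.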

----------

## Timmer

Sorry for falling off the face of the planet. Life happened. But I've made some progress.  I think the problem is with loading the Intel framebuffer driver. I enabled the vesa driver, and it leaves grub, but then hangs when switching to the Intel driver. If I take the Intel driver out of the kernel completely, then it finishes booting, but only to a tty (unsurprisingly).

----------

## Timmer

Got it.  It was this problem, and toggling the kernel option in the thread has me running again: https://bbs.archlinux.org/viewtopic.php?pid=1913970#p1913970

----------

