# [SOLVED] fails to reboot, but will startup

## jyoung

Hi,

I have a strange issue where my machine won't reboot with the 'reboot' command, but if I shutdown (either with 'shutdown -h now' or a hard stop with the power button), and then start it up, it starts fine. The bootloader is lilo, and the failure occurs between the lilo boot screen while it says it's loading and the first steps of the actual bootup process. This issue came up after a recent kernel update, which is now 5.12.7 from gentoo-sources.

I'd be grateful for any thoughts.

Last edited by jyoung on Thu Jun 17, 2021 4:19 pm; edited 1 time in total

----------

## alamahant

I think it's simple:

1. Good kernel

2. Good initrd

3. Grub

Why use lilo? Who uses lilo nowadays?

Only Slackware, no?

Also, please don't do hard reboots.

 :Smile: 

A lot more info is needed to troubleshoot this.

It could be a combination of many things.

dmesg

is your friend.

Also a pic of the failing boot maybe?

A pastebin of your kernel .config also?

Do you use an initrd?

How is your partition layout?

etc

----------

## jyoung

Okay, here is my kernel config

http://www.pastebin.com/1G5d0NEi

And dmesg

http://www.pastebin.com/m40DqKAK

My partition layout is pretty simple, here's part of /etc/fstab:

```
/dev/sda1          /boot   ext4   noauto,noatime      1 2

/dev/sda2           none           swap   sw             0 0

/dev/sda3           /           ext4   noatime         0 1

/dev/sda5      /home   ext4   noatime         0 0
```

Agreed, hard stops are to be avoided, but in this case, when the machine hangs before bootup, there's no other option. I'm not using an initrd. I'm using lilo because when I set up this machine five years ago I knew that I wanted something simple and I hadn't tried lilo yet, and while it was old even at that point, the Gentoo handbook still described it as reliable. Honestly, I haven't had problems with it since, although it's always possible that this could be the first instance. Here's my lilo config:

```
boot=/dev/sda

prompt

timeout=60

default=gentoo-linux

image=/boot/vmlinuz-5.12.7-gentoo

  label=gentoo-linux

  read-only

  root=/dev/sda3

  append="acpi_osi=Linux"

image=/boot/vmlinuz-5.6.11-gentoo

  label=gentoo-backup

  read-only

  root=/dev/sda3

  append="acpi_osi=Linux"

```

----------

## Jaglover

```
acpi_osi=Linux
```

This is certainly affecting kernel interaction with the BIOS; have you tried without it? And don't get carried away, this has nothing to do with your bootloader choice. The bootloader works only for a fraction of a second during boot; it has nothing to do with reboot or shutdown.

----------

## jyoung

Hmm, I can't recall why I added acpi_osi=Linux to lilo.conf. It was five years ago that I set it up... However, this post suggests that it's a reasonable thing to do:

https://askubuntu.com/questions/28848/what-does-the-kernel-boot-parameter-set-acpi-osi-linux-do

Still, no harm in trying without. I set up this new lilo.conf file:

```
boot=/dev/sda

prompt

timeout=60

default=gentoo-linux

image=/boot/vmlinuz-5.12.7-gentoo

  label=gentoo-linux

  read-only

  root=/dev/sda3

  append="acpi_osi=Linux"

image=/boot/vmlinuz-5.12.7-gentoo

  label=gentoo-test

  read-only

  root=/dev/sda3

image=/boot/vmlinuz-5.6.11-gentoo

  label=gentoo-backup

  read-only

  root=/dev/sda3

  append="acpi_osi=Linux"
```

Both 'gentoo-linux' and 'gentoo-test' seem to behave the same, that is, they start up normally except after a 'reboot' command.

Some more observations: When the failure occurs, the last thing I see is a message from the lilo screen saying "BIOS data check successful". This message is printed regardless of whether or not the startup is about to fail. In the failure case, it then goes to a blank screen. I also know that this is not just a display issue, and the machine is actually not starting up, since I can't login remotely.

----------

## Logicien

On a reboot, the system software is reinitialised, but the microcode of the devices is not, as it would be after a poweroff. I had a problem like this with an Intel APU, where the reboot never finished while using the i915 kernel module. After I blacklisted i915, Linux started to use the efifb framebuffer and the reboot then completed successfully.

So this may have to do with the graphics card, if you use Intel integrated video. With an ATI/AMD PCIe video card and the Linux radeon module, reboot is fine. A device that is not reinitialised correctly on reboot may be what leaves the boot process stuck.

----------

## jyoung

From this article it looks like one option might be to pull the microcode into the kernel

http://www.kernel.org/doc/html/latest/x86/microcode.html

If I were to go the route of switching to an efifb framebuffer, this article 

http://www.kernel.org/doc/html/latest/fb/efifb.html

says "The system must be booted via the EFI stub for this to be usable." That seems drastic. What are the advantages of efifb?

----------

## jyoung

Okay, I tried compiling the appropriate microcode into the kernel as per this tutorial

https://wiki.gentoo.org/wiki/Intel_microcode

But the results are the same.

----------

## Jaglover

jyoung,

the askubuntu article you linked to also says "Yes, BIOS's usually disable functionality if Windows is not detected", which means that if you tell such a braindead BIOS you are running Linux, it may misbehave. Maybe lying to it that you are running Windows makes it listen to the OS? My 2¢.

----------

## jyoung

Alas, no luck. Here's my lilo.conf file:

```
boot=/dev/sda

prompt

timeout=60

default=gentoo-linux

image=/boot/vmlinuz-5.12.7-gentoo

  label=gentoo-linux

  read-only

  root=/dev/sda3

  append="acpi_osi=Linux"

image=/boot/vmlinuz-5.12.7-gentoo

  label=gentoo-test

  read-only

  root=/dev/sda3

  append="acpi_osi=Windows"

image=/boot/vmlinuz-5.6.11-gentoo

  label=gentoo-backup

  read-only

  root=/dev/sda3

  append="acpi_osi=Linux"
```

Even with append="acpi_osi=Windows" under gentoo-test, the problem persists.

----------

## jyoung

The intel microcode gentoo wiki also states

 *Quote:*   

> If the initramfs USE flag is active the intel-microcode ebuild will automatically install a cpio archive of all microcode into /boot/intel-uc.img.

 

With equery uses intel-microcode I get

```
- - hostonly    : only install ucode(s) supported by currently available (=online) processor(s) 

 - - initramfs   : install a small initramfs for use with CONFIG_MICROCODE_EARLY 

 + + split-ucode : install the split binary ucode files (used by the kernel directly) 

 - - vanilla     : install only microcode updates from Intel's official microcode tarball 
```

So, maybe I should focus on early microcode loading? But there's no CONFIG_MICROCODE_EARLY in the .config file, and in menuconfig I can't find any reference to MICROCODE_EARLY with the '/' search.

----------

## Jaglover

If you build it into kernel it will load early, there is no extra option for early loading, see the timestamp in my dmesg.

```
[    0.000000] microcode: microcode updated early to revision 0xea, date = 2021-01-05

```

----------

## Hu

It appears that MICROCODE_EARLY was removed in fe055896c040df571e4ff56fb196d6845130057b in 2015.  However, as Jaglover says, the functionality still exists.  There is just no symbol for excluding it, because early microcode was deemed to be the better approach than supporting late microcode.
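To check this on a given machine, one option is to search the kernel log for a line like the one Jaglover quoted. This is only a minimal sketch: `microcode_loaded_early` is a hypothetical helper name, and the pattern assumes the message wording from that example, which can differ between kernel versions and may not appear at all if no update was applied.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds only if
# it contains an "updated early" microcode line like the one quoted above.
# Assumption: the wording matches the 5.x-era dmesg example in this thread.
microcode_loaded_early() {
    grep -q 'microcode: .*updated early'
}

# Usage (reading the kernel log may require root):
#   dmesg | microcode_loaded_early && echo "early microcode update applied"
```

Running the same check after a cold start and again after a warm reboot would show whether the update is applied in both cases.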

----------

## jyoung

Okay, so the microcode is in the kernel and loaded early... but even on a reboot? Logicien, you suggested that the microcode may not be reinitialized on a reboot, and that seems to fit the symptoms here.

----------

## Logicien

With i915 the backlight is always on, and I have not found a way to disable it after trying everything I could. On top of that, the reboot is slow with it. On a Dell Optiplex 7100 the Dell EFI/BIOS logo does not reappear, the computer stays in an idle state, and the screen goes into power-saving mode. Replacing i915 with efifb resolves all of these problems: the backlight is off and reboot works. But efifb does not perform as well as i915 in terms of FPS.

Now I use an AMD/ATI PCIe expansion card and the radeon module works well. But the integrated Intel APU performs best in terms of frames per second (FPS). Anyway, I think that a cold poweroff is better than a reboot for testing an upgrade, for example.

If you use Grub2 and it displays properly, you can try setting the parameter GRUB_GFXPAYLOAD_LINUX=keep in the /etc/default/grub file and see if Linux uses it and boots properly too. Or use the Linux kernel parameter video= to tell Linux which resolution to use.
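As a rough illustration of those two settings, a fragment of /etc/default/grub might look like the sketch below. The file uses shell variable syntax. The 1920x1080 value is only an assumed example resolution, not something taken from this thread, and the two settings are alternatives to try rather than a known fix.

```shell
# /etc/default/grub (fragment)

# Ask GRUB to keep its graphics mode when handing over to the kernel.
GRUB_GFXPAYLOAD_LINUX=keep

# Alternative: tell the kernel which mode to set via video=.
# The resolution here is an assumed example; use one the monitor supports.
GRUB_CMDLINE_LINUX="video=1920x1080"
```

After editing the file, the GRUB configuration has to be regenerated, for example with `grub-mkconfig -o /boot/grub/grub.cfg`, before the change takes effect.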

----------

## jyoung

Okay, I can setup grub and try out the GRUB_GFXPAYLOAD_LINUX=keep option. lilo is nice and simple, but perhaps we've hit its limits. I should be able to report back on that sometime tomorrow.

With the video= kernel option, would that be the resolution of the monitor? I have to admit that it seems kind of weird to need to put the monitor resolution into the bootloader, but it's easy enough to try.

Agreed, it does seem that a cold restart is preferable! But it would be great to get the reboot ability working. It's sometimes necessary for me to reboot this machine remotely.

----------

## jyoung

This afternoon I switched to grub2 and added GRUB_GFXPAYLOAD_LINUX=keep to /etc/default/grub. Booting through grub works as normal, but rebooting through grub hangs right after the grub menu, when it prints 'Loading Linux 5.12.7-gentoo ...'. It seems like the issue is the same as with lilo.

----------

## Jaglover

I think I have a minor version of this bug. When I reboot then my 2560x1440 display is not detected and comes up 1920x1080, furthermore, 2560x1440 resolution is not available in X, either. I haven't worked on this as I reboot very seldom. 

Have you played with EDID loading option in kernel?

Edit. Have to retract: having a closer look at the kernel options, I do not see anything that would affect my Intel HD 630 reboot.

----------

## jyoung

When I reboot off the old kernel (5.6.11), the bug does not occur. So either something was messed up in the migration, or there's a bug in the new (5.12.7) source. Or, there was something messed up in the 5.6.11 .config file that, by pure luck, remained asymptomatic until the migration.

When I migrated from 5.6.11 to 5.12.7, I used make olddefconfig. I'm going to try rebuilding 5.12.7 from scratch, and see if that works any better.

----------

## Jaglover

I can't imagine a case where I would use olddefconfig, it overwrites (modifies) kernel configuration without even notifying what was done. I certainly do not want such disaster to my kernels, considering how many default options I have to change every time I run oldconfig.

----------

## Tony0945

I agree with Jaglover.  I recently posted a buildscript in "Tips and Tricks". It uses "make oldconfig" not make olddefconfig.

You should have a /boot/config-<something or other> from your working kernel. Just eselect the new kernel and then pass the location of that config as a parameter to that script.

Better yet, boot the working kernel, then eselect the new kernel and just run it. This assumes that you have the config built into the kernel.

See https://www.xaprb.com/blog/2006/05/23/how-to-use-linuxs-proc-config-feature/

If not, just run it passing the config I referenced above.

Last edited by Tony0945 on Sun Jun 13, 2021 5:15 pm; edited 1 time in total
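For readers without that script, the idea of seeding the new tree from the running kernel's built-in config can be sketched roughly as follows. This is not Tony0945's script. `seed_config` is a hypothetical helper, /usr/src/linux is assumed to be the symlink set by eselect kernel, and /proc/config.gz only exists when the running kernel was built with CONFIG_IKCONFIG and CONFIG_IKCONFIG_PROC.

```shell
# Hypothetical helper: copy a gzip-compressed kernel config into a source tree.
#   $1 = kernel source directory (assumed default: /usr/src/linux)
#   $2 = compressed config file  (assumed default: /proc/config.gz)
seed_config() {
    src=${1:-/usr/src/linux}
    cfg=${2:-/proc/config.gz}
    # Fail early if the running kernel does not expose its config.
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
    # Decompress into the tree as .config, overwriting any existing file.
    gzip -dc "$cfg" > "$src/.config"
}

# Usage, run as root while booted into the known-good kernel:
#   seed_config && cd /usr/src/linux && make oldconfig
```

`make oldconfig` then asks about each new option instead of silently accepting defaults, and `make listnewconfig` prints the new symbols without changing anything.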

----------

## jyoung

Indeed, it appears that this thread will read as a cautionary tale for those who might opt for make olddefconfig. I just set up a new .config file from scratch, compiled and installed the kernel, and I was able to reboot without issue.

Tony0945, today or tomorrow I'm going to try some of the tips you suggested to make a clean migration from the old kernel to the new one. I'll report back, but it certainly looks like we're close to solving this issue.

----------

## jyoung

Okay, I'm marking this thread as 'solved'. The root of the problem was with one of the default options pulled in by 'make olddefconfig'. Thanks a lot to everyone for troubleshooting this with me!

----------

## GDH-gentoo

 *jyoung wrote:*   

> The root of the problem was with one of the default options pulled in by 'make olddefconfig'.

 

For the benefit of future readers, why don't you tell us which option that was and what setting fixed the problem?

----------

## jyoung

That's a good point GDH-gentoo. I just ran diff on the two config files, and the differences are quite numerous. I'd be happy to post the entire list, but I wonder if there's a good way to determine the key differences.
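One possible way to cut the noise is to compare only the option values, ignoring comments and ordering. The sketch below is an assumption about how that could be done. `normalize_config` and `config_changes` are hypothetical helper names, and the kernel source tree also ships scripts/diffconfig for a similar purpose.

```shell
# Hypothetical helper: reduce a .config to sorted "CONFIG_X=value" lines,
# rewriting "# CONFIG_X is not set" as "CONFIG_X=n" so both forms compare.
normalize_config() {
    sed -n \
        -e '/^CONFIG_[A-Za-z0-9_]*=/p' \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/p' "$1" | sort
}

# Hypothetical helper: print options that differ between two configs.
# Lines from the first file are prefixed "-", from the second file "+".
config_changes() {
    old=$(mktemp) && new=$(mktemp) || return 1
    normalize_config "$1" > "$old"
    normalize_config "$2" > "$new"
    diff "$old" "$new" | sed -n -e 's/^< /-/p' -e 's/^> /+/p'
    rm -f "$old" "$new"
}

# Usage (file names are placeholders):
#   config_changes config-broken config-working | grep -i -e acpi -e reboot
```

With the list reduced to real value changes, candidates can be narrowed further by flipping groups of options back in the failing config and retesting the reboot.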

----------

