# Inconsistency when booting new kernel

## NathanZachary

Hello all,

Tonight I updated from gentoo-sources-4.17.14 to 5.0.13.  I used 'make olddefconfig' and then went in to make some modifications (nothing related to device drivers though).  When I rebooted to use the new kernel, it panicked.  I rebooted again, but went into 5.0.13 (recovery mode).  I didn't change anything, but just looked at the output of 'blkid' and such to make sure that nothing had unexpectedly changed.  When I rebooted again, the kernel booted without problem.  Since I didn't change anything between the time that it failed to boot and the time that it booted properly, I'm a bit concerned regarding that inconsistent behaviour.  Any ideas?  I'm happy to post any troubleshooting information that may be helpful.

Thank you in advance.

Cheers,

Nathan Zachary

----------

## lfs0a

Does it panic around "PCI bridge to [bus 15-18]"?

My last non-panicking kernel is 4.14.83.

The later ones, like 4.19.27 and 4.19.37, will randomly panic at

[        0.453204] pci 0000:00:1e.0 PCI bridge to [bus 15-18]

and I used 'make olddefconfig' too.

4.14.83 works just fine.

----------

## Anon-E-moose

post your config file for 5.0.13

OR 

Check and see if you have 

SCHED_MUQSS, SCHED_MC and RQ_MC set

I had inconsistent booting with the shared runqueue enabled.

Boot one time, hang the next. I turned it off ("no sharing").

----------

## NathanZachary

Strange.  I see SCHED_MC, and it is set.  However, I don't see any of the options related to the runqueue (like MuQSS) in 5.0.13.  :Confused: 

----------

## Anon-E-moose

```
~ grep -E "CONFIG_SCHED_|CONFIG_RQ_" .config

CONFIG_SCHED_MUQSS=y

# CONFIG_SCHED_OMIT_FRAME_POINTER is not set

CONFIG_SCHED_SMT=y

# CONFIG_SCHED_MC is not set

CONFIG_RQ_NONE=y

# CONFIG_RQ_SMT is not set

# CONFIG_RQ_SMP is not set

# CONFIG_RQ_ALL is not set
```

Processor Type and Features -> CPU scheduler runqueue sharing (it's just under multicore support)

Edit to add: or as a kernel parm

```
    rqshare=    [X86] Select the MuQSS scheduler runqueue sharing type.

            Format: <string>

            smt -- Share SMT (hyperthread) sibling runqueues

            mc -- Share MC (multicore) sibling runqueues

            smp -- Share SMP runqueues

            none -- Do not share any runqueues

            Default value is mc
```

ETA2: I'm assuming you're running MUQSS

----------

## NathanZachary

Yeah, that's the part that is strange to me:

```

/usr/src/linux # ls -lh /usr/src/

total 12K

lrwxrwxrwx  1 root root   19 May  6 21:40 linux -> linux-5.0.13-gentoo

drwxr-xr-x 27 root root 4.0K May  6 21:38 linux-4.17.14-gentoo

drwxr-xr-x 27 root root 4.0K May  7 10:37 linux-5.0.13-gentoo

drwxr-xr-x 25 root root 4.0K May  6 21:36 linux-5.1.0-gentoo

/usr/src/linux # grep -E "CONFIG_SCHED_|CONFIG_RQ_" .config

# CONFIG_SCHED_AUTOGROUP is not set

CONFIG_SCHED_OMIT_FRAME_POINTER=y

CONFIG_SCHED_SMT=y

CONFIG_SCHED_MC=y

CONFIG_SCHED_MC_PRIO=y

CONFIG_SCHED_HRTICK=y

# CONFIG_SCHED_DEBUG is not set

CONFIG_SCHED_INFO=y

# CONFIG_SCHED_STACK_END_CHECK is not set

# CONFIG_SCHED_TRACER is not set

```

----------

## Anon-E-moose

 :Embarassed:   I forgot I run the zen kernel w/MuQSS; you probably don't have the MuQSS option.

In that case I have no idea what's causing your problem.

When it crashed, did it get far enough to leave a /var/log/dmesg file? (You would need a rescue CD to get it, because the file gets overwritten with each boot.)

The time it crashed, what did you do: a power-off reset, or ctrl-alt-delete/reboot? And was it the same the next time, when it booted successfully?

----------

## NathanZachary

The last thing that I saw was the VFS message about attempting to mount root, and I believe the exact message was:

```
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(8,20)
```

I will probably need to do some further tests when I have time to see if I can get it to fail again.  However, since it is up and running right now, I may just let it go until the next time I do a kernel upgrade.

----------

## NathanZachary

I'm still open to suggestions here.  Without changing anything at all, it will boot about once out of every 6-10 attempts.

That type of inconsistency leads me to believe that there is some type of regression with the 5.0 kernels.

Thoughts?

----------

## richard77

I'm not sure it's relevant, but since 4.20 the IOMMU is on by default. You could try adding intel_iommu=off to the kernel command line
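
A quick way to try a parameter like that (a sketch; the GRUB_CMDLINE_LINUX starting value below is just a sample):

```shell
# One-boot test: at the GRUB menu press 'e', append intel_iommu=off to the
# 'linux' line, then boot with Ctrl-x. To make it permanent instead, add it
# to GRUB_CMDLINE_LINUX in /etc/default/grub; the string edit below shows
# what the resulting line looks like for a sample starting value.
line='GRUB_CMDLINE_LINUX="quiet"'
echo "${line%\"} intel_iommu=off\""
# then regenerate the config:
#   grub-mkconfig -o /boot/grub/grub.cfg
```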

----------

## krinn

You could make a diff of a working dmesg and a non-booting one:

* for the working one, just save dmesg somewhere

* for the non-working one, you have to reboot into a USB/LiveCD/DVD in order to keep the "bad" dmesg alive, mount the root volume, and save it

However, while a kernel race is still possible, most of the time the assumption that "I didn't change anything" is wrong (plugging in a USB device might change the boot order; a wrong blkid on the "normal" boot command line fails while the recovery entry uses the good one...). But let's assume you're right there, so for me the only answer to this must be in dmesg.

The first thing I would look at: when you get the "unknown-block" message, the kernel lists the partitions and disks it could see. An empty list means no working driver, while a populated list probably just means the command line wasn't right.
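
A sketch of the comparison step: the timestamps differ on every boot, so strip them before diffing (the filter and file names below are just examples):

```shell
# Remove the "[    0.547125] " timestamp prefix so diff only shows real changes.
strip_ts() { sed 's/^\[ *[0-9.]*\] *//'; }

# What the filter does to one sample dmesg line:
echo '[    0.547125]  sdb: sdb1 sdb2 sdb3 sdb4' | strip_ts

# Real usage, assuming the two logs were saved as described above:
#   diff -u <(strip_ts < dmesg-good.txt) <(strip_ts < dmesg-bad.txt)
```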

----------

## Anon-E-moose

 *richard77 wrote:*   

> I'm not sure it's relevant, but since 4.20 the IOMMU is on by default. You could try adding intel_iommu=off to the kernel command line

 

Or just disable it completely with iommu=off; I'm not sure whether intel_iommu=off disables it completely or just the Intel-specific stuff.

I've long been using iommu=pt though as I run vm's and you need iommu for that.

----------

## NathanZachary

I experimented with various iommu kernel parameters, but they didn't help.

I would like to go to the 5.1 kernels as soon as the nvidia-drivers work with them.

My intention is to build a 5.1 kernel from scratch, but I have to wait for nvidia.

----------

## Anon-E-moose

Paste your 5.0 .config

----------

## NathanZachary

Thanks, Anon-E-moose, for offering to look over the .config:

https://pastebin.com/xk5NmgRY

Cheers,

Nathan Zachary

----------

## Anon-E-moose

A few differences (apart from drivers, intel vs amd, network, etc)

I run CONFIG_PREEMPT_RCU=y vs your CONFIG_TREE_RCU=y, but I'm not sure that makes any difference.

I run CONFIG_UNINLINE_SPIN_UNLOCK=y vs your:

CONFIG_INLINE_SPIN_UNLOCK_IRQ=y

CONFIG_INLINE_READ_UNLOCK=y

CONFIG_INLINE_READ_UNLOCK_IRQ=y

CONFIG_INLINE_WRITE_UNLOCK=y

CONFIG_INLINE_WRITE_UNLOCK_IRQ=y

and these might make a difference, simply because the compiler might do something funny with the inlining (just a guess).

And I run CONFIG_PREEMPT=y vs your CONFIG_PREEMPT_VOLUNTARY=y, but I'm not sure this makes a difference other than responsiveness for things like the desktop.

Other than the above: you're using the gcc 9.1 compiler, and I'm not sure that makes a difference either, but if you have 8.* installed you might build with that and see if there are still problems.

----------

## NathanZachary

 *krinn wrote:*   

> you could make a diff of working dmesg and non booting one
> 
> * for working one, just save dmesg somewhere
> 
> * for non working one, you have to reboot into an usb/livecd/dvd in order to keep the "bad" dmesg alive, mount it, and save it
> ...

 

Since it doesn't mount the root volume (on the times that it panics), there won't be a dmesg to look at (even from a live environment).

When I see the "unknown-block" message, I see partitions listed (it shows sdf* partitions, but I don't see one for any of the sdg* partitions [sdg4 is my root]).

----------

## Anon-E-moose

 *NathanZachary wrote:*   

> When I see the "unknown-block" message, I see partitions listed (it shows sdf* partitions, but I don't see one for any of the sdg* partitions [sdg4 is my root]).

 

What type of devices are sdf and sdg: usb, nfs, ...?

----------

## NeddySeagoon

NathanZachary,

Don't count on the kernel naming being reliable.

Use root=PARTUUID= on the kernel line and use PARTUUID or UUID in /etc/fstab.

It may not be /dev/sdg that's missing, it may be a normally lower one in the pecking order, so sdg becomes sdf.
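
A sketch of what that looks like (the device name and UUID below are examples; check yours with blkid):

```shell
# Find the PARTUUID of the root partition (example device):
#   blkid -s PARTUUID -o value /dev/sdb4
# Then build the kernel parameter from it:
partuuid="ebc14d26-4071-489e-b14d-9ef2caabe89f"
echo "root=PARTUUID=${partuuid}"
# Put that on the 'linux' line in grub.cfg, and use PARTUUID=... (or UUID=...)
# in /etc/fstab instead of /dev/sdXN names.
```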

----------

## NathanZachary

 *NeddySeagoon wrote:*   

> NathanZachary,
> 
> Don't count on the kernel naming being reliable.
> 
> Use root=PARTUUID= on the kernel line and use PARTUUID or UUID in /etc/fstab.
> ...

 

Thank you for the replies.  I actually already have the following in /etc/fstab:

```

PARTLABEL=boot      /boot         ext4      noauto,noatime                  0 2

PARTLABEL=swap      none         swap      sw                     0 0

PARTLABEL=rootfs   /         ext4      noatime,discard                  0 1

<snip>

/dev/cdrom      /mnt/cdrom      auto      noauto,ro                  0 0

tmpfs         /var/tmp/portage   tmpfs      size=12G,uid=portage,gid=portage,mode=775,noatime   0 0

```

Something I didn't think about, though, is how it is referenced in /boot/grub/grub.cfg:

```

menuentry 'Gentoo GNU/Linux' --class gentoo --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-2c2f74cd-8c48-46ea-8f04-f75631315a3c' {

        load_video

        if [ "x$grub_platform" = xefi ]; then

                set gfxpayload=keep

        fi

        insmod gzio

        insmod part_gpt

        insmod ext2

        set root='hd1,gpt2'

        if [ x$feature_platform_search_hint = xy ]; then

          search --no-floppy --fs-uuid --set=root --hint-bios=hd1,gpt2 --hint-efi=hd1,gpt2 --hint-baremetal=ahci1,gpt2  fddb287d-f875-4a28-aae1-faacd32a2093

        else

          search --no-floppy --fs-uuid --set=root fddb287d-f875-4a28-aae1-faacd32a2093

        fi

        echo    'Loading Linux 5.0.13-gentoo ...'

        linux   /vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro  intel_idle.max_cstate=0

}

```

However, that's the same as the entry for the 4.17.14 kernel, which is working as intended.  I could try putting in the PARTUUID for the root device in the kernel line of grub.cfg if that could potentially help.

----------

## Anon-E-moose

Is it always able to find GRUB?

If so, before booting, check which disks GRUB can see; do this several times until the problem shows up. If the disks are coming up out of order, it should be visible from GRUB.

https://www.linux.com/learn/how-rescue-non-booting-grub-2-Linux

Edit to add: what are the sd* devices, separate disks or RAID volumes? If separate disks, then the above might help find out what's going on; if they're RAID volumes, then whatever you're using (mdadm, etc.) might be changing the order. Hard to say without knowing what your disk subsystem is.

It would be helpful for you to paste a good boot log (/var/log/dmesg) from both 4.17 and 5.0, so we can see if the order of things has changed.

----------

## NathanZachary

Yes, it's always able to find GRUB and start to boot, but then will panic with the infamous "Unable to mount root fs on unknown block(8,20)".

Looking at dmesg from a successful boot, I don't see much of a difference:

```

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.17.14-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0

[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.17.14-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0

[    0.550760] sd 4:0:0:0: Attached scsi generic sg1 type 0

[    0.550778] sd 4:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)

[    0.551157] sd 4:0:0:0: [sda] Write Protect is off

[    0.551307] sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00

[    0.551324] sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.592501]  sda: sda1

[    0.593226] sd 4:0:0:0: [sda] Attached SCSI disk

[    0.634253] sd 6:0:0:0: Attached scsi generic sg2 type 0

[    0.634390] sd 6:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/233 GiB)

[    0.634720] sd 6:0:0:0: [sdb] Write Protect is off

[    0.634870] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00

[    0.634892] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.636500]  sdb: sdb1 sdb2 sdb3 sdb4

[    0.637000] sd 6:0:0:0: [sdb] Attached SCSI disk

[    0.670879] sd 7:0:0:0: Attached scsi generic sg3 type 0

[    0.670905] sd 7:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)

[    0.671273] sd 7:0:0:0: [sdc] 4096-byte physical blocks

[    0.671489] sd 7:0:0:0: [sdc] Write Protect is off

[    0.671834] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00

[    0.672013] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.703018] sdhci: Secure Digital Host Controller Interface driver

[    0.703173] sdhci: Copyright(c) Pierre Ossman

[    1.086802]  sdc: sdc1

[    1.087586] sd 7:0:0:0: [sdc] Attached SCSI disk

[    1.123789] EXT4-fs (sdb4): INFO: recovery required on readonly filesystem

[    1.123948] EXT4-fs (sdb4): write access will be enabled during recovery

[    1.424619] EXT4-fs (sdb4): orphan cleanup on readonly fs

[    1.425690] EXT4-fs (sdb4): 3 orphan inodes deleted

[    1.425851] EXT4-fs (sdb4): recovery complete

[    1.431119] EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)

[    2.161752] sd 17:0:0:0: Attached scsi generic sg5 type 0

[    2.180498] sd 17:0:0:1: Attached scsi generic sg6 type 0

[    2.201095] sd 17:0:0:2: Attached scsi generic sg7 type 0

[    2.224531] sd 17:0:0:3: Attached scsi generic sg8 type 0

[    2.300726] sd 17:0:0:0: [sdd] Attached SCSI removable disk

[    2.309823] sd 17:0:0:1: [sde] Attached SCSI removable disk

[    2.317124] sd 17:0:0:2: [sdf] Attached SCSI removable disk

[    2.321412] sd 17:0:0:3: [sdg] Attached SCSI removable disk

[    3.802984] EXT4-fs (sdb4): re-mounted. Opts: discard

[    3.867587] Adding 524284k swap on /dev/sdb3.  Priority:-2 extents:1 across:524284k SS

[    3.943773] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)

[    3.977674] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)

```

```

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0

[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-5.0.13-gentoo root=/dev/sdb4 ro intel_idle.max_cstate=0

[    0.544732] sd 4:0:0:0: Attached scsi generic sg1 type 0

[    0.544763] sd 4:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)

[    0.545143] sd 4:0:0:0: [sda] Write Protect is off

[    0.545531] sd 4:0:0:0: [sda] Mode Sense: 00 3a 00 00

[    0.545610] sd 4:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.545618] sd 6:0:0:0: Attached scsi generic sg2 type 0

[    0.545676] sd 6:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/233 GiB)

[    0.545688] sd 6:0:0:0: [sdb] Write Protect is off

[    0.545689] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00

[    0.545708] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.547116] sd 7:0:0:0: Attached scsi generic sg3 type 0

[    0.547125]  sdb: sdb1 sdb2 sdb3 sdb4

[    0.547133] sd 7:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)

[    0.547134] sd 7:0:0:0: [sdc] 4096-byte physical blocks

[    0.547146] sd 7:0:0:0: [sdc] Write Protect is off

[    0.547147] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00

[    0.547162] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

[    0.548029] sd 6:0:0:0: [sdb] Attached SCSI disk

[    0.548559]  sdc: sdc1

[    0.549144] sd 7:0:0:0: [sdc] Attached SCSI disk

[    0.585152]  sda: sda1

[    0.585723] sd 4:0:0:0: [sda] Attached SCSI disk

[    0.706034] sdhci: Secure Digital Host Controller Interface driver

[    0.706188] sdhci: Copyright(c) Pierre Ossman

[    0.867676] EXT4-fs (sdb4): mounted filesystem with ordered data mode. Opts: (null)

[    2.411896] sd 17:0:0:0: Attached scsi generic sg5 type 0

[    2.430637] sd 17:0:0:1: Attached scsi generic sg6 type 0

[    2.435230] sd 17:0:0:0: [sdd] Attached SCSI removable disk

[    2.453109] sd 17:0:0:2: Attached scsi generic sg7 type 0

[    2.471624] sd 17:0:0:3: Attached scsi generic sg8 type 0

[    2.481323] sd 17:0:0:1: [sde] Attached SCSI removable disk

[    2.493083] sd 17:0:0:2: [sdf] Attached SCSI removable disk

[    2.504565] sd 17:0:0:3: [sdg] Attached SCSI removable disk

[    2.652581] EXT4-fs (sdb4): re-mounted. Opts: discard

[    2.717874] Adding 524284k swap on /dev/sdb3.  Priority:-2 extents:1 across:524284k SS

[    2.825647] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)

[    2.878508] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)

```

Something that makes me think the order is getting mixed up is what I collected from a failed boot of the 5.0.13 kernel. During the failed boot, one of the last messages I saw before the panic listed the following partitions and corresponding PARTUUIDs:

```

sdf2 --> 7878d6c0-c5d7-4508-81c7-cbc7b7f971a8

sdf3 --> 5e95ba2c-c107-4c58-a419-ef021f3a4d80

sdf4 --> ebc14d26-4071-489e-b14d-9ef2caabe89f

```

However, after a successful boot of the 5.0.13 kernel, I see that those are actually referred to as sdb{2,3,4}, respectively:

```

# blkid

/dev/sdb1: PARTLABEL="grub" PARTUUID="0fc5f2ae-4ad9-479c-af6a-056a0c389d0f"

/dev/sdb2: UUID="fddb287d-f875-4a28-aae1-faacd32a2093" TYPE="ext2" PARTLABEL="boot" PARTUUID="7878d6c0-c5d7-4508-81c7-cbc7b7f971a8"

/dev/sdb3: UUID="5e2ed6f0-8d35-4dda-afb4-5e5f8d641ade" TYPE="swap" PARTLABEL="swap" PARTUUID="5e95ba2c-c107-4c58-a419-ef021f3a4d80"

/dev/sdb4: UUID="2c2f74cd-8c48-46ea-8f04-f75631315a3c" TYPE="ext4" PARTLABEL="rootfs" PARTUUID="ebc14d26-4071-489e-b14d-9ef2caabe89f"

/dev/sda1: UUID="3ba15db9-7c87-45d5-a3a6-4609d47aab3f" TYPE="ext4" PARTLABEL="vmdrive" PARTUUID="c5088cb5-3cf0-48d8-b944-6833e28a0abc"

/dev/sdc1: UUID="69ffede7-aa6b-424c-8dc7-bbbdb309180d" TYPE="ext4" PARTLABEL="data" PARTUUID="4ca590d7-0a52-4b80-aaa4-969c6dfa2d72"

```

The other two disks listed there are the one that houses my VMs and the one for all my data (separate from the drive containing /boot, swap, and the root filesystem).

All of the sd* devices are separate disks.

Thanks again for your continued help in troubleshooting this strange problem.

Cheers,

Nathan Zachary

----------

## Anon-E-moose

I see sd[defg] are all labelled removable SCSI; what are they: usb, esata, scsi, ...?

What it sounds like is the ordering of disks is getting messed up, but only occasionally.

4.17 seems to be grabbing things a little quicker, and in order. 5.0 seems a little choppy.

----------

## NathanZachary

 *Anon-E-moose wrote:*   

> I see sd[defg] are all labelled removable SCSI; what are they: usb, esata, scsi, ...?
> 
> What it sounds like is the ordering of disks is getting messed up, but only occasionally.
> 
> 4.17 seems to be grabbing things a little quicker, and in order. 5.0 seems a little choppy.
> ...

 

They are all SATA disks (OS drive is an SSD, and the other two are spinning disks).

I am using OpenRC and the default is set for rc_parallel:

```

$ grep 'rc_parallel=' /etc/rc.conf 

#rc_parallel="NO"

```

----------

## Anon-E-moose

What I meant was: what type of subsystem are sdd-sdg on? I have SATA drives on the motherboard's SATA controller and as USB drives; the onboard SATA controller does its thing first as a group, then the USB SATAs fire up, and they do indeed change order (from time to time).

If these are all sata drives on a sata controller, then are there 2 separate controllers?

Again my mb has a main controller sata 1-6 and a second controller for 2 more sata drives.

ETA: I assume this is a desktop, what is the mb make/model?

ETA2: If you haven't tried it, the root=partuuid= option might get rid of some of the problem, at least it should boot and find root.

https://wiki.gentoo.org/wiki/GRUB -- down toward the bottom

```
If the root= parameter doesn't match the actual configuration, all is not lost. It is possible to edit the lines before booting. How this can be done, is explained in Knowledge Base:Adjusting GRUB settings for a single boot session

To get the USB disk boot without initramfs regardless of the number of installed disks, use a GPT partition table and the root=PARTUUID= kernel parameter as explained in this external link: Mounting root partition by UUID (no initrd needed)

Since kernel 3.8 and newer it is possible to use MBR 32-bit UUID, so it's possible to use a MBR partition table as well.

In this case PARTUUID refer to an MBR partition using the format SSSSSSSS-PP, where SSSSSSSS is a zero-filled hex representation of the 32-bit "NT disk signature", and PP is a zero-filled hex representation of the 1-based partition number.

To get "NT disk signature" one possibility is using fdisk:

root #fdisk -l /dev/sdd

The output will be something like Disk identifier: 0x2d6b036c, so assuming root partition is /dev/sdd2, the resulting line will be root=PARTUUID=2d6b036c-02

More info is available here: Description of PARTUUID feature

Using LABEL or UUID

Kernel boot parameters are real_root=LABEL= or real_root=UUID=.
```

----------

## NathanZachary

The motherboard is an Intel DX58SO2 (no making fun of how old it is  :Razz: ), and it has a "main" SATA controller as well as a Marvell eSATA controller:

```

# lspci | grep -i sata

00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller

02:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 10)

08:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6111/6121 SATA II / PATA Controller (rev b2)

```

I'm going to give the 'root=partuuid=' option a try and see how it goes.  If that fixes the problem, then great.  I'll continue to delve into possible causes for the change, but it is more important to get it fixed.  I don't see any reason to not go the partuuid route.

----------

## NathanZachary

Well using the 'root=PARTUUID=' kernel parameter fixed the problem.  I've consistently booted several times now, and am confident that that was the issue as seen by the current mount output:

```

# mount -l | grep '/dev/sd'

/dev/sdf4 on / type ext4 (rw,noatime,discard)

/dev/sdg1 on /home/zach/stuff type ext4 (rw,noatime)

/dev/sda1 on /vmdrive type ext4 (rw,noatime)

```

In previous successful boots, /dev/sdb4 was the root volume.  In this case, since it is /dev/sdf4, it would NOT have booted without the 'root=PARTUUID=' kernel parameter.  I made that the standard going forward by finding the PARTUUID with gdisk:

```

# gdisk /dev/sdf

GPT fdisk (gdisk) version 1.0.4

Partition table scan:

  MBR: protective

  BSD: not present

  APM: not present

  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): i

Partition number (1-4): 4

Partition GUID code: 0FC63DAF-8483-4772-8E79-3D69D8477DE4 (Linux filesystem)

Partition unique GUID: EBC14D26-4071-489E-B14D-9EF2CAABE89F

First sector: 1579008 (at 771.0 MiB)

Last sector: 488395119 (at 232.9 GiB)

Partition size: 486816112 sectors (232.1 GiB)

Attribute flags: 0000000000000000

Partition name: 'rootfs'

Command (? for help): q

```

The PARTUUID is the "Partition unique GUID" code above, or in my specific case, 'EBC14D26-4071-489E-B14D-9EF2CAABE89F'.  I then added that as an option within GRUB2:

```

# grep 'GRUB_CMDLINE_LINUX=' /etc/default/grub | grep -v '^#'

GRUB_CMDLINE_LINUX="root=PARTUUID=EBC14D26-4071-489E-B14D-9EF2CAABE89F"

```

and regenerated my grub.cfg with `grub-mkconfig -o /boot/grub/grub.cfg`. Now it shows on all kernel lines, overriding the first 'root=' line for all kernels:

```

# grep 'vmlinuz' /boot/grub/grub.cfg

   linux   /vmlinuz-5.0.13-gentoo root=/dev/sdf4 ro root=PARTUUID=EBC14D26-4071-489E-B14D-9EF2CAABE89F

      linux   /vmlinuz-5.0.13-gentoo root=/dev/sdf4 ro root=PARTUUID=EBC14D26-4071-489E-B14D-9EF2CAABE89F

      linux   /vmlinuz-5.0.13-gentoo root=/dev/sdf4 ro single root=PARTUUID=EBC14D26-4071-489E-B14D-9EF2CAABE89F

      linux   /vmlinuz-4.17.14-gentoo root=/dev/sdf4 ro root=PARTUUID=EBC14D26-4071-489E-B14D-9EF2CAABE89F

      linux   /vmlinuz-4.17.14-gentoo root=/dev/sdf4 ro single root=PARTUUID=EBC14D26-4071-489E-B14D-9EF2CAABE89F

```

So, the problem is resolved now, but I am going to continue to investigate to see if I can find what caused the change.  I also noticed that the boot time on 5.0.13 is consistently ~6-8 seconds slower than with the previous 4.17.14 kernel that I was using.  Anyway, thanks for all the help in tracking down the problem, and I am anxious to see whether or not it persists in the 5.1 kernels (once I am able to build the nvidia-drivers with one).

Cheers,

Nathan Zachary

----------

## krinn

 *NathanZachary wrote:*   

> They are all SATA disks (OS drive is an SSD, and the other two are spinning disks).

 

What is important there is: are they all using the same controller?

The kernel just uses a first-seen, first-served rule (it really is as simple as that).

Say you have controller1 loaded as a module, with pciid1,

and controller2 built into the kernel, with pciid3 (the point isn't pciid3 itself, just that it's a higher id than controller1's).

The kernel sees the built-in one first (because it's built in),

so sda, sdb... are given to the disks attached to controller2.

If the controller1 driver is also built in (so both controller drivers are built in), it's still first seen, first served, but "first seen" is now the lowest pciid in the list:

the kernel sees controller1 first (because of the lowest pciid) and sda, sdb... attach to it.

I wrote a doc some time ago that you might like to read, trying to explain that: https://forums.gentoo.org/viewtopic-t-1007788.html

You must be in the case where your BIOS plays with the pciids (some BIOSes alter pciids depending on options set, sometimes even for something totally unrelated, like enabling a network card or sound card...) and changes the pciids of the controllers.
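
A toy illustration of the lowest-pciid case, using the two SATA controllers from the lspci output earlier in the thread (sorting the addresses mimics the probe order when both drivers are built in; the .config path is an example):

```shell
# With both drivers built in, the lower PCI address is seen first, so the
# Intel controller at 00:1f.2 gets its disks named before the Marvell at
# 02:00.0 does.
printf '%s\n' \
  '02:00.0 SATA controller: Marvell 88SE9123' \
  '00:1f.2 SATA controller: Intel ICH10' | sort

# To check whether a given driver is built in (=y) or a module (=m), e.g. ahci:
#   grep 'CONFIG_SATA_AHCI=' /usr/src/linux/.config
```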

----------

## Anon-E-moose

Glad it's fixed, at least for booting.

With disks that may get reordered from boot to boot, PARTUUID is the way to go, at least for GRUB. I like using labels in /etc/fstab (just because they're shorter).

Just curious: which disks are attached to which SATA controllers?

And no, I don't make fun of running old stuff.

I usually keep my mb's/systems around for years. 

 :Laughing: 

----------

