# Kernel 4.9 works, 4.12 panics b/c of disk [SOLVED]

## jesnow

I can't get my kernel upgrade to work. Currently I'm running 4.9.16 just fine, but getting the new kernel to run is proving frustrating.

It panics with "unable to mount root on unknown-block(x,y)" where x and y are not zero. 

https://www.facebook.com/photo.php?fbid=10213160569503112&set=a.1085637895441.15367.1061190698&type=3

It's not my first kernel panic. Normally I would conclude that it's not recognizing the fs and look to be sure the driver is compiled in. I'm using the exact same everything for the old kernel as for the new, same entries in lilo.conf (sorry not changing it), same .config even (using make oldconfig), but 4.9.16 boots and 4.12.12 does not. 

Here are the relevant .config entries:

```
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
# CONFIG_EXT3_FS_POSIX_ACL is not set
# CONFIG_EXT3_FS_SECURITY is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_POSIX_ACL is not set
# CONFIG_EXT4_FS_SECURITY is not set
# CONFIG_EXT4_DEBUG is not set
```

and I'm pretty sure it's actually an ext4:

```
Merckx src # uname -a
Linux Merckx 4.9.16-gentoo #3 SMP PREEMPT Thu Jun 1 17:53:56 CDT 2017 i686 Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz GenuineIntel GNU/Linux
Merckx src # file -sL /dev/sdb2
/dev/sdb2: Linux rev 1.0 ext4 filesystem data, UUID=419902b4-6b85-4a2c-8e81-82a45d498c9c (needs journal recovery) (extents) (large files) (huge files)
```
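The usual first check described above ("is the filesystem driver compiled in?") can be scripted against a `.config`. This is a minimal sketch, not a kernel tool: it only understands the two line shapes shown in the excerpt (`CONFIG_X=value` and `# CONFIG_X is not set`), and it assumes no initrd is used (the lilo.conf stanzas in this thread have no initrd line), in which case the root filesystem driver must be `=y` rather than `=m`. The names `parse_kconfig` and `not_built_in` are made up for illustration.

```python
import re

# A set option looks like "CONFIG_EXT4_FS=y"; an unset one is the comment
# "# CONFIG_EXT4_DEBUG is not set", as in the .config excerpt above.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Return {symbol: value}, with value None for options marked 'is not set'."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            values[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            values[match.group(1)] = None
    return values


def not_built_in(values, required):
    """Return the symbols from `required` that are not =y.

    Assumption: no initrd, so nothing can load a module before / is
    mounted and the root filesystem driver must be built in (=y).
    A symbol missing from the file is reported as well.
    """
    return [sym for sym in required if values.get(sym) != "y"]


EXCERPT = """\
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_DEBUG is not set
"""

if __name__ == "__main__":
    cfg = parse_kconfig(EXCERPT)
    print(not_built_in(cfg, ["CONFIG_EXT4_FS"]))  # prints []
```

As the thread goes on to show, a clean result here does not rule out a root-device problem: the driver can be present and the kernel can still be pointed at the wrong partition.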

Any help would be greatly appreciated. 

Jon

*Last edited by jesnow on Mon Dec 18, 2017 9:32 pm; edited 1 time in total*

----------

## Jaglover

What about partition table support?

----------

## jesnow

From what you said on the other thread it looks like it's time to switch to amd64, but I don't want to start that process until I understand what's wrong here. 

```
Merckx linux # grep PARTITION .config
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_AIX_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
# CONFIG_SUN_PARTITION is not set
# CONFIG_KARMA_PARTITION is not set
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_CMDLINE_PARTITION is not set
```

----------

## NeddySeagoon

jesnow,

Your unknown-block(8,18) is sdb2.  Is that where you expect root to be?
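Reading (8,18) as sdb2 follows from the traditional static numbering of SCSI/SATA disks: major 8 is the sd driver, each disk gets a block of 16 minors, minor 0 of the block is the whole disk and minors 1 to 15 are its partitions. A minimal sketch of that arithmetic; `sd_name` is a hypothetical helper, and it deliberately ignores the extra sd majors used for disks beyond sdp and extended dynamic minors.

```python
def sd_name(major, minor):
    """Map a major-8 device number to its sd name, e.g. (8, 18) -> 'sdb2'.

    Assumes the static layout: 16 minors per disk, minor 0 of each group
    is the whole disk and 1-15 are partitions. Other sd majors (disks past
    sdp) and extended dynamic minors are not handled.
    """
    if major != 8:
        raise ValueError(f"major {major} is not sd major 8")
    if not 0 <= minor <= 255:
        raise ValueError(f"minor {minor} out of range for major 8")
    disk_index, partition = divmod(minor, 16)  # 16 minors per disk
    disk = "sd" + chr(ord("a") + disk_index)   # 0 -> sda, 1 -> sdb, ...
    return disk if partition == 0 else f"{disk}{partition}"


if __name__ == "__main__":
    for major, minor in [(8, 18), (8, 3), (8, 48)]:
        print(f"unknown-block({major},{minor}) -> /dev/{sd_name(major, minor)}")
```

Note that this only tells you which *name* the number belongs to on the current boot; which physical disk carries that name is exactly what can change between boots.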

The kernel block device names are allocated on a first come, first served basis, so they are not deterministic.

It all depends on the way the devices are arranged on the various buses.

e.g. leaving a USB block device connected can change the drive ordering.

Post your lilo.conf and post the output of blkid

This is going towards changing lilo.conf to use root=PARTUUID= to define the root device.

I think that's transparent to lilo.

If your drives are swapped by your new kernel, fstab won't work any more either. You should use LABELs or UUIDs there.

The kernel understands PARTUUID without any help, but needs an initrd to find root by LABEL or UUID.
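The rule above (PARTUUID works on a bare kernel, UUID or LABEL needs an initrd) can be applied mechanically to `blkid` output. A minimal sketch, assuming blkid's usual one-line-per-device format of `KEY="value"` pairs; `parse_blkid_line` and `root_param` are hypothetical names. The UUID in the sample is the one `file -sL` reported earlier in this thread, while the PARTUUID value is made up for illustration.

```python
import re

# blkid prints one line per device: /dev/sdb2: UUID="..." TYPE="ext4" PARTUUID="..."
_TAG = re.compile(r'([A-Z_]+)="([^"]*)"')


def parse_blkid_line(line):
    """Split one line of blkid output into (device, {TAG: value})."""
    device, _, rest = line.partition(":")
    return device.strip(), dict(_TAG.findall(rest))


def root_param(tags, have_initrd):
    """Choose a root= kernel parameter that does not depend on sdX order.

    The kernel resolves PARTUUID= by itself; UUID= (a filesystem UUID)
    needs an initrd with userspace tools to locate the filesystem.
    """
    if "PARTUUID" in tags:
        return f"root=PARTUUID={tags['PARTUUID']}"
    if have_initrd and "UUID" in tags:
        return f"root=UUID={tags['UUID']}"
    raise ValueError("no enumeration-independent root= available")


# UUID taken from the thread; PARTUUID is hypothetical.
SAMPLE = ('/dev/sdb2: UUID="419902b4-6b85-4a2c-8e81-82a45d498c9c" '
          'TYPE="ext4" PARTUUID="0006c3f5-02"')

if __name__ == "__main__":
    device, tags = parse_blkid_line(SAMPLE)
    print(device, root_param(tags, have_initrd=False))
```

PARTUUID is preferred even when an initrd exists simply because it removes one dependency from the boot path.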

----------

## Jaglover

If I recall correctly Lilo won't accept PARTUUID.  :Sad:  Maybe there is a way to override Lilo config check.

----------

## jesnow

I'm familiar with both of those issues. /dev/sdb2 is indeed where / is and is indeed where 4.9.16 finds it just fine. 

My lilo.conf has a lot of cruft (commented out old kernels going back a decade, starting with 2.6.16-r11!) but here is the relevant section:

```
#
# Start LILO global section
#
# Faster, but won't work on all systems:
compact
# Should work for most systems, and do not have the sector limit:
lba32
# If lba32 do not work, use linear:
#linear
# MBR to install LILO to:
boot = /dev/sda
map = /boot/.map
install = /boot/boot-menu.b   # Note that for lilo-22.5.5 or later you
                              # do not need boot-{text,menu,bmp}.b in
                              # /boot, as they are linked into the lilo
                              # binary.
large-memory
menu-scheme=Wb
prompt
# If you always want to see the prompt with a 15 second timeout:
timeout=150
delay = 50
default=4916

image = /boot/vmlinuz-4.9.16-gentoo
        root = /dev/sdb2
        label = 4916

image = /boot/vmlinuz-4.12.5-gentoo
        root = /dev/sdb2
        label=4125

image = /boot/vmlinuz-4.12.12-gentoo
        root = /dev/sdb2
        label = 41212
```

All of the 4.9 kernels (there are 2 others) work fine. Here too is fstab:

```
# <fs>                  <mountpoint>    <type>          <opts>          <dump/pass>
none                    /proc           proc            defaults        0 0
none                    /dev/shm        tmpfs           nodev,nosuid,noexec     0 0
/dev/sdb2               /               ext4            noatime,defaults 0 1
/dev/sdb1               none            swap            sw              0 0
PARTLABEL=Data          /home           auto            noatime,defaults 0 0
/dev/cdrom              /mnt/cdrom      iso9660         user,unhide,noauto,ro   0 0
```

It's a real headscratcher!

----------

## Jaglover

Then for some reason the new kernel is not enumerating your second drive as sdb. My 2¢.

----------

## eccerr0r

Oh gosh, lilo. Haven't used it in so long...

If setting root= doesn't work, you could also try adding it in append=

```
append="root=PARTUUID=xxxxxxxx"
```

to see if this works?
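lilo passes the `append=` string to the kernel command line, which is why it can carry `root=PARTUUID=` even though lilo's own `root =` setting will not accept it. A minimal sketch that renders such an image section; `lilo_image_section` is a hypothetical helper, the PARTUUID is made up, and leaving out the `root = /dev/sdX` line is a choice of this sketch (so only one root= reaches the kernel), not something lilo requires.

```python
def lilo_image_section(image, label, partuuid):
    """Render a lilo.conf image section that passes root=PARTUUID= via append.

    The usual 'root = /dev/sdX' line is left out on purpose (an assumption
    of this sketch) so the kernel command line carries a single root=.
    """
    return "\n".join([
        f"image = {image}",
        f"        label = {label}",
        f'        append = "root=PARTUUID={partuuid}"',
    ])


if __name__ == "__main__":
    # Hypothetical PARTUUID; take the real one from blkid for the root partition.
    print(lilo_image_section("/boot/vmlinuz-4.12.12-gentoo", "41212", "0006c3f5-02"))
```

Remember to rerun `lilo` after editing lilo.conf, since lilo only reads the file when it installs its map.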

----------

## NeddySeagoon

jesnow,

The kernel can't read the filesystem at 8,18, whatever that is.

What filesystems do you have at 

```
/dev/sd?2
```

----------

## jesnow

On 4.9.16 it is for sure my ext4 root fs on sdb2. But maybe 4.12.12 is pointing at a different partition? Is there some difference in how filesystems are handled between the two versions?

 *NeddySeagoon wrote:*   

> jesnow,
> 
> The kernel can't read the filesystem at 8,18, whatever that is.
> 
> What filesystems do you have at 
> ...

 

----------

## NeddySeagoon

jesnow,

No, filesystems are the same, except maybe bugfixes.  

By chance, your drives may enumerate in a different order.

What sort of filesystem is on partition 2 of all your other drives?

----------

## jesnow

This is interesting and maybe relevant:

```
Merckx mnt # lsblk -o NAME,kname,MAJ:MIN,fstype
NAME   KNAME MAJ:MIN FSTYPE
sdd    sdd     8:48  
`-sdd1 sdd1    8:49  ext4
sdb    sdb     8:16  
|-sdb2 sdb2    8:18  ext4
|-sdb3 sdb3    8:19  ext4
`-sdb1 sdb1    8:17  swap
sr0    sr0    11:0   
loop0  loop0   7:0   iso9660
sda    sda     8:0   
|-sda2 sda2    8:2   ntfs-3g
|-sda3 sda3    8:3   ntfs-3g
`-sda1 sda1    8:1   ntfs-3g
```
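Looking device numbers up in the listing above by eye is error-prone. A minimal sketch that indexes it by MAJ:MIN, using the KNAME column because NAME carries tree glyphs such as `|-` (`lsblk -r` prints without them). `index_by_devno` is a hypothetical helper, and the resulting table is only valid for the boot on which lsblk was run.

```python
# The listing above, as printed by: lsblk -o NAME,kname,MAJ:MIN,fstype
LSBLK = """\
NAME   KNAME MAJ:MIN FSTYPE
sdd    sdd     8:48
`-sdd1 sdd1    8:49  ext4
sdb    sdb     8:16
|-sdb2 sdb2    8:18  ext4
|-sdb3 sdb3    8:19  ext4
`-sdb1 sdb1    8:17  swap
sr0    sr0    11:0
loop0  loop0   7:0   iso9660
sda    sda     8:0
|-sda2 sda2    8:2   ntfs-3g
|-sda3 sda3    8:3   ntfs-3g
`-sda1 sda1    8:1   ntfs-3g
"""


def index_by_devno(text):
    """Map 'MAJ:MIN' -> (kernel name, fstype), skipping the header row."""
    table = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue
        kname, devno = fields[1], fields[2]  # KNAME has no tree glyphs
        fstype = fields[3] if len(fields) > 3 else ""
        table[devno] = (kname, fstype)
    return table


if __name__ == "__main__":
    table = index_by_devno(LSBLK)
    for devno in ("8:18", "8:3", "8:2"):
        print(devno, *table[devno])
```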

It appears the device number the kernel is trying to boot from doesn't match the one I'm telling lilo to find the boot image on: 8,3 is sda3, not sdb2. This was indeed an issue in the past when lilo would give the warning that the mbr and boot image were on different disks, but would then deal with it gracefully. It seems willing to boot from /dev/sdb2 (as I'm doing right now) as kernel 4.9 but not as 4.12. 

Any idea how to troubleshoot this?

Jon.

----------

## eccerr0r

Well, it seems your kernel is missing drivers; that is the only thing that would explain why 4.9 can see the disk but 4.12 can't.

Are both sda and sdb on the same controller?  SATA vs USB disks vs accessory PATA controllers?

----------

## NeddySeagoon

jesnow,

```
sdb2 sdb2    8:18  ext4
sda2 sda2    8:2   ntfs-3g
```

Humour me ... build in kernel support for ntfs but NOT the write support.  Write support is useless, 'incomplete' but what is there is mostly harmless.

If your drives are being swapped, the ntfs partition will mount read only, which is safe and you will get a different error.

I don't know what the error will be; with a read-only root on ntfs, lots of things won't work.

The main thing is if you don't get the unknown-block panic, it confirms the drive enumeration order.

The fix is to pass root=PARTUUID=...  in the append statement in lilo.conf

----------

## jesnow

Ok I did it. The drive ordering does indeed seem to be the issue:

```
VFS: Mounted root (ntfs filesystem) readonly on device 8:18.
.
.
.
Kernel Panic - not syncing: No working init found.
```

Which should indeed be /dev/sdb2 from the table above that shows the output of lsblk. But if it found an ntfs file system there, it can't be /dev/sdb2, and is most likely /dev/sda2.

I will try your fix.

----------

## jesnow

Success. 

I now think that if there is a device numbering inconsistency between 4.12 and 4.9, then the next time I run lilo, I will no longer be able to boot 4.9 kernels without a root=PARTUUID= directive.

Many thanks for the help. If you're ever in Texas I will buy you a beer, or if I am ever out swimming in the Firth of Forth I will bring you one.

----------

## NeddySeagoon

jesnow,

You can use the root=PARTUUID= append with both kernels.

You will need to write /etc/fstab in a device-independent way too.

UUID, LABEL or PARTUUID works there.
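Converting the fstab posted earlier is a small text substitution once the UUIDs are known. A minimal sketch; `rewrite_fstab` is a hypothetical helper, and only the root UUID comes from this thread (the `file -sL` output), so the swap line is deliberately left alone rather than filled with a guessed value. The same UUID later shows up on sdc2 in the `/dev/disk/by-uuid` listing, which is exactly why it is the safer identifier.

```python
def rewrite_fstab(text, uuid_by_device):
    """Replace /dev/sdX device fields with UUID= where a UUID is known.

    Comment lines and devices missing from `uuid_by_device` are left alone.
    Column alignment of rewritten lines is not preserved.
    """
    out = []
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#") and fields[0] in uuid_by_device:
            # The first field is the first non-blank text on the line, so
            # replacing the first occurrence rewrites exactly that field.
            line = line.replace(fields[0], "UUID=" + uuid_by_device[fields[0]], 1)
        out.append(line)
    return "\n".join(out)


FSTAB = """\
/dev/sdb2               /               ext4            noatime,defaults 0 1
/dev/sdb1               none            swap            sw              0 0
PARTLABEL=Data          /home           auto            noatime,defaults 0 0
"""

# Only the root UUID is known from the thread; look the others up with blkid.
UUIDS = {"/dev/sdb2": "419902b4-6b85-4a2c-8e81-82a45d498c9c"}

if __name__ == "__main__":
    print(rewrite_fstab(FSTAB, UUIDS))
```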

----------

## jesnow

I tracked the issue down to a printer with a card slot that was being registered as /dev/sda and throwing all the other disks off by 1. 
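The off-by-one shift follows directly from first-come, first-served naming. A toy model, assuming disks are named strictly in the order they are registered; `assign_sd_names` and the disk descriptions are hypothetical, and real probing can be asynchronous, which is part of why the order is not deterministic.

```python
import string


def assign_sd_names(probe_order):
    """First come, first served: the n-th disk registered gets the n-th sd name.

    Toy model only: real probing can be asynchronous, and names past sdz
    (sdaa, ...) are not modelled.
    """
    if len(probe_order) > 26:
        raise ValueError("toy model only covers sda..sdz")
    return {disk: "sd" + string.ascii_lowercase[i]
            for i, disk in enumerate(probe_order)}


# Hypothetical descriptions of the disks; only their order matters here.
WITHOUT_READER = ["windows disk", "linux disk", "data disk"]
WITH_READER = ["printer card slot"] + WITHOUT_READER

if __name__ == "__main__":
    for order in (WITHOUT_READER, WITH_READER):
        names = assign_sd_names(order)
        print("linux disk:", names["linux disk"], "windows disk:", names["windows disk"])
```

With the card slot registered first, `root=/dev/sdb2` lands on partition 2 of the Windows disk, which matches the ntfs mount seen in the experiment above.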

There are many solutions to this problem, but probably the best one in retrospect is to use one of the many trees in /dev/ that list the disks by device name, device id etc:

```
jesnow@Merckx ~ $ dir /dev/disk/
total 0
drwxr-xr-x  8 root root   160 Dec 18 15:29 .
drwxr-xr-x 15 root root 13760 Dec 18 15:32 ..
drwxr-xr-x  2 root root   400 Dec 18 15:29 by-id
drwxr-xr-x  2 root root   100 Dec 18 15:29 by-label
drwxr-xr-x  2 root root    60 Dec 18 15:29 by-partlabel
drwxr-xr-x  2 root root   180 Dec 18 15:29 by-partuuid
drwxr-xr-x  2 root root   280 Dec 18 15:29 by-path
drwxr-xr-x  2 root root   180 Dec 18 15:29 by-uuid
jesnow@Merckx ~ $ dir /dev/disk/by-label
total 0
drwxr-xr-x 2 root root 100 Dec 18 15:29 .
drwxr-xr-x 8 root root 160 Dec 18 15:29 ..
lrwxrwxrwx 1 root root  10 Dec 18 15:29 HP_RECOVERY -> ../../sdb3
lrwxrwxrwx 1 root root  10 Dec 18 15:29 OS -> ../../sdb2
lrwxrwxrwx 1 root root  10 Dec 18 15:29 SYSTEM -> ../../sdb1
jesnow@Merckx ~ $ dir /dev/disk/by-uuid
total 0
drwxr-xr-x 2 root root 180 Dec 18 15:29 .
drwxr-xr-x 8 root root 160 Dec 18 15:29 ..
lrwxrwxrwx 1 root root  10 Dec 18 15:29 419902b4-6b85-4a2c-8e81-82a45d498c9c -> ../../sdc2
lrwxrwxrwx 1 root root  10 Dec 18 15:29 43d3cc9d-2fae-46c4-af2d-a5e6ffe15be4 -> ../../sdc1
lrwxrwxrwx 1 root root  10 Dec 18 15:29 75c774a6-aa35-4d3a-b891-595f17e28908 -> ../../sdd1
lrwxrwxrwx 1 root root  10 Dec 18 15:29 B0766A4A766A1180 -> ../../sdb1
lrwxrwxrwx 1 root root  10 Dec 18 15:29 EE40262E4025FDC9 -> ../../sdb2
lrwxrwxrwx 1 root root  10 Dec 18 15:29 F6A26B6DA26B3173 -> ../../sdb3
lrwxrwxrwx 1 root root  10 Dec 18 15:29 d7e5829d-9396-4762-acf3-936292735960 -> ../../sdc3
jesnow@Merckx ~ $
```

Any of these will work in /etc/fstab [EDIT: DO NOT DO THIS, see below] and in lilo.conf (I assume grub will use them too). That way the "append=" lines can be avoided, and the insane, non-repeatable legacy disk naming system can be jettisoned altogether.

*Last edited by jesnow on Tue Jan 02, 2018 7:17 pm; edited 1 time in total*

----------

## NeddySeagoon

jesnow,

Do not write symlink names in /etc/fstab.

There will be a race condition between udev, creating the symlinks and localmount trying to use them.

Some things may not mount at boot.  

LABEL, UUID, PARTUUID are all safe and unambiguous.

Well, LABELs are up to you.

Read the news item

```
$ eselect news read 32
2016-11-04-important_fstab_and_localmount_update
  Title                     Important fstab and localmount update
```
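The advice in this post can be turned into a quick fstab check. A minimal sketch; `classify` and `lint_fstab` are hypothetical helpers. It flags `/dev/disk/...` symlinks (the udev race described above) and bare sd names (enumeration order), and accepts the tag forms documented in fstab(5): UUID=, LABEL=, PARTUUID= and PARTLABEL=. Treating PARTLABEL= as safe is an assumption of this sketch, since the post itself names only LABEL, UUID and PARTUUID. The `/mnt/windows` line in the sample is invented for illustration.

```python
import re

# Tag forms documented in fstab(5); PARTLABEL= treated as safe by assumption.
SAFE_TAGS = ("UUID=", "PARTUUID=", "LABEL=", "PARTLABEL=")
KERNEL_NAME = re.compile(r"^/dev/sd[a-z]+\d*$")


def classify(spec):
    """Classify the first (device) field of an fstab entry."""
    if spec.startswith(SAFE_TAGS):
        return "ok"
    if spec.startswith("/dev/disk/"):
        return "udev symlink: may not exist yet when localmount runs"
    if KERNEL_NAME.match(spec):
        return "kernel name: depends on enumeration order"
    return "not checked"


def lint_fstab(text):
    """Return (device field, problem) for entries that look unsafe."""
    problems = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        verdict = classify(fields[0])
        if verdict not in ("ok", "not checked"):
            problems.append((fields[0], verdict))
    return problems


SAMPLE = """\
none                    /proc           proc            defaults        0 0
/dev/sdb2               /               ext4            noatime,defaults 0 1
/dev/disk/by-label/OS   /mnt/windows    ntfs-3g         ro              0 0
PARTLABEL=Data          /home           auto            noatime,defaults 0 0
"""

if __name__ == "__main__":
    for spec, problem in lint_fstab(SAMPLE):
        print(f"{spec}: {problem}")
```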

----------

## jesnow

NeddySeagoon, thanks once again for the save. 

What is the correct solution, now that the problem is found? How do I absolutely prevent a usb device from claiming /dev/sda?

[Edit: unplug the offending thing is one solution].

Cheers, 

Jon.

*Last edited by jesnow on Tue Jan 02, 2018 8:58 pm; edited 1 time in total*

----------

## NeddySeagoon

jesnow,

You can't. It's luck/race conditions.

The correct solution is to use root=PARTUUID= on the kernel command line and the device-independent identifiers in /etc/fstab.

Then your configuration is totally independent of device names allocated by the kernel.

root=UUID= works if you have an initrd that includes the userspace mount command.

----------

## Ant P.

Build usb-storage as a module. Otherwise, you don't - that's why we have PARTUUID in the first place.

----------

## jesnow

OK, I have now removed all /dev/sd* references from /etc/fstab and /etc/lilo.conf. I guess they will become deprecated, but they are still very much standard in the Gentoo documentation.

Many thanks. blkid is indeed now my friend.

Jon.

----------

## Hu

It is standard because it is easy to explain and, for people who have only one block storage device (or who enjoy perfectly consistent enumeration of block storage devices), using sd names works fine.  For anyone who cannot count on predictable enumeration, the techniques in this thread are a good solution.

----------

