# [SOLVED] Weird LVM boot time issue

## twork

I have a machine I've been running in an LVM root for years.  After a recent system update, LVM detection stopped working at boot time.

If I boot from a Gentoo install CD, 'vgchange -a y' works fine, detects my volumes and I can mount them.

Thinking I must have done something wrong with my kernel, I took the config.gz from /proc while my machine was booted from the install CD (kernel version 2.6.38), and fed that to genkernel on my updated, hosed userland (kernel version 2.6.39).  Command line I'm using to build my kernel and initrd:

```
genkernel --lvm --kernel-config=/usr/src/from-gentoo-config --bootloader=grub --real-root=/dev/mapper/pile-dense--root --makeopts="-j5" all
```

That yields the same result.  At boot, the kernel loads, goes through hardware detection, and finally dies with:

```
[...]

>> Scanning for Volume Groups

  Reading all physical volumes.  This may take a while...

>> Activating Volume Groups

>> Determining root device...

!! Block device /dev/pile/dense-root is not a valid root device...

!! Could not find the root block device in .

   Please specify another value or: press Enter for the same, type "shell" for a shell, or "q" to skip...

root block device() ::

```

If I drop to a shell and run 'vgscan', it appears to run successfully, but doesn't find any volumes:

```
# vgscan

  Reading all physical volumes.  This may take a while...

  No volume groups found
```

So I'm not sure what to try next.  Presumably my kernel is good, since it's built to the same spec as a kernel that does work; and apparently genkernel is populating my ramdisk with the LVM binaries.

I thought maybe the kernel was failing to detect my hard drives, but when I drop to the shell after boot failure, '/dev' is properly populated with nodes, and dmesg shows successful drive detection as well.

I found this thread: https://forums.gentoo.org/viewtopic-t-886388-highlight-lvm.html ...which described similar symptoms, but the fix there was to specify md commands to genkernel, and I'm not using any RAID, just an LVM over two differently sized disks.  All the same I did try another round with --mdadm on my genkernel command line, with no change.

Any hints?  More information I can provide?

Last edited by twork on Fri Jul 29, 2011 9:58 pm; edited 1 time in total

----------

## Hu

When you drop to the recovery shell, what is the output of lvm pvscan; lvm pvs; lvm vgscan; lvm vgs; lvm lvscan; lvm lvs?

----------

## twork

 *Hu wrote:*   

> When you drop to the recovery shell, what is the output of lvm pvscan; lvm pvs; lvm vgscan; lvm vgs; lvm lvscan; lvm lvs?

 

```
# lvm pvscan

  No matching physical volumes found

```

```
# lvm pvs

```

 (no output at all; "echo $?" returns 0.)

```
# lvm vgscan

  Reading all physical volumes.  This may take a while...

  No volume groups found

```

```
# lvm vgs

  No volume groups found

```

```
# lvm lvscan

  No volume groups found

```

```
# lvm lvs

  No volume groups found

```

----------

## frostschutz

Are your PV devices actually missing (which would point to missing driver for them in your kernel or initramfs)?

If the devices are there but lvm doesn't find them, maybe you excluded them by accident in your lvm.conf?

----------

## twork

 *frostschutz wrote:*   

> Are your PV devices actually missing (which would point to missing driver for them in your kernel or initramfs)?
> 
> If the devices are there but lvm doesn't find them, maybe you excluded them by accident in your lvm.conf?

 

The raw devices (sda, sda1, sdb, etc.) exist under /dev when I drop to the rescue shell.

When I boot to the install CD, it finds my LVM devices without any lvm.conf, and I haven't changed lvm.conf during the course of my latest upgrade.

----------

## frostschutz

And the devices are actually readable?

hexdump -C -n 512 /dev/sda etc. works?

----------

## twork

 *frostschutz wrote:*   

> And the devices are actually readable?
> 
> hexdump -C -n 512 /dev/sda etc. works?

 

Yep.  From the rescue shell:

```
# for d in /dev/sd* ; do hexdump -C -n 512 $d > /dev/null ; echo $? ; done

0

0

0

0

0

```

I also tried setting aside /etc/lvm/lvm.conf from the ramdisk, in case maybe there was something bad in there.  But when I re-ran lvscan with no lvm.conf, the result was the same, no volumes found.
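
For anyone retracing this, a less drastic check than moving lvm.conf aside is to list the active filter lines in it. This is only a minimal sketch: `show_lvm_filters` is a made-up helper name, and it assumes any filter is written on a single line beginning with `filter =` (or `global_filter =`, which only newer LVM2 releases have).

```shell
#!/bin/sh
# Hypothetical helper: print active (uncommented) filter settings from an lvm.conf.
# $1 = path to lvm.conf (defaults to /etc/lvm/lvm.conf)
# Returns 0 if a filter line was found, 1 if none, 2 if the file is unreadable.
show_lvm_filters() {
    conf=${1:-/etc/lvm/lvm.conf}
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    # Commented-out lines start with '#', so they do not match this pattern.
    grep -nE '^[[:space:]]*(global_)?filter[[:space:]]*=' "$conf"
}
```

From the rescue shell, `show_lvm_filters /etc/lvm/lvm.conf` would inspect the copy inside the ramdisk. No output means no active filter, which is consistent with the result staying the same once the file was moved aside.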

----------

## twork

I may have just found a data point, but I'm not yet sure what to do about it.

Trying to answer the question, "what's different between my on-disk system and the one on the Gentoo install CD", I thought I'd try comparing the two copies of the LVM userland utilities.

What I did:

1. Booted from the install CD.
2. Mounted my boot and root volumes.
3. Extracted the contents of my initramfs.
4. chroot'ed to my on-disk filesystem.
5. Ran the copy of '/bin/lvm' from the ramdisk, to see if it behaved the same way under the install environment.

When I ran './lvm pvscan' the first time, I got an error:

```
/sys/block: opendir failed: No such file or directory
```

...followed by normal pvscan output:

```
PV /dev/sda2   VG pile   lvm2 [148.95 GB / 148.95 GB free]

PV /dev/sdb1   VG pile   lvm2 [698.63 GB / 109.63 GB free]

Total: 2 [847.59 GB] / in use: 2 [847.59 GB] / in no VG: 0 [0   ]
```

Aha, I failed to mount /sys inside my chroot.  So I backed out of the chroot, did a 'mount -o bind' to have /sys appear under /mnt/gentoo, chrooted back in again, and now the copy of '/bin/lvm' from my ramdisk is failing to detect any volumes again.

Okay, so that suggests that somehow my initramfs is getting a copy of lvm that expects the old, deprecated /sys layout.  That wouldn't be so surprising since the binaries were built while I was running an older kernel, and maybe the tools were built to expect the old /sys... But, when I call /bin/lvm from within my chroot (which should be from the same source and build environment as the static copy in the ramdisk), it works fine.
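
One way to see which /sys layout a given environment presents is to look at what the entries under /sys/block are. The sketch below assumes the usual distinction: in the current layout each entry is a symlink into /sys/devices, while in the old deprecated layout each entry is a plain directory. `sysfs_block_layout` is a hypothetical helper name, not something genkernel or LVM provides.

```shell
#!/bin/sh
# Hypothetical helper: report how block devices appear under a sysfs mount.
# $1 = root of the tree to inspect (empty means the running system's /sys)
sysfs_block_layout() {
    dir=${1:-}/sys/block
    [ -d "$dir" ] || { echo "missing"; return 1; }
    for entry in "$dir"/*; do
        if [ -L "$entry" ]; then
            echo "symlinks"      # current layout: links into /sys/devices
            return 0
        elif [ -d "$entry" ]; then
            echo "directories"   # old, deprecated layout
            return 0
        fi
    done
    # /sys/block exists but has no entries (for example, sysfs not mounted here)
    echo "empty"
    return 1
}
```

Running `sysfs_block_layout /mnt/gentoo` before and after the bind mount would show "missing" or "empty" in the first case, matching the `opendir failed` message, and the real layout in the second.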

Now I'm wondering if maybe genkernel did something funky when it built the static copy for the ramdisk, and didn't build lvm2 against the same kernel sources it was using for the kernel itself.  Or something.

----------

## NeddySeagoon

twork,

Don't use Genkernel, roll your own initrd.

You won't need all of that link - just the initrd part.  Leave out the parts you don't need.

----------

## twork

Neddy is on to something.

Since my last post I tried rebuilding the initramfs via genkernel from within my chroot, hoping that it would pick up on my up-to-date kernel sources and build, contents of the current up-to-date /sys, whatever... No joy, the resulting binary in the ramdisk is still busted.

So I unpacked the ramdisk, copied my system's lvm.static over the one genkernel compiled, rebuilt the ramdisk, and now I'm looking at a happily booted machine.
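
Roughly, the unpack/replace/rebuild steps look like the sketch below. It assumes the initramfs is a gzip-compressed cpio archive in newc format and that lvm sits at bin/lvm inside it (as it did in my ramdisk); the function names and every path passed to them are placeholders, so adjust them to your own system. Unpacking a real initramfs should be done as root so device nodes and ownership survive.

```shell
#!/bin/sh
# Sketch only: swap the lvm binary inside a gzip + cpio (newc) initramfs.

# Turn a possibly relative path into an absolute one (we cd before using it).
abspath() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$PWD/$1" ;;
    esac
}

# $1 = initramfs image, $2 = directory to unpack into (created if needed)
unpack_initramfs() {
    img=$(abspath "$1")
    mkdir -p "$2" || return 1
    (cd "$2" && gzip -dc "$img" | cpio -idm)
}

# $1 = unpacked directory, $2 = known-good static lvm binary
replace_lvm() {
    cp "$2" "$1/bin/lvm" && chmod 755 "$1/bin/lvm"
}

# $1 = unpacked directory, $2 = output image (should live outside $1)
repack_initramfs() {
    out=$(abspath "$2")
    (cd "$1" && find . | cpio -o -H newc | gzip -9 > "$out")
}
```

The order is `unpack_initramfs`, then `replace_lvm` with the path to the system's static lvm, then `repack_initramfs` to a new file that the bootloader entry points at. Writing to a new image rather than overwriting the old one keeps a fallback around.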

I guess that's a bug in genkernel?  Its guts definitely make an attempt to deal with LVM, but the scripts are such a sub-sourced mess that I got lost trying to trace where it gets all its variable values from.

----------

## NeddySeagoon

twork,

File a bug against genkernel at bugs.gentoo.org please.

----------

## twork

I tried, but my browser is gagging, claiming an invalid cert -- won't even let me click the "I don't care about MitM attacks" button.

I'll wait a while and try again.

----------

## NeddySeagoon

twork,

Gentoo uses CAcert for certifying sites etc.  Their root certificate is not installed by default, so you need to visit cacert.org and install it.

If you visit the forums with https:// you will get the same thing.

----------

