# Multi-device Btrfs root fails to mount without initrd

## the8lack8ox

I know this issue has been touched on briefly in these forums, but I'd like to revive it.

I have a RAID10 array of six HDDs, with a Btrfs filesystem on the third partition of each.  This Btrfs filesystem contains three subvolumes (root, boot and home).  The system boots via GRUB2 (unfortunately the only choice, it would seem, for people with multi-device Btrfs filesystems using LZO compression).  Right now, I have an initrd built by dracut that takes care of the device scanning required to assemble the Btrfs array, which works fine.

Now I'm looking to get rid of the annoying initrd image, whose sole purpose is to perform that one Btrfs scan.  It's not that it's causing me any trouble, really, my OCD is just getting the better of me here.  I've seen that to make the magic happen, all you need to do, supposedly, is add a device option for each partition in the array via the rootflags kernel parameter.  I also added a rootfstype parameter, just for kicks.  So this gives a kernel command line like this:

/boot/kernel-3.3.5-gentoo root=/dev/sda3 ro rootflags=device=/dev/sda3,device=/dev/sdb3,device=/dev/sdc3,device=/dev/sdd3,device=/dev/sde3,device=/dev/sdf3,subvol=root rootfstype=btrfs quiet

And I reboot and... my root filesystem fails to mount.  The first time it's supposed to be mounted, I mean.  If I take off the quiet, the kernel reports that I have 6 disks to choose from containing 3 partitions each.  Well, great!  That's exactly what it should say!  So, why won't my root filesystem mount?  Any ideas?  I'm lost and don't know how to get any more info out of the kernel.
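
For reference, a `rootflags` value like the one above can be assembled from a device list rather than typed by hand. This is only a minimal sketch: `build_rootflags` is a hypothetical helper name, and the subvolume and partition names are the ones from this post.

```shell
#!/bin/sh
# Hypothetical helper: build a rootflags= value for a multi-device Btrfs root.
# $1 is the subvolume name; the remaining arguments are the member partitions.
build_rootflags() {
    subvol=$1
    shift
    flags=""
    for dev in "$@"; do
        # One device= option per member partition, comma-separated.
        flags="${flags}device=${dev},"
    done
    printf 'rootflags=%ssubvol=%s\n' "$flags" "$subvol"
}

build_rootflags root /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3
```

The output can be pasted into the kernel line of the bootloader configuration next to `root=` and `rootfstype=btrfs`.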

----------

## massimo

Please refer to [1]. You should have put those parameters/options into the corresponding fstab entry and not into the boot line.

[1] btrfs wiki

----------

## the8lack8ox

I've done that, and it doesn't matter anyway.  The problem occurs when the root FS is first being mounted (fstab isn't accessible yet).  The kernel needs to be informed of the geometry of the RAID array, which should be taken care of by the device options on rootflags, but it's almost like the btrfs mounter is ignoring those options or not seeing them or something.   :Confused: 

----------

## massimo

Try it with subvol=@root (note the @ which is missing in your case).

----------

## the8lack8ox

Still no good with the subvol=@root.

Starting to look through the kernel code.  Whee!

----------

## py-ro

The kernel can't detect multi-volume btrfs (and as far as I know, nobody wants to add it). You need an initrd issuing the btrfs scan if you want to have / on multi-volume btrfs.

----------

## the8lack8ox

I'm getting closer to the source of the problem.  I added a bunch of printk's to my kernel to get some decent error reporting.  Oddly, it looks like the device files /dev/sda3, /dev/sdb3, and so on aren't there when btrfs_mount is called.  Right now I'm looking at btrfs_scan_one_device, which calls blkdev_get_by_path, and blkdev_get_by_path returns an error.

 *Quote:*   

> The kernel can't detect multi-volume btrfs (and as far as I know, nobody wants to add it). You need an initrd issuing the btrfs scan if you want to have / on multi-volume btrfs.

 

Well I want to add it.  :Laughing: 

----------

## ulenrich

Just a question as dracut should do exactly that:

dependency solving

Do you want to develop an alternative to dracut?

----------

## py-ro

The kernel folk don't want this autodetection in the kernel; that was what I tried to say. It is the same reason why md RAID metadata newer than 0.90 is not autodetected.

----------

## the8lack8ox

The scan function is written in the kernel already as btrfs_scan_one_device in fs/btrfs/volumes.c.  When you run btrfs device scan without arguments, all it does is run that routine against all possible devices through an ioctl call.

All I want is the device option to work at boot, so I don't have to use the buzz-saw that is initrd to mount my root filesystem.  Easy enough one would think.  I'm not auto-detecting anything.  In fact, I'm just trying to friggin' tell it what devices my filesystem is on (but it doesn't want to listen, sadly).

It is clear to me now that /dev is not fully populated at the time the root filesystem is being mounted.  Interestingly, the device being mounted to / is something called /dev/root, and the device nodes /dev/sda3, /dev/sdb3, and so on are not present at that point in time.  This document is rather interesting and enlightening of the situation.

I was hoping that the fairly new CONFIG_DEVTMPFS_MOUNT, which "Automount[s] devtmpfs at /dev, after the kernel mounted the rootfs", would give access to those devices when attempting to mount the real root fs.  And apparently, it does not, or is not ready to, or whatever.  (What is that option really good for anyway other than causing boot failures when you upgrade udev!  Bah I say!)

I notice that OpenRC/udev do this thing called "populating /dev through uevents" or something like that.  Maybe it doesn't do anything at all; I don't know yet.  Anyway, the question becomes: if /dev really is mounted right after rootfs is mounted, how do I populate it so I can use it to mount my real root?  I guess in the worst case I have to create an initramfs with the device nodes already there.  I thought the whole idea of that config option was to keep things moving away from the initrd/initramfs stuff.   :Confused: 

----------

## Ant P.

Try enabling CONFIG_SCSI_WAIT_SCAN.

----------

## the8lack8ox

Bad news.  The / filesystem gets mounted before devtmpfs at /dev, and the kernel figures out what is pointed to by the root= parameter separately from everything else.  It pretty much goes like this:

1. Make /dev/root
2. Mount /dev/root to /root
3. Mount devtmpfs to /root/dev
4. Mount /root to /

At least, that's what I can gather from the final few lines of init/do_mounts.c.  Hence, the device options in rootflags don't work because the device nodes don't exist yet.  It looks like getting that to work would require invasive changes that are not too pleasant.

That's enough for me today.  I've been awake for 42 hours straight.  I think I'll sleep now.

----------

## the8lack8ox

At last! Success!   :Surprised:   I've hacked my kernel just a tad bit to mount a devtmpfs before attempting to mount the root filesystem.  Now I can mount my multi-device Btrfs root filesystem without using an initramfs/initrd!  Here is my patch:

```
--- linux-3.3.5-gentoo.old/init/do_mounts.c   2012-05-15 21:50:19.177935788 -0400
+++ linux-3.3.5-gentoo/init/do_mounts.c   2012-05-15 21:55:01.949906368 -0400
@@ -493,8 +493,13 @@
    }
 #endif
 #ifdef CONFIG_BLOCK
-   create_dev("/dev/root", ROOT_DEV);
-   mount_block_root("/dev/root", root_mountflags);
+   if(saved_root_name[0]) {
+      devtmpfs_mount("dev");
+      mount_block_root(saved_root_name, root_mountflags);
+   } else {
+      create_dev("/dev/root", ROOT_DEV);
+      mount_block_root("/dev/root", root_mountflags);
+   }
 #endif
 }
 
```

Also available here.

----------

## s4e8

You don't need the patch; instead, embed an initramfs that contains the /dev/sdX nodes plus /dev/console.

----------

## shimbob

Nice, good job. I'm investigating putting / on a multi-device btrfs as well. Glad it's possible.

----------

## Dark_Ebola

 *the8lack8ox wrote:*   

> At last! Success!    I've hacked my kernel just a tad bit to mount a devtmpfs before attempting to mount the root filesystem.  Now I can mount my multi-device Btrfs root filesystem without using an initramfs/initrd!  Here is my patch:
> 
> ```
> --- linux-3.3.5-gentoo.old/init/do_mounts.c   2012-05-15 21:50:19.177935788 -0400
> ...
> ```

 

what's up with this patch?

have you talked about it on lkml/btrfs ml ?

I'm not familiar at all with the init process, so I have no clue if what you did is a good/bad thing for other fs ...

I'd like to do the same, but I'm reluctant to maintain an out-of-tree patch for decades just to boot  :Razz: 

----------

## budee

Thank you for the patch the8lack8ox, you are awesome! Unfortunately it didn't apply when I wanted to update from kernel 3.6 to 3.7, so I went ahead to the newest 3.8 and modified it to work with that. Below is my patch, which is working fine for me with 3.8.2.

```
$ cat /etc/portage/patches/sys-kernel/gentoo-sources/earlydevtmpfs.patch
--- init/do_mounts.c.orig   2013-03-24 20:49:53.446971127 +0100
+++ init/do_mounts.c   2013-03-24 20:51:46.408237541 +0100
@@ -529,6 +529,7 @@
    create_dev("/dev/root", ROOT_DEV);
    if (saved_root_name[0]) {
       create_dev(saved_root_name, ROOT_DEV);
+      devtmpfs_mount("dev");
       mount_block_root(saved_root_name, root_mountflags);
    } else {
       create_dev("/dev/root", ROOT_DEV);
```

----------

## aventrax

Hello, I found this thread very interesting.

I've got a raid0 btrfs array on three disks and I'm trying to boot it without the initrd.

I applied the devtmpfs patch to mount /dev earlier, and I added the complete rootflags specifying the devices that make up my root filesystem..

..without success :/

I'm asking, is it possible to do without the initrd? This thread stops after a while and I'm not sure if somebody has successfully booted the root filesystem on a btrfs raid...

Thanks

M

----------

## gus.j.power

I had exactly the same issue trying to boot off a raid1 btrfs drive pair. I applied budee's patch above against 3.14.14 and I'm up and running without needing an initrd.

+1 !

----------

## SharkWipf

Had the same issue, managed to get it working thanks to this patch.

Had to modify it a little again to get it to play nice with 4.2. If anyone still runs into this issue, here's the modified patch:

```
$ cat /etc/portage/patches/sys-kernel/gentoo-sources/earlydevtmpfs.patch
--- init/do_mounts.c.orig   2015-10-27 23:23:09.975919490 +0100
+++ init/do_mounts.c   2015-10-27 23:27:49.718907453 +0100
@@ -542,6 +542,7 @@
              int err = create_dev(saved_root_name, ROOT_DEV);
          if (err < 0)
             pr_emerg("Failed to create %s: %d\n", saved_root_name, err);
+         devtmpfs_mount("dev");
          mount_block_root(saved_root_name, root_mountflags);
       } else {
          int err = create_dev("/dev/root", ROOT_DEV);
```

This, combined with a (built-in) kernel command line specifying the required devices, allowed my kernel to boot as an EFI stub kernel with a btrfs root.

If anyone's still breaking their head over the limited documentation on the kernel command line options, what I ended up using is: 

```
root=/dev/sda2 rootfstype=btrfs rootflags=device=/dev/sda2,device=/dev/sdb2,device=/dev/sdc2
```

(Note that it'd be better to specify partitions by their PARTUUID where possible, but I was being lazy when I set this up)
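
When a command line like this fails, it can help to double-check which `device=` entries the kernel was actually handed. A rough sketch, where `list_rootflag_devices` is a made-up helper name and the sample string simply repeats the command line above:

```shell
#!/bin/sh
# Hypothetical helper: print every device= entry found in the rootflags=
# parameter of a kernel command line passed as $1.
list_rootflag_devices() {
    # $1 is left unquoted on purpose so the shell splits it into parameters.
    for word in $1; do
        case $word in
            rootflags=*)
                # Drop the "rootflags=" prefix, put one option per line,
                # and keep only the device= options with the prefix removed.
                printf '%s\n' "${word#rootflags=}" | tr ',' '\n' | sed -n 's/^device=//p'
                ;;
        esac
    done
}

list_rootflag_devices "root=/dev/sda2 rootfstype=btrfs rootflags=device=/dev/sda2,device=/dev/sdb2,device=/dev/sdc2"
```

On a running system, the contents of `/proc/cmdline` can be passed in place of the sample string, and each printed path can then be checked with `test -b`.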

----------

## s4e8

The simplest way is to embed the required device nodes into rootfs:

`(find /dev -name "sd*"; echo /dev/console; echo /root) | cpio -H newc -o > dev.cpio`

Then point CONFIG_INITRAMFS_SOURCE at dev.cpio.
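
An alternative to piping `find` into `cpio` is to give CONFIG_INITRAMFS_SOURCE a text description in the format read by the kernel's `usr/gen_init_cpio` tool. This is only a sketch and has not been tested against any particular kernel version. `emit_initramfs_list` is a hypothetical helper, and the partition names and major/minor numbers are assumptions that should be checked with `ls -l /dev/sd*` on the machine in question.

```shell
#!/bin/sh
# Hypothetical helper: emit a gen_init_cpio-style description containing
# /dev, /dev/console, /root and one block device node per argument.
# Each argument is a name:major:minor triple, e.g. sda2:8:2 (assumed values).
emit_initramfs_list() {
    echo "dir /dev 0755 0 0"
    echo "nod /dev/console 0600 0 0 c 5 1"
    echo "dir /root 0700 0 0"
    for spec in "$@"; do
        name=${spec%%:*}     # text before the first colon
        rest=${spec#*:}      # major:minor
        major=${rest%%:*}
        minor=${rest#*:}
        echo "nod /dev/${name} 0600 0 0 b ${major} ${minor}"
    done
}

emit_initramfs_list sda2:8:2 sdb2:8:18 sdc2:8:34
```

Redirecting the output to a file and naming that file in CONFIG_INITRAMFS_SOURCE should have the same intent as the cpio archive, without needing the device nodes to exist on the build host.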

----------

## SharkWipf

 *s4e8 wrote:*   

> the simplest way is embed required dev-node into rootfs.
> 
> (find /dev -name "sd*"; echo /dev/console; echo /root) | cpio -H newc -o > dev.cpio
> 
> then put dev.cpio into CONFIG_INITRAMFS_SOURCE

 

I decided to give this method a try as the 4.3 kernel seems to have changed things around a bit.

Sadly, that does not seem to work for me, getting a (different) kernel panic from it saying:

```
VFS: Cannot open root device "sda2" or unknown-block(8,2) error -2

Please append a correct "root=" boot option; here are the available partitions:
```

- After which it continues to list all my disks and partitions, including said sda2 (and all the other devices my btrfs root consists of).

Well, I didn't want to go through the hassle of setting up a full (embedded) initramfs because it's kind of a pain with EFI stub booting, but seems I don't have much of a choice.

Oh well.

----------

## Roman_Gruber

are you sure you can boot such a setup without an initramfs?

One option is to embed the initramfs in the kernel, so you do not see it in your bootloader. But it's still there.

----------

## SharkWipf

It is possible to boot initramfs-less according to the btrfs documentation, however the patch mentioned earlier is required to actually make it work kernel-side.

With said patch it works, however updating the patch with each kernel update is a bit of a bother.

I'm currently playing with Dracut to generate an initramfs which I will then either embed into the kernel or package with the --uefi flag.

Sadly Dracut's --uefi flag requires a systemd EFI stub file, so I'm gonna have to either rip that from some other box somewhere or stick with embedding.

It's a bit annoying that the extra complexity of an initramfs is required for something that also works when adding 1 line of code to the kernel though (although I must admit I don't know the other implications of that change).

----------

## bradskins

has anyone gotten this to work with 4.4.0? I have it working fine on 4.3.3.

----------

## Ant P.

Been banging my head against this problem for a few days now... I have a headless, remote server:

/dev/sda and /dev/sdb are both plain MBR disks, with a 512MB partition 1 and the rest as partition 2. sda1 is an ext4 /boot, sda2 is btrfs /, sdb is ignored for now. This setup works fine.

If I mkfs.btrfs sdb2, btrfs device add /dev/sdb2 /, and then try to reboot with that, I get the usual kernel panic message saying it can't find unknown-block(8,2) along with the list of known devices, but that already includes all of the ones above. I've tried adding "rootflags=device=/dev/sda2,device=/dev/sdb2 rootdelay=5" but it didn't help at all. Any idea?

----------

## derk

you should not be mounting /dev/sdb2 over top of /dev/sda2 .. not two root disks, only one .. if you want a raid set-up, use a proper hardware set-up or a software raid .. which will need an initrd/initramfs of some sort .. look on the wiki or search for help .. never done this myself

----------

## Ant P.

 *derk wrote:*   

> you should not be mounting /dev/sdb2 over top of /dev/sda2 .. not two root disks, only one .. if you want a raid set-up, use a proper hardware set-up or a software raid .. which will need an initrd/initramfs of some sort .. look on the wiki or search for help .. never done this myself

 

Maybe I didn't make myself clear enough. This is a Btrfs multidevice volume in RAID1 profile. It mounts perfectly fine from the server's debian netboot recovery environment, but not at boot. I've looked at about 50 different wikis already, nothing says this shouldn't work.

----------

## derk

so you are not using mdadm ?   with mount points /dev/md0, /dev/md1 etc  .. what is the debian environment using to manage the disk arrays?  does it actually load mdadm? or something else?

I have used a server with hardware raid ..under mandrake in the past and it still used mdadm in the boot process with an initrd ..to manage the device .. boot was on the raid device itself .. not sure about the formatting as I did not set it up. was pre-ext4 ..

----------

## derk

have you seen this wiki:

https://wiki.archlinux.org/index.php/GRUB#RAID

FYI

----------

## Ant P.

 *derk wrote:*   

> so you are not using mdadm ?   with mount points /dev/md0, /dev/md1 etc  ..

 

No.

 *Quote:*   

> what is the debian environment using to manage the disk arrays?  does it actually load mdadm? or something else?

 

ahci and btrfs. I can `mount /dev/sda2 /mnt/gentoo` and it'll DWIM. `btrfs fi show /mnt/gentoo` shows both disks active when they are.

The Gentoo kernel can also interact fine with the 2-disk volume while it's running. Just not mount it as root in that state.

I've tried both GRUB2 and LILO with various combinations of root= device/UUIDs, none worked.

----------

## derk

Adjust as per this example: rootflags=device=/dev/sda2,device=/dev/sdb2 rootfstype=btrfs

as per: https://wiki.gentoo.org/wiki/Btrfs/System_Root_Guide

----------

## brepro

Error checking has been added into the patched code segment (int err = create_dev ...)

  New patch:

```
gentoo - /etc/portage/patches/sys-kernel/gentoo-sources# cat earlydevtmpfs.patch
--- init/do_mounts.c.orig       2016-04-25 14:27:42.264114898 +1000
+++ init/do_mounts.c    2016-04-25 14:28:11.443822879 +1000
@@ -542,6 +542,7 @@
                int err = create_dev(saved_root_name, ROOT_DEV);
                        if (err < 0)
                                pr_emerg("Failed to create %s: %d\n", saved_root_name, err);
+                       devtmpfs_mount("dev");
                        mount_block_root(saved_root_name, root_mountflags);
                } else {
                        int err = create_dev("/dev/root", ROOT_DEV);
```

----------

## Chiitoo

Merged 7 posts before the one above this one, as the topic seems to be about the same issue (and as was requested by Ant P.).

----------

