# RAID not starting up consistently on every boot [solved]

## DaggyStyle

Hello all, I have a strange issue: I've reinstalled my system and for some reason my raid 5 doesn't start up properly on every boot.

my setup is as follows:

/boot on raid1 /dev/md1 (200mb) composed of /dev/sd[a-d]1

/ on raid1 /dev/md2 (2Gb) composed of /dev/sd[a-d]2

/var, /opt and /usr among other things on raid5 /dev/md3 (200mb) composed of /dev/sd[a-d]3

For some reason the raid5 doesn't come up on some boots; here are some outputs:

dmesg from defunct boot: http://bpaste.net/show/93684/

dmesg from good boot: http://bpaste.net/show/93687/

mdadm --examine: http://bpaste.net/show/93685/

I'm not using an initramfs although the raid5 has metadata 1.2. AFAIK, because my root is metadata 0.9, RAID autodetection works, and raid1 and raid5 are built into the kernel.

That is how I worked before the reinstall and it didn't cause any issues.

I'm using kernel 3.8.8 and my config can be found here: http://bpaste.net/show/93692/

any hints on why this is happening?

thanks.

----------

## NeddySeagoon

DaggyStyle,

Both dmesgs contain

```
[    2.487007] md: Autodetecting RAID arrays.
[    2.517127] md: invalid raid superblock magic on sda3
[    2.517163] md: sda3 does not have a valid v0.90 superblock, not importing!
[    2.521342] usb 1-1.6.1: new high-speed USB device number 9 using ehci-pci
[    2.548294] md: invalid raid superblock magic on sdb3
[    2.548330] md: sdb3 does not have a valid v0.90 superblock, not importing!
[    2.581454] md: invalid raid superblock magic on sdc3
[    2.581490] md: sdc3 does not have a valid v0.90 superblock, not importing!
[    2.615020] md: invalid raid superblock magic on sdd3
[    2.615056] md: sdd3 does not have a valid v0.90 superblock, not importing!
```

The one that works also has

```
[    3.777733] md: bind<sdd3>
[    3.785956] md: bind<sdb3>
[    3.804730] md: bind<sdc3>
[    3.854603] md: bind<sda3>
[    3.872058] md/raid:md3: device sda3 operational as raid disk 0
[    3.872060] md/raid:md3: device sdc3 operational as raid disk 2
[    3.872061] md/raid:md3: device sdb3 operational as raid disk 1
[    3.872061] md/raid:md3: device sdd3 operational as raid disk 3
[    3.872211] md/raid:md3: allocated 4314kB
[    3.872225] md/raid:md3: raid level 5 active with 4 out of 4 devices, algorithm 2
```

Your 'everything else'  md3 always fails to auto assemble, which is because 

```
/dev/sda3:
          Magic : a92b4efc
        Version : 1.2
```

it has a raid version 1.2 superblock.

From this I surmise that mdadm sometimes assembles md3 and sometimes not.  A separate /usr not mounted before udev starts is not supported. Are you using udev?

Changing the raid superblock version demands that you backup all the data, destroy the raid, fix it then restore your data.

It's time to learn to roll your own initrd.
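A quick way to confirm which members the kernel autodetection will skip is to list the superblock version of each one. The helper below is only a sketch: `examine_versions` is a made-up name, and it assumes `mdadm --examine` prints a `/dev/...:` header followed by a `Version : x.y` line for each device, as in the excerpt above. In-kernel autodetection only handles 0.90 superblocks, so anything reporting 1.x has to be assembled from userspace.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# "<device> <superblock version>" for every member device found.
examine_versions() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }      # remember the current device
        $1 == "Version" && $2 == ":" { print dev, $3 }  # e.g. "/dev/sda3 1.2"
    '
}

# Usage (needs root and real devices, so it is left commented out):
#   mdadm --examine /dev/sd[a-d]3 | examine_versions
```

Any line that shows 1.x belongs to an array that needs an explicit `mdadm --assemble` before its filesystems can be mounted.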

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,

 

Hello NeddySeagoon,

I assumed you'd be the one answering me :)

 *NeddySeagoon wrote:*   

> 
> 
> Both dmesgs contain
> 
> ```
> ...

 

Yes, I'm using the latest udev. I was using udev-174 (before the whole LP-gone-mad debacle) with an almost identical setup (boot and root on raid1 with 0.9 metadata, the rest on a 1.2 raid5) and all worked well. At one point I decided to check the status of separate /usr support on newer udevs; ssuominen answered me that it is supported, so I took the plunge and the system booted without a problem.

Up until the new reinstall, I never encountered a scenario in which the raid5 didn't get assembled. If it isn't supported, why is the auto assemble inconsistent?

 *NeddySeagoon wrote:*   

> 
> 
> Changing the raid superblock version demands that you backup all the data, destroy the raid, fix it then restore your data.
> 
> 

 

I've assumed that.

 *NeddySeagoon wrote:*   

> Its time to learn to roll your own initrd.

 

I usually refrain from using an initrd. The two main reasons are its creation (I'm not sure how complicated it is) and the memory issue: I'm not willing to use an initramfs because it stays in memory after boot and that memory block cannot be used anymore. I'm not sure whether an initrd is the same.

Moreover, from what I understand, initrd is deprecated and is due to be replaced by initramfs later on, which is (again) something I'm not willing to do.

can you shed some more light on this?

Thanks.

----------

## NeddySeagoon

DaggyStyle,

You appear to have a race condition.  If md3 has not been started by the time the attempt to mount /usr happens, your boot fails.

As md3 cannot be auto assembled, it must be assembled by a call to mdadm during the boot process.

In which runlevel is mdadm?

If it's in default, move it to boot and see what happens.

Making an initrd for your case is fairly easy.

You need two files.

The first is a list of things to include in the initrd, the second is the init script to go in the initrd.

Then you feed the file list to a script that is provided in the kernel.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> You appear to have a race condition.  When md3 is not started when the attempt to mount /usr happens, your boot fails.
> 
> As md3 cannot be auto assembled, it must be assembled by a call to mdadm during the boot process.
> ...

 

It is in the boot runlevel.

 *NeddySeagoon wrote:*   

> 
> 
> Making an initrd for your case is fairly easy.
> 
> You need two files.
> ...

 

I don't know what I need to include. All I want is for it to assemble my raids and then continue to boot as normal, maybe with a rescue mode, but there's no need to load any special modules; the kernel already loads everything.

----------

## NeddySeagoon

DaggyStyle,

Here's my /root/initrd directory contents. It's just two files, init and initramfs_list.

initramfs_list contains the list of files to be included in the initrd.

```
# directory structure
dir /proc       755 0 0
dir /usr        755 0 0
dir /bin        755 0 0
dir /sys        755 0 0
dir /var        755 0 0
#dir /lib        755 0 0
dir /lib64      755 0 0
dir /sbin       755 0 0
dir /mnt        755 0 0
dir /mnt/root   755 0 0
dir /etc        755 0 0
dir /root       700 0 0
dir /dev        755 0 0

# busybox
file /bin/busybox /bin/busybox  755 0 0

# for raid on lvm
file /sbin/mdadm                /sbin/mdadm              755 0 0
file /sbin/lvm.static           /sbin/lvm.static         755 0 0

# libraries required by /sbin/fsck.ext4 and /sbin/fsck
slink   /lib                            /lib64                          777 0 0
file    /lib64/ld-linux-x86-64.so.2     /lib64/ld-linux-x86-64.so.2     755 0 0
file    /lib64/libext2fs.so.2           /lib64/libext2fs.so.2           755 0 0
file    /lib64/libcom_err.so.2          /lib64/libcom_err.so.2          755 0 0
file    /lib64/libpthread.so.0          /lib64/libpthread.so.0          755 0 0
file    /lib64/libblkid.so.1            /lib64/libblkid.so.1            755 0 0
file    /lib64/libuuid.so.1             /lib64/libuuid.so.1             755 0 0
file    /lib64/libe2p.so.2              /lib64/libe2p.so.2              755 0 0
file    /lib64/libc.so.6                /lib64/libc.so.6                755 0 0
file    /lib64/libmount.so.1            /lib64/libmount.so.1            755 0 0
file    /sbin/fsck              /sbin/fsck                      755 0 0
file    /sbin/fsck.ext4         /sbin/fsck.ext4                 755 0 0

# our init script
file    /init                   /root/initrd/init               755 0 0
```

Trap for the unwary: a few packages need the static USE flag.  This is a bad thing to set globally.

```
# static bits and pieces for an initrd
sys-fs/lvm2 static
sys-fs/mdadm static
sys-apps/busybox static
```

The init script contains

```
#!/bin/busybox sh

# print any message passed in, then drop to an interactive busybox shell
rescue_shell() {
    echo "$@"
    echo "Something went wrong. Dropping you to a shell."
    /bin/busybox --install -s
    exec /bin/sh
}

# allow the use of UUIDs or filesystem labels
uuidlabel_root() {
    for cmd in $(cat /proc/cmdline) ; do
        case $cmd in
        root=*)
            type=$(echo $cmd | cut -d= -f2)
            echo "Mounting rootfs"
            # quoted so that an empty value cannot break the test
            if [ "$type" = "LABEL" ] || [ "$type" = "UUID" ] ; then
                uuid=$(echo $cmd | cut -d= -f3)
                mount -o ro $(findfs "$type"="$uuid") /mnt/root
            else
                mount -o ro $(echo $cmd | cut -d= -f2) /mnt/root
            fi
            ;;
        esac
    done
}

check_filesystem() {
    # most of code coming from /etc/init.d/fsck
    local fsck_opts= check_extra= RC_UNAME=$(uname -s)

    # FIXME : get_bootparam forcefsck
    if [ -e /forcefsck ]; then
        fsck_opts="$fsck_opts -f"
        check_extra="(check forced)"
    fi

    echo "Checking local filesystem $check_extra : $1"

    if [ "$RC_UNAME" = Linux ]; then
        fsck_opts="$fsck_opts -C0 -T"
    fi

    trap : INT QUIT

    # using our own fsck, not the builtin one from busybox
    /sbin/fsck -p $fsck_opts $1

    ret_val=$?
    case $ret_val in
        0)      return 0;;
        1)      echo "Filesystem repaired"; return 0;;
        2|3)    if [ "$RC_UNAME" = Linux ]; then
                        echo "Filesystem repaired, but reboot needed"
                        reboot -f
                else
                        rescue_shell "Filesystem still has errors; manual fsck required"
                fi;;
        4)      if [ "$RC_UNAME" = Linux ]; then
                        rescue_shell "Filesystem errors left uncorrected, aborting"
                else
                        echo "Filesystem repaired, but reboot needed"
                        reboot
                fi;;
        8)      echo "Operational error"; return 0;;
        16)     echo "Use or Syntax Error"; return 16;;
        32)     echo "fsck interrupted";;
        127)    echo "Shared Library Error"; sleep 20; return 0;;
        *)      echo $ret_val; echo "Some random fsck error - continuing anyway"; sleep 20; return 0;;
    esac

    # rescue_shell can't find tty so it's broken
    rescue_shell
}

# start for real here

# temporarily mount proc and sys
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev

# disable kernel messages from popping onto the screen
###echo 0 > /proc/sys/kernel/printk

# clear the screen
###clear

# assemble the raid set(s) - they got renumbered from md1, md5 and md6
# not needed on SSD but we may want to maintain it

# /boot
/sbin/mdadm --assemble /dev/md125 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# don't care if /boot fails to assemble
# not needed on SSD

# /  (root)  I wimped out of root on lvm for this box
/sbin/mdadm --assemble /dev/md126 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 || rescue_shell
# if root won't assemble, we are stuck

# LVM for everything else
# /home and everything portage related
/sbin/mdadm --assemble /dev/md127 /dev/sda6 /dev/sdb6 /dev/sdc6 /dev/sdd6 || rescue_shell
# and if the LVM space won't assemble there is no /usr or /var so we are really in a mess
# TODO could auto cope with degraded raid operation

# lvm runs as whatever it's called as
ln -s /sbin/lvm.static /sbin/vgchange

# everything on the SSD
/sbin/vgchange -ay ssd || rescue_shell

# start the vg volume group - /home and everything for portage - need not die here
/sbin/vgchange -ay vg || rescue_shell
# if this failed we have no /usr or /var

# get here with raid sets assembled and logical volumes available

# mounting rootfs on /mnt/root
uuidlabel_root || rescue_shell "Error with uuidlabel_root"

# space separated list of mountpoints that ...
mountpoints="/usr /var"

# ... we want to find in /etc/fstab ...
ln -s /mnt/root/etc/fstab /etc/fstab

# ... to check filesystems and mount our devices.
for m in $mountpoints ; do
    #echo $m
    check_filesystem $m
    echo "Mounting $m"
    # mount the device and ...
    mount $m || rescue_shell "Error while mounting $m"
    # ... move the tree to its final location
    mount --move $m "/mnt/root"$m || rescue_shell "Error while moving $m"
done

echo "All done. Switching to real root."

# clean up. The init process will remount proc sys and dev later
umount /proc
umount /sys
umount /dev

# switch to the real root and execute init
exec switch_root /mnt/root /sbin/init
```

I have root on LVM, a separate /usr and /var, so you can trim it down quite a bit.

The hard bit is running 

```
scripts/gen_initramfs_list.sh 
```

in the kernel tree.

The good thing about this initrd is that it's kernel agnostic: there are no kernel modules, so it rarely needs updating.

This is from a new install but it has its origins in a 2009 initrd which still works.
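For reference, the build step might look like the sketch below. It assumes a kernel source tree of the same vintage as this thread at `/usr/src/linux` (newer kernel trees ship the script under a different path, so check yours), the list file at `/root/initrd/initramfs_list`, and an arbitrary output name. `build_initramfs` is a hypothetical wrapper, not something shipped with the kernel. The script calls the `usr/gen_init_cpio` helper relative to the top of the tree, so that helper has to be built already and the command has to run from there.

```shell
#!/bin/sh
# Hypothetical wrapper around the kernel's scripts/gen_initramfs_list.sh.
#   $1 = kernel source tree, $2 = initramfs_list file, $3 = output archive
build_initramfs() {
    ksrc=$1 list=$2 out=$3

    if [ ! -f "$ksrc/scripts/gen_initramfs_list.sh" ]; then
        echo "no scripts/gen_initramfs_list.sh under $ksrc" >&2
        return 1
    fi
    if [ ! -f "$list" ]; then
        echo "list file $list not found" >&2
        return 1
    fi

    # -o writes a compressed cpio archive; the compressor is picked from
    # the file extension (.gz here).  Run from the top of the kernel tree
    # and pass absolute paths for the list and the output.
    ( cd "$ksrc" && scripts/gen_initramfs_list.sh -o "$out" "$list" )
}

# Example call (all three paths are assumptions):
#   build_initramfs /usr/src/linux /root/initrd/initramfs_list /boot/initramfs.cpio.gz
```

The resulting archive is then named on the bootloader's initrd line for the kernel entry.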

----------

## DaggyStyle

Thanks for the info.

Three small questions:

I assume that there is a pkg that creates the initrd, is it mkinitrd?

all I need is to assemble the raid, can I trim down the init script to do that (skip the check and mounts)?

I see that I need a static mdadm; is it OK to hand-compile it outside of the tree and copy the resulting binary and the config to the initrd?

----------

## NeddySeagoon

DaggyStyle,

1) It's

 *NeddySeagoon wrote:*   

> The hard bit is running
> 
> ```
> scripts/gen_initramfs_list.sh
> ```
> ...

 

2) With the ongoing /usr merge, it's only a matter of time until you are forced to mount /usr in the initrd.

You may as well do that now. You will also need to fsck /usr before you mount it.

3) Not really.  The script that builds the initrd copies the system mdadm and busybox.

Put

```
sys-fs/mdadm static

sys-apps/busybox static
```

in your package.use and have those two packages static system wide.

If you do anything else, you will forget next time you build the initrd, which may be a long time away.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> 1) Its  *NeddySeagoon wrote:*   The hard bit is running
> 
> ```
> ...

 

I see. BTW, is the output an initrd or an initramfs?

 *NeddySeagoon wrote:*   

> 
> 
> 2) With the ongoing /usr merge, it's only a matter of time until you are forced to mount /usr in the initrd.
> 
> You may as well do that now. You will also need to fsck /usr before you mount it.

 

ongoing /usr merge? can you please elaborate more?

 *NeddySeagoon wrote:*   

> 
> 
> 3) Not really.  The script that builds the initrd copies the system mdadm and busybox.
> 
> Put
> ...

 

actually I don't fancy building busybox and mdadm as static as part of the tree, that is why I'm thinking of compiling them by hand out of the tree.

----------

## NeddySeagoon

DaggyStyle,

Code that was in /, /lib, /sbin and others is slowly being moved to the same places but in /usr.

There will be little left on /

udev somewhat unfairly got the blame for the breakage this causes when the devs dropped support for retrying things that failed because /usr was not yet mounted.

The real cause was the move of code from / to /usr, which has nothing to do with udev.  udev had been papering over the cracks, then stopped.

If a separate /usr works for you now, it's only a matter of time until it fails.

The script builds an initramfs; it's a cpio archive.

busybox should be static if it's installed at all.  It's a rescue toolkit.  If your dynamic linking or any of its libs are broken, it won't work as a rescue shell.

You only need build the initrd stuff with USE=static long enough to make the initrd.  You can revert it after.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> Code that was in /, /lib, /sbin and others is slowly being moved to the same places but in /usr.
> 
> There will be little left on /
> ...

 

I understand. Your script does it for all partitions, right?

Will doing it for root and usr be enough?

 *NeddySeagoon wrote:*   

> The script builds an initramfs, its a cpio archive.

 

Mmm, no good, I need an initrd. Let's assume the following: I want to detect the raids and have a rescue shell; root is reiserfs and usr is ext4.

I need busybox, mdadm, fsck.ext4 and the reiserfs filesystem checker, all static.

As initramfs doesn't release the memory it takes, I'm losing memory and I'm not willing to do that.

For me this is bloatware that I'm not supporting; an initrd will load the stuff I need and release the memory later on.

how can I generate an initrd?

 *NeddySeagoon wrote:*   

> busybox should be static if its installed at all.  Its a rescue toolkit.  If your dynamic linking or any of its libs are broken, it won't work as  rescue shell.
> 
> You only need build the initrd stuff with USE=static long enough to make the initrd.  You can revert it after.

 

As said, I'd rather build all the tools statically outside of the tree (not install them anywhere, just compile) and copy them into the initrd.

Thanks.

----------

## NeddySeagoon

DaggyStyle,

Root is checked after it's mounted read only: fsck is stored on root, and with no initrd there is no way to check root until it's mounted.

That does not change when you use an initrd, so you only need fsck for /usr.

If you want to build the static stuff out of the tree, you may do so.  Change the corresponding entries in initramfs_list to point to your static programs.

You can make an initrd if you wish too.  It's an ext2 root filesystem in a file that is gzipped before being moved to /boot.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> Root is checked after its mounted read only, since fsck, is stored on root and with no initrd, there is no way to check root until its mounted.
> 
> That does not change when you use and initrd, so you only need fsck for /usr
> ...

 

OK, what is the process of creating the initrd? I assume it's: create the image, create the fs, mount it, create a root-like structure, copy the files like you did, and then?

Thanks.

----------

## NeddySeagoon

DaggyStyle,

Either make the initrd in an empty file formatted with ext2, or make it in a spare ext2 formatted partition.

Put all the files there in the right directory structure, with the right owners, groups and permissions; they are all listed in the _list file.

gzip the file and put the gzipped version into /boot so you can use your bootloader to load it.

Make sure your kernel has initrd and ext2 support.  ext2 must be built in.

Tell the kernel that root=/dev/ram0 real_root=/dev/md...

Cross your fingers and reboot.  It's not that bad.

As your kernel has everything in it to boot, but not reliably, it's easy enough to have another go without booting a live distro.
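Put together, those steps could be scripted roughly as follows. This is a sketch under assumptions: the files have already been staged into a directory with the right layout and ownership, the 16 MiB default size and every path are placeholders, and `make_initrd` is a hypothetical helper name. The loop mount needs root.

```shell
#!/bin/sh
# Hypothetical helper: pack a staged directory tree into a gzipped ext2 image.
#   $1 = staging directory, $2 = output image (".gz" is appended), $3 = size in MiB
make_initrd() {
    stage=$1 img=$2 size_mb=${3:-16}
    mnt=/mnt/initrd-build                     # assumed scratch mount point

    [ -d "$stage" ] || { echo "staging dir $stage not found" >&2; return 1; }

    dd if=/dev/zero of="$img" bs=1M count="$size_mb" || return 1
    mke2fs -q -F -t ext2 "$img" || return 1   # -F: target is a regular file
    mkdir -p "$mnt" || return 1
    mount -o loop "$img" "$mnt" || return 1
    cp -a "$stage"/. "$mnt"/                  # -a keeps owners, groups and modes
    rc=$?
    umount "$mnt" || return 1
    [ "$rc" -eq 0 ] || return 1
    gzip -9 -f "$img"                         # leaves "$img.gz" for the bootloader
}

# Example call (all paths are placeholders):
#   make_initrd /root/initrd/stage /boot/initrd.img 16
```

The uncompressed image has to fit within the kernel's RAM disk size (CONFIG_BLK_DEV_RAM_SIZE, or the `ramdisk_size=` boot parameter), so keep the size argument in step with that setting.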

----------

## DaggyStyle

OK, thanks. BTW, if I use an initramfs, is it possible to run rm -rf on all unnecessary files before I run switch_root?

----------

## NeddySeagoon

DaggyStyle,

Probably - I've not tried.

----------

## DaggyStyle

My problem with initramfs was the amount of memory that the initramfs takes, as it stays in memory; if I can remove all the rest of the files, I might be able to live with it.

----------

## NeddySeagoon

DaggyStyle,

I'm not sure whether rm-ing the files frees the RAM or not.

-- edit --

Interesting exercise ... read the initramfs_list file, determine the size of the initrd ext2 filesystem, make the initrd filesystem file, format it as ext2, copy everything over and gzip it to a destination file of your choice.

Late thought. mdadm and busybox need not be static as the dynamic loader is present in the initramfs for mount and fsck.

You would just need to add the libs identified by lddtree busybox and lddtree mdadm
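If you go the dynamic route, the list entries can be generated rather than typed. The sketch below is a hypothetical helper (`ldd_to_list` is a made-up name): it reads `ldd`-style output on stdin and prints matching `file` lines in the initramfs_list format, using mode 755 and owner 0:0 as in the list posted earlier. `lddtree` lays its output out differently, so this version sticks to plain `ldd`.

```shell
#!/bin/sh
# Hypothetical helper: convert `ldd` output (stdin) into initramfs_list
# "file <name> <location> <mode> <uid> <gid>" entries, one per library.
ldd_to_list() {
    awk '
        # "libfoo.so.1 => /lib64/libfoo.so.1 (0x...)": take the resolved path
        $2 == "=>" && $3 ~ /^\// { print "file " $3 " " $3 " 755 0 0"; next }
        # "/lib64/ld-linux-x86-64.so.2 (0x...)": the dynamic loader itself
        $1 ~ /^\//               { print "file " $1 " " $1 " 755 0 0" }
    ' | sort -u
}

# Example use (appends to the list file from this thread):
#   ldd /sbin/mdadm  | ldd_to_list >> /root/initrd/initramfs_list
#   ldd /bin/busybox | ldd_to_list >> /root/initrd/initramfs_list
```

Entries such as `linux-vdso.so.1` have no path on disk and are skipped. Libraries already present in the list will show up again, so de-duplicate the file before feeding it to the kernel script.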

----------

## Goverp

Assuming you're using busybox's switch_root, it already does rm -f on the initramfs before the switch.

----------

## NeddySeagoon

Goverp.

```
exec switch_root /mnt/root /sbin/init
```

So that's a yes.

----------

## DaggyStyle

I'm working on this, btw, how can I add disk/by-label support to the initramfs?

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> I'm not sure if rm the files frees the RAM or not.
> 
> -- edit --
> ...

 

The initramfs size is ~140 MB; keeping 140 MB in memory just for boot is bad practice IMHO.

----------

## NeddySeagoon

DaggyStyle,

The script I posted supports disk by label and UUIDs.

140 MB for an initrd?

That would be huge.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> The script I posted supports disk by label and UUIDs.
> 
> 140Mb for an initrd ?
> ...

 

140 unpacked.

Well, I don't see the partitions of the raid 5 and I don't have /dev/disk/by-label, so I got an error there. Any idea what I'm missing? I took your script.

----------

## NeddySeagoon

DaggyStyle,

Busybox should have dropped you to a shell.

Can you use mdadm -A to start your raid sets?

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> Busybox should have dropped you to a shell.
> 
> Can you use mdadm -a to start your raid sets?

 

It dropped to a shell.

In order to get the raids up, I'm copying mdadm.conf to the initramfs and running mdadm --assemble --scan.

The raids are up after that, but I don't see the partitions in my raid5 (usr is one of them). If I skip the usr mount, the system boots OK.

I'd still want to future-proof the /usr mount.

----------

## NeddySeagoon

DaggyStyle,

You mean /dev/mdXpY, as in the md device is partitioned? 

I've never done that.  It's a fairly new feature, so I have always used LVM on raid to achieve the same thing.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> You mean /dev/mdXpY, as in the md device is partitioned? 
> 
> I've never done that.  Its a fairly new feature, so I have always used LVM on raid to achieve the same thing.

 

Yup, that's the one. Anyway, I don't even have labels for the LVM volumes or the non-partitioned raids.

----------

## Goverp

 *DaggyStyle wrote:*   

> ...
> 
> the raids are up after than but I don't see the partitions in my raid5 (usr is one of them), if I skip the usr mount  the system boots ok.
> 
> ....

 

I've used partitions in my RAID 5 array for a couple of years and hit the same situation when I set up my initramfs.  Follow the mdadm --assemble --scan with mdadm --detail --scan and the components appear.

My guess is a partitioned RAID array is much cheaper to run than separate arrays for each partition.  In my case, I left some space in the array unallocated for future expansion.  By putting some partitions at the low end of the array and some at the high end, it's then possible to enlarge two partitions without affecting the others.
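In an init script, that sequence could be wrapped as below. It's only a sketch: it assumes `/etc/mdadm.conf` has been copied into the initramfs (as DaggyStyle described), `assemble_arrays` is a hypothetical function name, and the `MDADM` variable exists purely so the binary path can be overridden.

```shell
#!/bin/sh
# Hypothetical init-script fragment: assemble every array listed in
# /etc/mdadm.conf, then run the --detail --scan pass that, per the post
# above, made the /dev/mdXpY partition nodes appear.
MDADM=${MDADM:-/sbin/mdadm}

assemble_arrays() {
    "$MDADM" --assemble --scan || return 1
    # The output is not needed here, only the side effect of the scan.
    "$MDADM" --detail --scan >/dev/null
}

# In init this would be used along the lines of:
#   assemble_arrays || rescue_shell "RAID assembly failed"
```

Whether a failed `--assemble --scan` should drop to the rescue shell or carry on is a policy choice; the fragment above simply reports the failure to the caller.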

----------

## DaggyStyle

 *Goverp wrote:*   

>  *DaggyStyle wrote:*   ...
> 
> the raids are up after than but I don't see the partitions in my raid5 (usr is one of them), if I skip the usr mount  the system boots ok.
> 
> .... 
> ...

 

Will do, hope it helps. Still, I'd like to have the ability to use labels rather than direct nodes.

----------

## DaggyStyle

mdadm --detail --scan did the trick and found the partitions within the raid, labels are still mia.

any ideas?

----------

## Goverp

 *DaggyStyle wrote:*   

> mdadm --detail --scan did the trick and found the partitions within the raid, labels are still mia.
> 
> any ideas?

 

Sorry, I'm being thick, but where are you expecting labels to show up? 

FWIW, my initramfs init script uses findfs to convert labels into device names, using the same code as the various initramfs items in the wiki.  But I guess that's not what you're after.
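The findfs approach might look like the sketch below. `resolve_device` is a hypothetical helper name; it assumes a `findfs` command (busybox applet or util-linux) is available in the initramfs and passes plain device paths through untouched.

```shell
#!/bin/sh
# Hypothetical helper: turn "LABEL=..." or "UUID=..." into a device node
# using findfs; anything else is assumed to already be a device path.
resolve_device() {
    case "$1" in
        LABEL=*|UUID=*) findfs "$1" ;;
        *)              printf '%s\n' "$1" ;;
    esac
}

# Example use in an init script (the label name is a placeholder):
#   mount -o ro "$(resolve_device LABEL=usr)" /mnt/root/usr
```

This resolves labels without needing a `/dev/disk/by-label` directory, which only helps if nothing later in the boot expects those symlinks to exist.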

----------

## DaggyStyle

OK, adding the following script to init solved the issue.

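```
#!/bin/bash

# recreate the /dev/disk/by-label symlinks by hand
mkdir -p /dev/disk/by-label

while read partition; do
   # the fourth column of /proc/partitions is the device name
   DEV="$(echo ${partition} | awk '{print $4}')"
   DEV_PATH="/dev/${DEV}"

   # skip the header, blank lines and anything without a device node
   if [ -z "${DEV}" -o ! -e "${DEV_PATH}" ]; then
      continue
   fi

   # anchor the match so that PARTLABEL= is not picked up as well
   LABEL=$(blkid "${DEV_PATH}" | tr ' ' '\n' | grep "^LABEL=" | cut -d= -f2 | sed 's/"//g')
   if [ ! -z "${LABEL}" ]; then
      ln -s "${DEV_PATH}" "/dev/disk/by-label/${LABEL}"
   fi
done < "/proc/partitions"
```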

and right before /dev is unmounted I run

```
rm -rf /dev/disk/by-label
```

----------

