# md devices being assembled before multipath [SOLVED]

## wildbug

I've recently installed a SAS2 disk array, and I'm having some issues bringing it up (correctly) on boot.

The root filesystem is on a RAID1 array (motherboard SATA). The new storage array is in a JBOD attached to two LSI HBAs. multipath maps the two paths to each disk onto a single device, and those multipath devices are assembled into four 9-disk RAID6 volumes that are members of an LVM2 volume group. A single logical volume exists in this volume group and is formatted with XFS.

This works fine when assembled manually but doesn't come up correctly when rebooting.  The problem is that the md devices start before the multipath devices are created; AFAICT multipath fails because the devices are already in use by md.

I've turned off md autodetect in the kernel, and I edited the "before" line in /etc/init.d/multipath to include mdraid.  The root device is assembled with a "md=127,/dev/sda1,/dev/sdb1" kernel parameter in grub.conf.

You can see that the md devices are already assembled when /etc/init.d/mdraid is run.  Here's an excerpt from /var/log/rc.log:

```
rc boot logging started at Fri Jul 22 19:50:01 2011

 * Setting system clock using the hardware clock [UTC] ... [ ok ]
 * Loading module dm_multipath ... [ ok ]
 * Autoloaded 1 module(s)
 * Activating Multipath devices ... [ ok ]
 * Starting up RAID devices ...
mdadm: /dev/md10 is already in use.
mdadm: /dev/md11 is already in use.
mdadm: /dev/md12 is already in use.
mdadm: /dev/md/126 is already in use.
 [ !! ]
 * Setting up the Logical Volume Manager ... [ ok ]
 * Checking local filesystems  ...
/sbin/fsck.xfs: XFS file system.
/sbin/fsck.xfs: UUID=a5c7abf9-d2bc-4d30-bf08-df08215c48c1 does not exist
/sbin/fsck.xfs: XFS file system.
 * Operational error
 [ !! ]
(...continues)
```
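The telling lines in that log are mdadm's "already in use" messages: they mean the arrays already existed before /etc/init.d/mdraid ran. A minimal shell sketch for pulling those device names out of an rc.log, assuming the message format is exactly as in the excerpt above. `already_in_use` is a hypothetical helper name, not part of mdadm or OpenRC.

```shell
# Hypothetical helper: list md devices that mdadm reported as
# "already in use" in an OpenRC rc.log read on stdin.
# Assumes lines of the form:  mdadm: /dev/md10 is already in use.
# Usage: already_in_use < /var/log/rc.log
already_in_use() {
    sed -n 's|^mdadm: \(/dev/[^ ]*\) is already in use\.$|\1|p'
}
```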

In dmesg I can see "md: bind<sd*>" lines interspersed between the SCSI discoveries.  Excerpt:

```
[   17.230502] scsi 6:0:26:0: Direct-Access     SEAGATE  ST32000444SS     0006 PQ: 0 ANSI: 5
[   17.230508] scsi 6:0:26:0: SSP: handle(0x0025), sas_addr(0x5000c50033f6f6ce), phy(14), device_name(0x00c50050cef6f633)
[   17.230511] scsi 6:0:26:0: SSP: enclosure_logical_id(0x500304800000007f), slot(6)
[   17.230515] scsi 6:0:26:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(6), cmd_que(1)
[   17.231283] sd 6:0:23:0: [sdax] Attached SCSI disk
[   17.231454] sd 6:0:25:0: [sdaz] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   17.232243]  sday: unknown partition table
[   17.232835] sd 6:0:26:0: [sdba] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[   17.233159] sd 6:0:26:0: Attached scsi generic sg54 type 0
[   17.234782] sd 6:0:26:0: [sdba] Write Protect is off
[   17.234785] sd 6:0:26:0: [sdba] Mode Sense: d7 00 10 08
[   17.235817] md: bind<sdi>
```
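To line the md binds up against the SCSI discoveries, a small filter can print just the timestamp and member device of each "md: bind<...>" line. A sketch, assuming the dmesg line format shown above; `md_binds` is a hypothetical helper.

```shell
# Hypothetical helper: print "<seconds> <member>" for every
# "md: bind<...>" line in dmesg text read on stdin.
# Assumes the "[   17.235817] md: bind<sdi>" format shown above.
# Usage: dmesg | md_binds
md_binds() {
    sed -n 's/^\[ *\([0-9.]*\)] md: bind<\([^>]*\)>.*$/\1 \2/p'
}
```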

Later in the dmesg output md assembles its devices despite having autoassemble turned off both in the kernel and on the kernel boot line. What gives?

Last edited by wildbug on Fri Mar 01, 2013 5:39 pm; edited 1 time in total

----------

## wildbug

udev's doing this, isn't it?

```
# rc-update show sysinit
                devfs | sysinit
                dmesg | sysinit
                 udev | sysinit
# find /lib64/udev -name "*md*"
/lib64/udev/rules.d/64-md-raid.rules
```

----------

## NeddySeagoon

wildbug,

Kernel auto-assembly works only with version 0.90 RAID superblocks. That has not been the default for about 6 months now.

That change has caused issues for a lot of people who were expecting RAID auto-assembly to just work.

What version of RAID superblock do you have?

Try 

```
$ sudo /sbin/mdadm -E /dev/sda1
Password:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 9392926d:64086e7a:86638283:4138a597
  Creation Time : Sat Apr 11 16:34:40 2009
     Raid Level : raid1
[snip]
```

on one of the volumes that you donate to the RAID. I would expect your version to be 1.2, in which case your RAID sets must be getting assembled by mdadm somewhere.

Kernel auto-assembly looks like this in dmesg:
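The check described here can be wrapped up as a filter over `mdadm -E` output. This is a sketch that relies only on the "Version :" line shown above and on the rule stated in this post (in-kernel autodetect handles 0.90 superblocks only); `autodetect_eligible` is a hypothetical helper name.

```shell
# Hypothetical helper: read `mdadm -E <member>` output on stdin and say
# whether the in-kernel autodetect could assemble the array.
# Rule taken from the thread: kernel autodetect handles 0.90 superblocks only.
# Usage: mdadm -E /dev/sda1 | autodetect_eligible
autodetect_eligible() {
    version=$(sed -n 's/^ *Version : *\([^ ]*\).*$/\1/p' | head -n 1)
    case "$version" in
        "")    echo "no superblock version found"; return 1 ;;
        0.90*) echo "$version: kernel autodetect can assemble this" ;;
        *)     echo "$version: not handled by kernel autodetect; mdadm must assemble it" ;;
    esac
}
```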

```
[    2.380529] md: Waiting for all devices to be available before autodetect
[    2.380874] md: If you don't use raid, use raid=noautodetect
[    2.381485] md: Autodetecting RAID arrays.
[    2.503208] md: Scanned 12 and added 12 devices.
[    2.504567] md: autorun ...
[snip]
```

----------

## wildbug

Neddy, thanks for the reply, but despite my erroneous reply (and subsequent retraction) in the other thread, I realize that autoassemble only works for 0.90 superblocks.  In fact it was my experience described above that made me think that autoassemble was working despite non-0.90 superblocks.  However, the root device DOES have v0.90 superblocks and has been autoassembling correctly for months now.  But with the recent addition of devices that require multipath to be active before md, I've intentionally turned off autodetect and just added a kernel parameter to assemble the root device (as detailed in my OP).

I'm not trying to get arrays to autoassemble; I'm trying to STOP them from being assembled before multipath is activated. I turned off RAID autodetect in the kernel (and added a redundant "raid=noautodetect", just in case), yet non-root arrays are still being assembled between kernel boot and the boot runlevel, which is why I'm now wondering if udev is responsible.

(For the record, my root device is 0.90 and the other md devices are a mixture of 1.1 and 1.2.  Not that it's relevant.)

Here's the root device being correctly assembled (from dmesg):

```
[    6.797329] md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
[    6.797975] md: Loading md127: /dev/sda1
[    6.798894] md: bind<sda1>
[    6.799383] md: bind<sdb1>
[    6.800027] bio: create slab <bio-1> at 1
[    6.800490] md/raid1:md127: active with 2 out of 2 mirrors
[    6.800847] md127: detected capacity change from 0 to 224063848448
[    6.801501]  md127: unknown partition table
[    6.803088]  md127: unknown partition table
[    6.829729] XFS (md127): Mounting Filesystem
[    6.845067] usb 3-1: new low speed USB device number 2 using ohci_hcd
[    6.859377] XFS (md127): Ending clean mount
[    6.859741] VFS: Mounted root (xfs filesystem) readonly on device 9:127.
```
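The two dmesg messages quoted in this thread ("md: Autodetecting RAID arrays." and "md: Skipping autodetection of RAID arrays.") are enough to tell whether the kernel tried to autodetect at all. If it reports that it skipped autodetection and non-root arrays still appear, something in userspace assembled them. A sketch; `kernel_autodetect` is a hypothetical helper that only greps for those two strings.

```shell
# Hypothetical helper: classify the kernel md autodetect step from dmesg
# text on stdin, using the two messages quoted in this thread.
# Prints "skipped", "ran" or "unknown".
# Usage: dmesg | kernel_autodetect
kernel_autodetect() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'md: Skipping autodetection of RAID arrays'; then
        echo skipped
    elif printf '%s\n' "$log" | grep -q 'md: Autodetecting RAID arrays'; then
        echo ran
    else
        echo unknown
    fi
}
```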

----------

## NeddySeagoon

wildbug,

We have established that it's not the kernel doing auto-assembly of your 1.1/1.2 RAID sets, which is a step in the right direction.

What does 

```
rc-update show
```

produce?

mdadm should not be listed, or it will be started in its turn by the startup scripts.

Just because a service is not listed in rc-update show does not mean it is not running.

----------

## wildbug

```
# rc-update show
             bootmisc | boot
          consolefont | boot
           consolekit |      default
                cupsd |      default
                 dbus |      default
                devfs |                                        sysinit
        device-mapper | boot
                dmesg |                                        sysinit
                 fsck | boot
             hostname | boot
              hwclock | boot
              keymaps | boot
            killprocs |                        shutdown
                local |      default nonetwork
           localmount | boot
                  lvm | boot
                mdadm |      default
               mdraid | boot
              modules | boot
             mount-ro |                        shutdown
                 mtab | boot
            multipath | boot
           multipathd |      default
             net.eth0 |      default
             net.eth1 |      default
               net.lo | boot
             netmount |      default
                  nfs |      default
             nfsmount |      default
                 ntpd | boot
              pbs_mom |      default
            pbs_sched |      default
           pbs_server |      default
               procfs | boot
                 root | boot
            savecache |                        shutdown
                 sshd |      default
                 swap | boot
               sysctl | boot
         termencoding | boot
                 udev |                                        sysinit
       udev-postmount |      default
                 upsd |      default
               upsdrv |      default
               upsmon |      default
              urandom | boot
                  vgl |      default
                  xdm |      default
               xinetd |      default
```

FYI, /etc/init.d/mdadm is the monitoring daemon; there is no assembly.  /etc/init.d/mdraid is the one containing "mdadm -As".

But look at my OP again, specifically the first code block.  That's output from the rc_logger for the boot runlevel.  You can see the order of services being executed -- hwclock, modules, multipath, mdraid, etc.  When it hits mdraid, it declares that the md devices are already assembled.  That means that something is putting them together AFTER kernel autodetect and BEFORE mdraid.

When I first posted this, I didn't realize there was a sysinit runlevel before boot.  There are three services in sysinit -- devfs, dmesg, and udev.  It's possible that udev or devfs is triggering the md devices (which is what I was getting at in my second post).  I'm not intimately familiar with either of those.

----------

## wildbug

Here's my complete dmesg:  http://pastebin.com/raw.php?i=QGq7wit4

Here's an excerpt of the array assembly timeline:

```
[    6.778542] md/raid1:md127: active with 2 out of 2 mirrors
[    7.618775] udev[2822]: starting version 164
[    7.772809] md/raid1:md126: active with 2 out of 2 mirrors
[    8.282715] md/raid:md10: raid level 6 active with 9 out of 9 devices, algorithm 2
[    8.401377] md/raid:md11: raid level 6 active with 9 out of 9 devices, algorithm 2
[   17.420690] md/raid:md125: raid level 5 active with 6 out of 6 devices, algorithm 2
[   17.845425] md/raid:md12: raid level 6 active with 9 out of 9 devices, algorithm 2
[   18.108825] md/raid:md13: raid level 6 active with 9 out of 9 devices, algorithm 2
[   20.816985] device-mapper: multipath: version 1.3.0 loaded
[   21.021549] device-mapper: table: 253:0: multipath: error getting device
```
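The timeline makes the ordering problem concrete: every data array (md10 to md13) is active before dm-multipath is even loaded at about 20.8 s. A sketch that compares the two timestamps for one named array, assuming the "[ seconds]" prefix and message wording shown above; `md_vs_multipath` is a hypothetical helper. The array name is passed in so that the root array (md127), which is legitimately assembled early from the kernel command line, does not trigger a false alarm.

```shell
# Hypothetical helper: for the array named in $1 (e.g. md10), compare the
# time it went active with the time dm-multipath was loaded, using dmesg
# text on stdin. Prints "md-first", "multipath-first" or "incomplete".
# Usage: dmesg | md_vs_multipath md10
md_vs_multipath() {
    awk -v arr="$1" '
        # strip the leading timestamp brackets and return the seconds as a number
        function ts(line) {
            sub(/^\[ */, "", line)
            sub(/].*$/, "", line)
            return line + 0
        }
        # first "md/raid...:<arr>: ... active" line for the requested array
        index($0, ":" arr ": ") && /active/ && md == "" { md = ts($0) }
        # first line announcing the dm-multipath target
        /device-mapper: multipath: version/ && mp == "" { mp = ts($0) }
        END {
            if (md == "" || mp == "") print "incomplete"
            else if (md < mp) print "md-first"
            else print "multipath-first"
        }'
}
```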

----------

## wildbug

Yep, udev is the culprit.  The rule /lib/udev/rules.d/64-md-raid.rules (supplied by sys-fs/mdadm) calls "mdadm --incremental" on the device.  If I remove that file, md arrays are not automatically assembled.

Now I have to figure out how to fix this in an upgrade-friendly way.  I'd like to make that rule ignore disks attached via the HBAs.  Could this be possible by creating a custom rule in /etc/udev/rules.d and without deleting/editing the "official" /lib/udev/rules.d/64-md-raid.rules?

----------

## NeddySeagoon

wildbug,

What if you create a rule in a file with a lower number than /lib/udev/rules.d/64-md-raid.rules?

Say, a 03-md-raid.rules that does nothing: it will be run before 64-md-raid.rules and will not be affected by updates either.

It must match the same thing(s) as in 64-md-raid.rules.

udev will trigger, execute your rule that does nothing, then you have full manual control.

----------

## wildbug

I think I've solved this.  I can't reboot to test right now as I currently have users running some long simulations, but udevadm test seems to produce correct results.  I'll mark the thread as solved once I can reboot and confirm.

This is what I did:

The server in question has two LSI 9200-8e HBAs and an onboard SAS controller with the same chipset (LSI2008).  As I only want to include devices connected to the HBAs, I used "udevadm info" to find differences between the device trees of drives attached to the motherboard and the HBAs.  At one level I found differences -- ATTRS{subsystem_vendor} and ATTRS{subsystem_device}.  I could now identify the correct devices.
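The attribute comparison can be narrowed to the two keys that differed. `udevadm info -a -n /dev/sdX` (`-a` is `--attribute-walk`) prints the attributes of the device and each of its parents; the sketch below keeps only the PCI subsystem IDs so the output for an HBA-attached disk and an onboard-attached disk can be diffed. It assumes the usual `ATTRS{name}=="value"` line format, and `subsystem_ids` is a hypothetical helper.

```shell
# Hypothetical helper: keep only the PCI subsystem IDs from an attribute
# walk such as `udevadm info -a -n /dev/sdX`, printed as name=value.
# Assumes the usual  ATTRS{subsystem_vendor}=="0x1000"  line format.
# Usage: udevadm info -a -n /dev/sdX | subsystem_ids
subsystem_ids() {
    sed -n 's/^ *ATTRS{\(subsystem_[a-z]*\)}=="\([^"]*\)".*$/\1=\2/p'
}
```

Running it for one disk on each controller and diffing the two outputs shows which values a rule would need to match.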

The next part was overriding the array assembly.  I finally realized that there was only one line in 64-md-raid.rules that I had to circumvent:

```
ENV{ID_FS_TYPE}=="linux_raid_member", ACTION=="add", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```

If either ENV{ID_FS_TYPE} or ACTION fails to match, mdadm isn't executed. ENV variables are settable, so I created my own rule to unset ENV{ID_FS_TYPE} just before 64-md-raid.rules is consulted. It also has to come after /lib/udev/rules.d/60-persistent-storage.rules, as that is where the variable is originally set (as identified from udevadm test output).

```
# /etc/udev/rules.d/63-lsi-9200-8e.rules
KERNEL=="sd*", ACTION=="add", DRIVERS=="mpt2sas", ATTRS{subsystem_vendor}=="0x1000", ATTRS{subsystem_device}=="0x3080", ENV{ID_FS_TYPE}=""
```

Using udevadm test on devices not attached to the HBAs, I can see the "mdadm --incremental" line; HBA-attached devices no longer include it. That should mean success. :)
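The `udevadm test` check can be reduced to one question per device: does the simulated run still list "mdadm --incremental"? A sketch that reads `udevadm test` output on stdin and only looks for that substring; `would_run_incremental` is a hypothetical helper, and the exact devpath argument `udevadm test` accepts may vary between udev versions.

```shell
# Hypothetical helper: succeed if `udevadm test` output on stdin mentions
# "mdadm --incremental", i.e. udev would try incremental assembly.
# Usage (devpath form may differ by udev version):
#   udevadm test /sys/block/sdX 2>&1 | would_run_incremental
would_run_incremental() {
    if grep -q 'mdadm --incremental'; then
        echo "mdadm --incremental would run"
    else
        echo "no incremental assembly"
        return 1
    fi
}
```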

----------

## dmitryilyin

You'll have more luck with a better server distribution (assuming you are working with a server), something Debian- or Red Hat-like.

They use a more advanced initramfs and are much better suited to production servers.

Gentoo is good for learning Linux, development and experimenting.

----------

## wildbug

So I finally got around to rebooting... :)

I think I have this sorted out.  Custom udev rules are not necessary.

What happens is that udev runs in the sysinit runlevel; /lib/udev/rules.d/64-md-raid.rules is part of this process, and it uses mdadm in incremental mode to try to assemble RAID devices automatically as their components are discovered. However, mdadm does respect /etc/mdadm.conf, so that file can be used to control behavior during this step. Whitelisting devices with a DEVICE line wasn't sufficient; it seems only ARRAY lines are affected by this. An AUTO line that blacklists all arrays, combined with selectively whitelisting arrays in ARRAY lines with "auto=yes", will work. Setting "devices=/dev/mapper/*" in the ARRAY lines was also necessary.

/etc/mdadm.conf

```
AUTO -all
DEVICE /dev/sd[ab]*
DEVICE /dev/mapper/*
ARRAY /dev/md10 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx devices=/dev/mapper/* spares=1 spare-group=mp_spares
ARRAY /dev/md11 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx devices=/dev/mapper/* spares=1 spare-group=mp_spares
ARRAY /dev/md/root UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx auto=yes
```
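The mdadm.conf layout above boils down to two conditions: an `AUTO -all` line, and every ARRAY line either restricted to `/dev/mapper/*` members or explicitly whitelisted with `auto=yes`. A rough lint for exactly that layout, as a sketch; it encodes this post's setup rather than general mdadm.conf rules, and `check_mdadm_conf` is a hypothetical helper.

```shell
# Hypothetical helper: lint an mdadm.conf (stdin) against the layout used
# in this thread. Prints one line per problem; prints nothing if it matches.
# Usage: check_mdadm_conf < /etc/mdadm.conf
check_mdadm_conf() {
    awk '
        $1 == "AUTO" && $2 == "-all" { auto_all = 1 }
        $1 == "ARRAY" {
            ok = 0
            # options start after the array name in field 2
            for (i = 3; i <= NF; i++)
                if ($i == "auto=yes" || $i == "devices=/dev/mapper/*") ok = 1
            if (!ok) print "unrestricted ARRAY: " $2
        }
        END { if (!auto_all) print "missing: AUTO -all" }'
}
```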

The dm-multipath kernel module must be loaded (if you've built it as a module) or /etc/init.d/multipath will fail to start.

/etc/conf.d/modules

```
modules="dm_multipath"
```
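Before relying on /etc/init.d/multipath, it can help to confirm the module is actually loaded. A sketch that looks for `dm_multipath` in /proc/modules (or another file in the same format); `dm_multipath_loaded` is a hypothetical helper, and a kernel with dm-multipath built in will not list it there at all.

```shell
# Hypothetical helper: succeed if dm_multipath appears in a
# /proc/modules-style file ($1, default /proc/modules).
# Built-in (non-modular) dm-multipath is never listed here.
# Usage: dm_multipath_loaded || modprobe dm_multipath
dm_multipath_loaded() {
    grep -q '^dm_multipath ' "${1:-/proc/modules}"
}
```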

And mdraid, which will be used to bring up the arrays listed in mdadm.conf, needs to be started after multipath.

/etc/conf.d/mdraid

```
rc_need="multipath"
```
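After a reboot, the rc.log should show multipath being activated before mdraid starts. The original log in this thread already had that order and still failed, so this only confirms the `rc_need` ordering; mdraid should also no longer report arrays as "already in use". A sketch using the two message strings from that log; `boot_order_ok` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if an rc.log on stdin shows multipath
# being activated before the RAID devices are started.
# Usage: boot_order_ok < /var/log/rc.log && echo "multipath ran first"
boot_order_ok() {
    awk '
        /Activating Multipath devices/ && !mp { mp = NR }
        /Starting up RAID devices/ && !md { md = NR }
        END { exit !(mp && md && mp < md) }'
}
```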

----------

