# mdadm-3.2.4 skips many partitions but still starts...!

## Havin_it

Hello,

So I just upgraded to mdadm-3.2.4 and gentoo-sources-3.3.5 at the same time, followed by a reboot. (The dreaded udev-182 happened before the previous reboot, and has been fine, just to eliminate that.) The result was that none of my 3 mdadm/raid5 volumes came up properly. Downgrading to mdadm-3.2.3-r1 (still with the newer kernel) has got things working again, so mdadm definitely seems to be the baddie.

My array layout is best explained with my /etc/mdadm.conf:

```
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdc1,/dev/sdd1
ARRAY /dev/md1 devices=/dev/sda2,/dev/sdb2,/dev/sdc2,/dev/sdd2
ARRAY /dev/md2 devices=/dev/sdb3,/dev/sdc3,/dev/sdd3
```

Note that I don't have DEVICE lines in this file, and kernel autodetection is still enabled; IIRC the ARRAY lines were intended as a backup in case the kernel/udev changed the RAID device names, which has happened before.
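For reference, DEVICE lines would look something like the sketch below; `partitions` is the catch-all keyword that tells mdadm to consider every device listed in /proc/partitions, and an explicit glob can narrow that down:

```
# consider everything in /proc/partitions...
DEVICE partitions
# ...or restrict scanning to particular devices
DEVICE /dev/sd*
```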

The root partition is on /dev/sda1, not part of any array. Nothing "system" is on the arrays, apart from /var/tmp and /home (in fact these are in luks volumes on one of the arrays, but I digress).

Now that I'm back to a working system (that is, including the stuff that depends on the arrays), I need to use it for the next few hours, but I can revert to the broken mdadm later tonight and get any output you request to help with diagnosis.

What I can say (from memory; the output has sadly scrolled out of the top of my console buffer) is that /proc/mdstat was showing only one partition for each RAID volume, each flagged 'inactive'. However, mdadm had not failed, and nagios did not see anything to complain about (which seems very wrong...). One obvious indicator that things were wrong was that the partitions on one of the arrays (md1p1 and md1p2; the other arrays are used as raw devices) did not show up in /dev.

Something else I noticed while troubleshooting, which just seems odd (although it's the same with all combos of new/old kernel/mdadm), is the contents of /dev/disk/by-partuuid and /dev/disk/by-uuid:

```
hazel linux # ls -l /dev/disk/by-{part,}uuid

/dev/disk/by-partuuid:
total 0
lrwxrwxrwx 1 root root 10 May 14 11:05 34debc60-f880-4808-acba-fd5da4d105f4 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 14 11:05 5d1cf95c-0dd3-4d85-974d-6fb0dc33ede8 -> ../../sdc3
lrwxrwxrwx 1 root root 10 May 14 11:05 6377b145-5947-4c11-b953-3b94348e057c -> ../../sdc2
lrwxrwxrwx 1 root root 10 May 14 11:05 6a266299-683b-44a8-b022-d1605c0044f5 -> ../../sdc1
lrwxrwxrwx 1 root root 10 May 14 11:05 8f5aeb1b-2351-431d-b2f3-ddbd38ce7042 -> ../../sda2

/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 11 May 14 11:05 1a22285c-aa39-49d6-b75e-65a30aa7ae76 -> ../../md1p2
lrwxrwxrwx 1 root root 10 May 14 11:05 233eec75-460b-4505-9d20-b7ce2a5517fc -> ../../dm-0
lrwxrwxrwx 1 root root 10 May 14 11:05 5b8bee5f-7482-4054-b095-23f0dafe9cf0 -> ../../dm-1
lrwxrwxrwx 1 root root  9 May 14 11:05 8177c540-d573-4b6e-be97-a179b177eda8 -> ../../md2
lrwxrwxrwx 1 root root 10 May 14 11:05 96ca2c9f-76cf-469d-b3e4-a1e65ff04b1e -> ../../dm-2
lrwxrwxrwx 1 root root 11 May 14 11:05 9d1182dc-f309-4ff8-9824-dd4ae7ab7fd6 -> ../../md1p1
lrwxrwxrwx 1 root root 10 May 14 11:05 c1297020-b05c-4656-84b1-e91eba898163 -> ../../sda1
```

Why such a strange cross-section of the devices? These may have always been like this, but I have a funny feeling (I'll check later) that the partitions shown in by-partuuid are the same (and only) ones that showed up in /proc/mdstat.

The drives sd[b,c,d] are identical in hardware and (GPT) partitioning. I achieved this by setting up one drive and copying its partition table to the others. I did give them new GPT labels, but perhaps there's something else I failed to do there, that causes confusion? If so, why did it only become a problem now?

TIA for any ideas on this one. Just let me know what output/config you'd like to see and I'll post it later.

----------

## LordVan

you could try specifying the UUIDs (can be generated quite nicely with mdadm --examine )

----------

## Havin_it

Hi LordVan, thanks for the reply  :Very Happy: 

I'll certainly try this, though won't the missing /dev symlinks be an issue for that?

Just to sort of answer my own query above, I checked all the UUIDs and they seem OK: the array ones are correctly grouped, the device ones are all unique.

```
hazel ~ # for d in b1 c1 d1 a2 b2 c2 d2 b3 c3 d3; do echo sd$d:; mdadm -E /dev/sd$d |grep UUID; done
#1st array
sdb1:
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
    Device UUID : 61249d56:ae319f04:9589ef69:8ac6dc21
sdc1:
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
    Device UUID : d02b97d2:046c7db7:4f01603d:9d81af26
sdd1:
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
    Device UUID : 00a0cc11:6ce72287:d891130c:2a062177
#2nd array
sda2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : d028712d:d0933028:55d972ff:08c8d432
sdb2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : 23e2de36:26083619:fb16d0d9:67afda4c
sdc2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : 5c348138:d89fa82a:a07a15a3:cebcd0da
sdd2:
     Array UUID : c988ae5c:f643b427:37db5db0:6627531a
    Device UUID : bfb91875:2484f14d:262954cd:de178c08
#3rd array
sdb3:
     Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
    Device UUID : 7bb1e742:c1c40936:68d95775:d2d770eb
sdc3:
     Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
    Device UUID : 2cc1655e:6e47f94b:3efcdd57:2e813ca0
sdd3:
     Array UUID : 8e9c0244:726bfd56:30dbfdde:3181043f
    Device UUID : 79479224:99f98f54:0bde2c88:19ea6721
```

Also, the manpage isn't clear on this: should my ARRAY lines actually work/do anything without DEVICE lines before them?

----------

## LordVan

No clue, sorry.

Here are the lines I appended to my mdadm.conf (output from mdadm):

```
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=4b8679b7:f94e0498:655a214d:2935de26
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=311983c7:5faad243:7720c9cf:fa81c470
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=569e8bb3:43e94fde:36e678bd:2460358c
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=9b3fab54:844b372c:0e61ae67:ef761534

Those work for me, so you can use them as an example.

----------

## Havin_it

Thanks, that does guide me a bit. I thought it was all the device UUIDs that went in, but I take it those are just the array UUIDs.

BTW, do you mean the above is output directly from mdadm? If so, what's the full command if I wanted to do likewise?

----------

## LordVan

I looked it up now (and tried it again, since I wanted to make sure):

```
mdadm --examine --scan
```

----------

## Havin_it

OK, I changed mdadm conf to:

```
ARRAY /dev/md0 metadata=1.2 UUID=3505e7ec:202fabce:86aee957:c134a8cb name=hazel:0
ARRAY /dev/md1 metadata=1.2 UUID=c988ae5c:f643b427:37db5db0:6627531a name=hazel:1
ARRAY /dev/md2 metadata=1.2 UUID=8e9c0244:726bfd56:30dbfdde:3181043f name=hazel:2
```

Also, I added "raid=noautodetect" to my kernel boot params, and added mdraid to the boot runlevel per the einfo message on the latest ebuild (I only had mdadm there before).

No improvement  :Sad: 

Here's an example of /proc/mdstat in broken mode:

```
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty] 
md1 : inactive sdb2[1](S)
      223689728 blocks super 1.2

md2 : inactive sdd3[3](S)
      261621760 blocks super 1.2

md0 : inactive sdc1[1](S)
      3070976 blocks super 1.2

unused devices: <none>
```

I say "example" because the three partitions that appear seem to be different every time.

----------

## Havin_it

Next thing I tried: commenting-out everything in mdadm.conf, kernel autodetection still turned off. Result:

```
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty] 
md125 : active raid5 sdb1[0] sdc1[1]
      6141696 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [UU_]

md126 : active raid5 sdb3[0] sdd3[3]
      523243008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]

md127 : inactive sda2[0](S)
      223689728 blocks super 1.2

md0 : inactive sdd1[3](S)
      3070976 blocks super 1.2

md2 : inactive sdc3[1](S)
      261621760 blocks super 1.2

md1 : inactive sdb2[1](S)
      223689728 blocks super 1.2

unused devices: <none>
```

So, a whole different muddle. Is any of this helping?

EDIT: And here it is with autodetect turned back on:

```
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [faulty] 
md125 : active raid5 sdb1[0] sdd1[3]
      6141696 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]

md126 : active raid5 sdb3[0] sdd3[3]
      523243008 blocks super 1.2 level 5, 128k chunk, algorithm 2 [3/2] [U_U]

md127 : inactive sda2[0](S)
      223689728 blocks super 1.2

md2 : inactive sdc3[1](S)
      261621760 blocks super 1.2

md1 : inactive sdb2[1](S)
      223689728 blocks super 1.2

md0 : inactive sdc1[1](S)
      3070976 blocks super 1.2

unused devices: <none>
```

Note that in both cases the drives that are assigned are different, and some are missing.

----------

## Havin_it

Right, I think I've now tried every combo of settings I could think of. Finally, I did:

* mdadm-3.2.3-r1
* kernel version: 3.3.5 (newer)
* kernel autodetect: on
* mdraid in boot runlevel: yes
* mdadm.conf: empty

Works perfectly, device names as before.

I don't think I'm doing anything wrong here, so I'm gonna file a bug report.

----------

## djdunn

This is most likely all problems with superblocks, especially version 1.2, which doesn't work with kernel autodetection. There's also a thing where mdadm names arrays md126, md127 and so on; I just gave in and let mdadm win that fight (I didn't care that much) and changed my system appropriately. If you go in and wipe and rewrite your superblocks, you can probably get it working with the newer versions.

----------

## Havin_it

From my findings before, I assumed that the kernel had been doing the assembly (since I didn't have mdraid in init, only mdadm). But then I read this in the kernel docs:

 */usr/src/linux/Documentation/md.txt wrote:*   

> Boot time autodetection of RAID arrays
> 
> --------------------------------------
> 
> When md is compiled into the kernel (not as module), partitions of
> ...

 

No idea what "type 0xfd" or "type 0 superblock" mean though, anyone?

FWIW, here's the full info from one partition and its array:

```
hazel ~ # mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 3505e7ec:202fabce:86aee957:c134a8cb
           Name : hazel:0  (local to host hazel)
  Creation Time : Tue Feb 14 18:58:30 2012
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 6141952 (2.93 GiB 3.14 GB)
     Array Size : 12283392 (5.86 GiB 6.29 GB)
  Used Dev Size : 6141696 (2.93 GiB 3.14 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 61249d56:ae319f04:9589ef69:8ac6dc21
    Update Time : Wed May 16 10:55:10 2012
       Checksum : 73f249d8 - correct
         Events : 18
         Layout : left-symmetric
     Chunk Size : 128K
    Device Role : Active device 0
    Array State : AAA ('A' == active, '.' == missing)

hazel ~ # mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue Feb 14 18:58:30 2012
     Raid Level : raid5
     Array Size : 6141696 (5.86 GiB 6.29 GB)
  Used Dev Size : 3070848 (2.93 GiB 3.14 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent
    Update Time : Wed May 16 10:55:10 2012
          State : clean 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 128K
           Name : hazel:0  (local to host hazel)
           UUID : 3505e7ec:202fabce:86aee957:c134a8cb
         Events : 18

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1

----------

## Havin_it

I should also mention that even with the new/bad mdadm, if I stop all the malformed arrays and then issue mdadm -As (as the mdraid initscript does), everything assembles perfectly, so whatever goes wrong is specific to the boot/init process.
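Concretely, the recovery sequence is just the following (device names per my layout above; any stray md125-style arrays need stopping too if they appear):

```
# stop each half-assembled array listed in /proc/mdstat
mdadm -S /dev/md0
mdadm -S /dev/md1
mdadm -S /dev/md2
# then scan and assemble everything in one go
mdadm -As
```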

Curiouser and curiouser...

----------

## djdunn

type 0xfd is partition type linux raid

autodetection via kernel does not work with superblock version 1.2

superblock version 1.2 you can only boot / on raid by running mdadm with an initramfs NOT dmraid

----------

## Havin_it

 *djdunn wrote:*   

> type 0xfd is partition type linux raid
> 
> autodetection via kernel does not work with superblock version 1.2
> 
> superblock version 1.2 you can only boot / on raid by running mdadm with an initramfs NOT dmraid

 

OK, so with my previous settings (and now), the kernel was not doing autodetection, but at that point mdadm was not doing assembly either (mdraid was not in the boot runlevel, and the mdadm initscript doesn't assemble arrays, just monitors them). So how did it work then, and how does it now? I take it udev must be involved, but it does seem conclusive that mdadm is the package that's to blame...

Huh?

What exactly does udev do as part of this process?

PS: not sure if you were being specific or illustrative, but just to reiterate my / isn't on raid. Also isn't dmraid a different package?
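For what it's worth, my understanding (paraphrasing the 64-md-raid.rules file that ships with mdadm, so treat the exact syntax as approximate) is that udev's part is a rule roughly like this, which hands each newly appearing RAID-member block device to mdadm for incremental assembly:

```
# when a block device carrying an md superblock appears,
# feed it to mdadm, which adds it to a (possibly still incomplete) array
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
  RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
```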

----------

## esperto

I've just had the same problem: I upgraded mdadm to version 3.2.4, and when I rebooted, md0 was inactive and only showed my sdd1 partition as part of it. I immediately rolled back to version 3.2.3-r1 and everything was back to normal. Definitely something fishy here.

Below is what I have in mdadm.conf:

```
ARRAY /dev/md0 metadata=1.2 name=htpc:0 UUID=b1aa480d:af00e6fd:c35876b8:00ae9e55
```

I'm currently running kernel 3.2.12 from gentoo-sources, and I don't have mdraid in the boot runlevel.

What should I do? Just keep the 3.2.3-r1 version for now and wait for a new update, or clean out mdadm.conf and add mdraid to the boot sequence? I'm afraid removing the ARRAY line from mdadm.conf will screw up my RAID.

----------

## Havin_it

Hi esperto, nice to hear I'm not alone   :Very Happy: 

If your issue is the same, I don't think it'll matter what you do if you upgrade again: I've flipped every variable I could think of and still no joy. If you're willing to try it again, though, please run through the following and add your findings to my bug report. Useful info would be:

* Does it work using only mdadm (i.e. if you add raid=noautodetect to your GRUB boot command-line and add mdraid to the boot runlevel)? In your case, does this work with mdadm-3.2.3-r1 either?

* Any change if you comment-out the ARRAY line in mdadm.conf? (For me, with 3.2.3-r1 it still works anyway.)

* With mdadm-3.2.4, if you do mdadm -S <each array device mentioned in mdstat>, then mdadm -As, does the array assemble correctly?

* Are your RAID partitions GPT or MS-DOS? How many devices in the array, what RAID level, etc.?

----------

## rcb1974

I have the same problem after upgrading mdadm.

I'm now using mdadm 3.2.4 and vanilla sources 3.3.6.

All my v0.9 superblock software RAID1 arrays (except the /dev/md0 root volume) no longer get autoassembled.

In order to mount the arrays, I first have to stop them, and then assemble them.

Example:

```
mdadm --stop /dev/md1

mdadm --assemble /dev/md1

mount /dev/md1
```

----------

## Havin_it

Interesting - I guess if md0 is your root, then you have an initramfs with mdadm in it. Can you think of any other differences between md1 (and any others) and md0?

Also, does md1 (and others) come up correctly if you just do "mdadm -As"?

----------

