# mdadm superblock problem

## drescherjm

I had a problem with a gentoo server that has been in production for 3 to 5 years now (I updated udev but forgot to remove the deprecated sysfs support).

I could not fix the problem in the running system, so I tried to boot from a sysrescue cd and hit a big problem. The machine has 6 750GB hard drives that each have 5 partitions, 4 of which are mdadm raid members. During boot, 2 of the disks were detected as whole-disk members of md0 instead of /dev/sda1 and /dev/sdd1, the partitions that actually belong to the md0 raid 1 array. It appears my disks have stale whole-disk superblocks (probably from testing before deployment) that should not be there.

```
datastore1 ~ # mdadm -E /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 89328e5a:110661e4:4dd9d63e:8a6c4e0e
  Creation Time : Fri Oct 12 13:21:08 2007
     Raid Level : raid5
  Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
     Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Mon Oct 15 13:35:42 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d578e9c2 - correct
         Events : 18

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
```

```
datastore1 ~ # mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.03
           UUID : 7acd778f:ed62583c:a2ef05c9:d06c0a48
  Creation Time : Thu Jun 15 00:12:24 2006
     Raid Level : raid1
  Used Dev Size : 256896 (250.92 MiB 263.06 MB)
     Array Size : 256896 (250.92 MiB 263.06 MB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0

    Update Time : Tue Sep 21 17:02:39 2010
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 15e41b5b - correct
         Events : 445

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       49        3      active sync   /dev/sdd1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
```

Where is the superblock stored for full disks? Can I safely zero that without corrupting the working arrays?

Here are the arrays when they are properly assembled. 

```
datastore1 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdf1[5] sdd1[3] sde1[4] sdb1[1] sdc1[2] sda1[0]
      256896 blocks [6/6] [UUUUUU]

md2 : active raid6 sdf5[5] sdd5[3] sde5[4] sdb5[1] sdc5[2] sda5[0]
      1199283200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid6 sdf6[5] sdd6[3] sde6[4] sdb6[1] sdc6[2] sda6[0]
      1680013056 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid6 sdf3[5] sdd3[3] sde3[4] sdb3[1] sdc3[2] sda3[0]
      46909440 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>
```

----------

## krinn

 *drescherjm wrote:*   

> 
> 
> Where is the superblock stored for full disks? Can I safely zero that without corrupting the working arrays?
> 
> 

 

I don't have the answer, but re-asking your question differently should give you the answer. Here's my version of your question:

Do I have data on the array that I'm OK to lose?

For your main problem, if I get it correctly: mdadm mis-detects the members of the array, and you are trying to alter the disks (the array members) so that mdadm will autodetect correctly who is who?

I don't use mdadm, but I really doubt you cannot split (split as in "unload" or "unset", not destroy) the array and rebuild it manually.

This way you will get the array back, access it, and correct your udev trouble.

Of course it will only work if mdadm allows you to do that, but that seems so basic it would be a shame if it didn't.

That method should let you correct your gentoo, and more: it should make your question moot.

----------

## drescherjm

 *Quote:*   

> Do I have data on the array that I'm OK to lose?

 

There is between 4 and 5 TB of data that should all be on tape, but it would take a long time to make a second backup.

 *Quote:*   

> mdadm mis-detects the members of the array, and you are trying to alter the disks (the array members) so that mdadm will autodetect correctly who is who?

 

I am trying to remove the superblocks from the whole-disk raid members so that mdadm detects the correct arrays instead of finding arrays that no longer exist.

 *Quote:*   

> This way you will get the array back, access it, and correct your udev trouble.

 

I fixed the udev problem, and the system actually boots correctly with the current kernel, provided the genkernel mdadm support is in the initrd. If I boot from a livecd, the autodetection of the non-existent arrays makes a mess of the arrays.

/dev/sda and /dev/sdd go into a single array, md0, that will not start because of missing members.

The other three raid 6 arrays start, but with only 4 out of 6 raid members.

I figured out how to recover from this situation. 

1. Stop all 4 arrays:

```
mdadm --manage /dev/md0 --stop
mdadm --manage /dev/md1 --stop
mdadm --manage /dev/md2 --stop
mdadm --manage /dev/md3 --stop
```

2. Force the kernel to reload the partition table from /dev/sda and /dev/sdd:

```
sfdisk -R /dev/sda
sfdisk -R /dev/sdd
```

3. Reassemble md0:

```
mdadm -A /dev/md0 /dev/sd[abcdef]1
```

4. Add the missing raid members back to the degraded arrays:

```
mdadm --manage /dev/md1 --add /dev/sda3
mdadm --manage /dev/md1 --add /dev/sdd3
```

After this I can chroot into the system and perform maintenance.

BTW, I did not re-add the drives to the other two arrays, to save time, since those are data only.

----------

## krinn

You should back up before altering your array. It's not like it's a critical thing you must do; it's a cosmetic feature you wish worked with a livecd. Is it worth gambling 4-5TB?

If you are looking for an alternate but faster way of testing that (as secure as it can be without a backup):

I would consider picking one disk as my "test disk", of course one that is affected by the issue.

Then I would snapshot that drive (sorry, I don't have the command in mind, but I'm sure dd is a tool that can do that easily).

This way I back up a snapshot of the drive, lowering the data to back up to the drive capacity (750GB in your case); of course, don't put that backup on the array itself, you need to find space elsewhere.

Then I'd set my array (if possible) and my gentoo to work read-only on the array (trying my best to keep the array from writing info about a failure that might come next).

Alter the target disk with your modifications.

Now, if you boot the array/gentoo and the modification prevents the array from working: the array should be read-only, so all disks except the modified one should be OK. Then restoring that disk (with dd, or whatever tool you used to make the snapshot) should get your array back to the previous state (hmm, well, in theory).

This is tricky, this is risky, and this is what I've always done because I'm too lazy to back up.

As a footnote: you shouldn't be asking advice from unknown users on a forum. What risk do they take? Zero. I will still sleep very well if your array is dead; I might get banned as retaliation, woooooo, poor me.

But thinking about your side: you could lose 4-5TB of data, and worse, you've put a production server into a stopped state for some time.

So even if someone here tells you "don't worry, alter it, it's OK, it will work", you should still think about what happens if anything goes wrong, and many things can go wrong when tweaking stuff: who will face the result? Getting fired for this kind of stuff is possible.

Krinn gains a level, wisdom+1

----------

## drescherjm

 *Quote:*   

> You should back up before altering your array. It's not like it's a critical thing you must do; it's a cosmetic feature you wish worked with a livecd. Is it worth gambling 4-5TB?

 

I will most likely do that. The issue with backups is that the data is subdivided into 10 to 20 projects/grants (we do medical imaging research), and the backup procedure generally is that data gets backed up to tape manually, per project, just after it is added to the project. This way is efficient for restoring data, since we know where to look and do not have to worry about name collisions in tens of millions of files. Normally data gets added at 10 to 100GB, 0 to 4 times a month. The problem with this method is that people do not always tell me when they have created an entirely new project, and it's not easy to tell what is and is not backed up.

 *Quote:*   

> As a footnote: you shouldn't be asking advice from unknown users on a forum. What risk do they take? Zero. I will still sleep very well if your array is dead; I might get banned as retaliation, woooooo, poor me.

 

I was thinking about that when I posted. I am >90% sure that if I zero the superblocks on the whole-disk members all will be well, but I had better be more careful.

----------

## drescherjm

With the help of the kernel.org RAID wiki I found the location of the superblock:

https://raid.wiki.kernel.org/index.php/RAID_superblock_formats#The_version-0.90_Superblock_Format

```
datastore1 ~ # hexdump -s 750156242944 -C /dev/sda
aea8cbe000  aa aa aa aa aa aa aa aa  aa aa aa aa aa aa aa aa  |................|
*
aea8cc0000  fc 4e 2b a9 00 00 00 00  5a 00 00 00 00 00 00 00  |.N+.....Z.......|
aea8cc0010  00 00 00 00 5a 8e 32 89  04 ad 0f 47 05 00 00 00  |....Z.2....G....|
aea8cc0020  00 33 aa 2b 04 00 00 00  04 00 00 00 00 00 00 00  |.3.+............|
aea8cc0030  00 00 00 00 e4 61 06 11  3e d6 d9 4d 0e 4e 6c 8a  |.....a..>..M.Nl.|
aea8cc0040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
```
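One thing that can trip you up reading the dump: the superblock magic a92b4efc is stored little-endian on disk, which is why it shows up as fc 4e 2b a9 at offset aea8cc0000. A quick sanity check (byte values written in octal so printf stays portable):

```shell
# decode the first 4 superblock bytes (fc 4e 2b a9, octal 374 116 053 251)
# as one little-endian 32-bit word; on a little-endian machine this prints
# the md magic number a92b4efc
printf '\374\116\053\251' | hexdump -e '1/4 "%08x\n"'
```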

And it is in the ~2.6MB (5104 sectors) of free space after the last partition.

```
datastore1 ~ # fdisk -l /dev/sda

Disk /dev/sda: 750.2 GB, 750156374016 bytes
255 heads, 63 sectors/track, 91201 cylinders, total 1465149168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      514079      257008+  fd  Linux raid autodetect
/dev/sda2          514080     2024189      755055   82  Linux swap / Solaris
/dev/sda3         2024190    25479089    11727450   fd  Linux raid autodetect
/dev/sda4        25479090  1465144064   719832487+   5  Extended
/dev/sda5        25479153   625137344   299829096   fd  Linux raid autodetect
/dev/sda6       625137408  1465144064   420003328+  fd  Linux raid autodetect
```

I am 99% sure I can just corrupt this superblock and all will be well. I know mdadm has a --zero-superblock option, but I am concerned that there may be more than one superblock (even though the doc does not mention that). Or is there only one? I guess I can verify that using VirtualBox.
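For what it's worth, the wiki's rule for locating the v0.90 superblock (the last 64KiB-aligned 64KiB block of the device) reproduces the offset seen in the hexdump. A quick shell sketch, using the device size from the fdisk output above:

```shell
# v0.90 superblock offset = (device size rounded down to 64 KiB) - 64 KiB
DEVSIZE=750156374016                       # /dev/sda size in bytes, from fdisk -l
OFFSET=$(( (DEVSIZE & ~65535) - 65536 ))   # mask off the low 16 bits, back up 64 KiB
printf '%d = 0x%x\n' "$OFFSET" "$OFFSET"   # prints 750156251136 = 0xaea8cc0000
```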

Feeling confident that I knew what I was doing (or at least that writing zeros past the end of the last partition would be safe), I went ahead and zapped the superblock:

0xaea8cc0000 = 750156251136 bytes

```
datastore1 ~ # dd of=/dev/sda if=/dev/zero bs=1 count=4096 seek=750156251136
4096+0 records in
4096+0 records out
4096 bytes (4.1 kB) copied, 0.0112831 s, 363 kB/s
datastore1 ~ # mdadm -E /dev/sda
mdadm: No md superblock detected on /dev/sda.
```

Note: I did save the superblock before attempting this.
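Saving the superblock first is just the mirror image of the zeroing dd. Here is the save/zero/restore cycle as a sketch, exercised on a scratch file instead of the live disk (the file names and the toy offset are made up; on the real system you would substitute /dev/sda and 750156251136):

```shell
DEV=$(mktemp)       # scratch file standing in for /dev/sda
OFFSET=65536        # toy stand-in for the real offset 750156251136
dd if=/dev/urandom of="$DEV" bs=4096 count=32 2>/dev/null   # fake disk contents
cp "$DEV" "$DEV.orig"                                       # reference copy
# save the 4 KiB superblock region, zero it, then restore from the backup
dd if="$DEV" of="$DEV.sb" bs=1 skip=$OFFSET count=4096 2>/dev/null
dd if=/dev/zero of="$DEV" bs=1 seek=$OFFSET count=4096 conv=notrunc 2>/dev/null
dd if="$DEV.sb" of="$DEV" bs=1 seek=$OFFSET conv=notrunc 2>/dev/null
cmp "$DEV" "$DEV.orig" && echo "restore ok"
```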

And to verify that I did not mess anything up:

```
datastore1 ~ # mdadm -E /dev/sda6
/dev/sda6:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 19ded0b8:91d05f83:5b8944ff:62ae5209
  Creation Time : Mon Oct 22 14:15:28 2007
     Raid Level : raid6
  Used Dev Size : 420003264 (400.55 GiB 430.08 GB)
     Array Size : 1680013056 (1602.19 GiB 1720.33 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 3

    Update Time : Tue Sep 21 17:02:49 2010
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
       Checksum : bfd1dc5e - correct
         Events : 6
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        6        0      active sync   /dev/sda6

   0     0       8        6        0      active sync   /dev/sda6
   1     1       8       22        1      active sync   /dev/sdb6
   2     2       8       38        2      active sync   /dev/sdc6
   3     3       8       54        3      active sync   /dev/sdd6
   4     4       8       70        4      active sync   /dev/sde6
   5     5       8       86        5      active sync   /dev/sdf6

datastore1 ~ # echo check > /sys/block/md3/md/sync_action
datastore1 ~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdf1[5] sdd1[3] sde1[4] sdb1[1] sdc1[2] sda1[0]
      256896 blocks [6/6] [UUUUUU]

md2 : active raid6 sdf5[5] sdd5[3] sde5[4] sdb5[1] sdc5[2] sda5[0]
      1199283200 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

md3 : active raid6 sdf6[5] sdd6[3] sde6[4] sdb6[1] sdc6[2] sda6[0]
      1680013056 blocks level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  check =  0.0% (164776/420003264) finish=169.8min speed=41194K/sec

md1 : active raid6 sdf3[5] sdd3[3] sde3[4] sdb3[1] sdc3[2] sda3[0]
      46909440 blocks level 6, 256k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>
```

----------

