# [SOLVED] RAID5 disappeared on reboot after initial creation

## sirlark

I had two 1.5TB drives in a RAID1 configuration. I bought a third identical drive and tried to migrate the RAID1 to RAID5. Everything seemed to go as planned, but when I rebooted none of my RAID partitions came back up. Here's the process I followed, starting with a RAID1 (md0) composed of /dev/sdb1 and /dev/sdc1:

1. Installed the new drive, which was detected as /dev/sdc, bumping the original sdc to sdd
2. Partitioned the whole of the new sdc as a single Linux RAID partition
3. Set /dev/sdd1 (the original sdc1) to failed in md0
4. Created a new degraded RAID5 (md1) from /dev/sdc1 and /dev/sdd1, with the third member missing
5. Copied everything from md0 to md1
6. Stopped md0
7. Added sdb1 to md1

At this point everything seemed to be working fine. Then I rebooted. Note that I forgot to update /etc/mdadm.conf, so my old array (md0) was still in the config file.

Upon reboot the only md device was md127, and it wasn't started. mdadm reported it as a RAID1 composed of sdb and sdd. I tried to assemble/create/start the RAID5 array but was forced to stop md127 first. Then I recreated the RAID5 array:

```
mdadm --verbose -C -n 3 -l 5 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

It still would not mount (bad superblock), and /proc/mdstat noted it was recovering (even though it had shut down cleanly). I immediately stopped the array, but now I have no idea what to do next.

Any and all help appreciated.

----------

## DawgG

I think this is caused by newer versions of mdadm; I've had this problem myself. Try recreating the array with the "old-style" metadata option; something like

```
mdadm (...) -e0.90 (...)
```

After that I got the "old" device names again. Just be sure to back up your data before experimenting with any of that.

GOOD LUCK!

----------

## NeddySeagoon

sirlark,

Run 

```
mdadm -E /dev/...
```

on each of the partitions donated to your raid set and post the output.

----------

## sirlark

Here it is

```
root@bragi ~ # mdadm -E /dev/sd[bcd]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 51c1a2d8:f1bf241e:53c73de4:20967e3f
           Name : bragi:0  (local to host bragi)
  Creation Time : Mon Oct 24 14:16:09 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
     Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
  Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : bdd1b6d5:ebf8d3b9:b1a29309:21b4d7cb
    Update Time : Wed Oct 26 18:34:42 2011
       Checksum : 8a1b5180 - correct
         Events : 3
         Layout : left-symmetric
     Chunk Size : 512K
   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing)
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 51c1a2d8:f1bf241e:53c73de4:20967e3f
           Name : bragi:0  (local to host bragi)
  Creation Time : Mon Oct 24 14:16:09 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 2930275057 (1397.26 GiB 1500.30 GB)
     Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
  Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : eb43c15a:2847d68e:e5b5e1f6:27e5ae10
    Update Time : Wed Oct 26 18:34:42 2011
       Checksum : 164d6a15 - correct
         Events : 3
         Layout : left-symmetric
     Chunk Size : 512K
   Device Role : Active device 1
   Array State : AAA ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x2
     Array UUID : 51c1a2d8:f1bf241e:53c73de4:20967e3f
           Name : bragi:0  (local to host bragi)
  Creation Time : Mon Oct 24 14:16:09 2011
     Raid Level : raid5
   Raid Devices : 3
 Avail Dev Size : 2930269954 (1397.26 GiB 1500.30 GB)
     Array Size : 5860538368 (2794.52 GiB 3000.60 GB)
  Used Dev Size : 2930269184 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
Recovery Offset : 13618816 sectors
          State : clean
    Device UUID : e4003eff:1e15c213:635a5df6:0172a71e
    Update Time : Wed Oct 26 18:34:42 2011
       Checksum : 4df9e0f1 - correct
         Events : 3
         Layout : left-symmetric
     Chunk Size : 512K
   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing)
```

I can assemble the array just fine using `mdadm -A /dev/md0 /dev/sd[bcd]1`, but the file system still won't mount, and `cat /proc/mdstat` lists the array as recovering. I don't know why it would need to recover; there were never any failed drives in it. Does 'recovering' mean it's overwriting the old file system with something else? And surely even if it is recovering, the file system should be mountable, assuming it is still there.

I suppose that in a nutshell, what I'm asking is: will letting the recovery complete nuke my data, or restore it?

----------

## NeddySeagoon

sirlark,

```
        Version : 1.2

    Feature Map : 0x0

     Array UUID : 51c1a2d8:f1bf241e:53c73de4:20967e3f 
```

That confirms that you have raid superblock version 1.2, so kernel autoassemble will not work for you.

On the good news side, the Array UUID is identical for all three drives, so mdadm thinks they are all part of the same raid set.

Is the word recovering or rebuilding?

I've never seen recovering. Rebuilding is what happens when you add a new drive to a raid set that is running in degraded mode; the kernel generates the redundant data to match the other drives.

It is worrying that you can't mount your filesystem while the raid set rebuilds. I seem to remember that you created the raid set twice: once in degraded mode and again when you had problems later. RAID5 sets always rebuild the first time they are brought up complete, as the redundant data needs to match, even for unused space. The kernel has no way of knowing what is used and unused space at the block level, so it's all rebuilt. That much is normal.
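The "redundant data" being generated is just the bytewise XOR of the data chunks in each stripe, which is why the whole device has to be swept, used or not. A tiny illustrative sketch (not mdadm's actual code):

```python
from functools import reduce

def parity_chunk(chunks):
    """RAID5 parity: bytewise XOR of the other chunks in the stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

# Two data chunks in one stripe of a 3-disk RAID5
d0 = bytes([0x12, 0x34, 0x56, 0x78])
d1 = bytes([0xFF, 0x00, 0xFF, 0x00])
p = parity_chunk([d0, d1])  # what the rebuild writes to the third disk

# XOR is its own inverse: any one lost chunk is recoverable from the others
assert parity_chunk([p, d1]) == d0
assert parity_chunk([p, d0]) == d1
```

Until that parity matches on every stripe, a chunk reconstructed from the other two disks would be garbage, which is why the rebuild can't be skipped.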

What filesystem did you have on the raid set?  

It may be worth trying to mount it using an alternate filesystem superblock. Do not run fsck. That's a last resort, as it often makes things worse, not better.

If you have a backup, or can make a backup, it's not quite so risky, but there are other things to try first.

What mount command are you using and what is the error?

What does `file -s /dev/md0` say about your filesystem?

----------

## sirlark

NeddySeagoon,

This will probably cover most of your questions:

```
root@bragi ~ # cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
unused devices: <none>
root@bragi ~ # mdadm -A --no-degraded /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/md0 has been started with 2 drives (out of 3) and 1 rebuilding.
root@bragi ~ # cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md0 : active raid5 sdb1[0] sdd1[3] sdc1[1]
      2930269184 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.4% (6837900/1465134592) finish=852.9min speed=28492K/sec

unused devices: <none>
root@bragi ~ # mdadm -S /dev/md0
mdadm: stopped /dev/md0
```

*NeddySeagoon wrote:*

> It is worrying that you can't mount your filesystem while the raid set rebuilds. I seem to remember that you created the raid set twice: once in degraded mode and again when you had problems later. RAID5 sets always rebuild the first time they are brought up complete, as the redundant data needs to match, even for unused space. The kernel has no way of knowing what is used and unused space at the block level, so it's all rebuilt. That much is normal.

Yup, it is worrying... I did create the raid set twice. The first time I created it degraded and copied over the data from the remaining drive of the original RAID1. Then I stopped the RAID1 and added the freed drive to the RAID5. I waited for the syncing to complete, then rebooted. That was when the problems started. I then created the array again (although I gather I should have assembled it). I think this is the real problem.

The filesystem on the raid5 was ext4

*NeddySeagoon wrote:*

> What mount command are you using and what is the error?
>
> What does `file -s /dev/md0` say about your filesystem?

```
root@bragi ~ # mdadm -A --no-degraded /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1; mount /dev/md0 /mnt/temp; file -s /dev/md0; mdadm -S /dev/md0
mdadm: /dev/md0 has been started with 2 drives (out of 3) and 1 rebuilding.
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
/dev/md0: Linux rev 536871098.3822 ext4 filesystem data, UUID=89851068-e036-4db5-bb47-195842344d49 (compressed) (extents) (huge files)
mdadm: stopped /dev/md0
root@bragi ~ # dmesg |tail -n 11
[304236.265416]  md0: unknown partition table
[304236.378860] EXT4-fs (md0): Couldn't mount because of unsupported optional features (1fd00001)
[304236.776274] md: md_do_sync() got signal ... exiting
[304236.789309] md0: detected capacity change from 3000595644416 to 0
[304236.789318] md: md0 stopped.
[304236.789324] md: unbind<sdb1>
[304236.793276] md: export_rdev(sdb1)
[304236.793305] md: unbind<sdd1>
[304236.797270] md: export_rdev(sdd1)
[304236.797290] md: unbind<sdc1>
[304236.806270] md: export_rdev(sdc1)
```

The first line shouldn't be a problem according to http://osdir.com/ml/linux-raid/2009-11/msg00344.html

That second line from dmesg seems encouraging, though. It's not a RAID issue that's preventing the mount; it's some ext4 feature or other that I enabled. Now to find out which one... Of course, I'm also banging my head against the wall, because if I'd read dmesg when it first failed to mount I could have saved myself a lot of grief (I think).
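Out of curiosity, that feature mask (1fd00001) can be decoded by hand. A small sketch, using incompatible-feature bit values taken from the ext4 on-disk format documentation (the flag table below is abridged):

```python
# Known ext4 incompatible-feature flag bits (abridged, from the ext4
# on-disk format documentation)
INCOMPAT_FLAGS = {
    0x0001: "COMPRESSION", 0x0002: "FILETYPE", 0x0004: "RECOVER",
    0x0008: "JOURNAL_DEV", 0x0010: "META_BG", 0x0040: "EXTENTS",
    0x0080: "64BIT", 0x0100: "MMP", 0x0200: "FLEX_BG",
    0x0400: "EA_INODE", 0x1000: "DIRDATA",
}

def decode_incompat(mask):
    """Split a feature mask into recognised flag names and leftover bits."""
    known = [name for bit, name in INCOMPAT_FLAGS.items() if mask & bit]
    unknown = mask & ~sum(INCOMPAT_FLAGS)
    return known, unknown

known, unknown = decode_incompat(0x1FD00001)
# Only COMPRESSION plus a pile of undefined high bits are set
```

The result is COMPRESSION (never supported by the mainline ext4 driver) plus undefined high bits, which points at a scrambled superblock rather than at a real feature missing from the kernel.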

At this point it seems to me that it would be safe to let the raid recovery proceed, right?

Thanks for all the help btw

----------

## sirlark

I'm still having no luck trying to mount the file system; specifically, after much googling I can't figure out which missing feature prevents mounting the ext4 partition. Also, seeing as I could mount it initially (before the reboot) and haven't changed the kernel, I don't think the error message is accurate. Here's what I think might have happened:

The first time I created the array (in degraded state) I used the following command:

```
mdadm --create -l 5 -n 3 /dev/md1 /dev/sdc1 /dev/sdd1 missing
```

After the reboot and the failure to autodetect (which I now understand), I recreated the array using

```
mdadm --verbose -C -n 3 -l 5 /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

I assume that if I had assembled the array, mdadm would have been 'smart enough' to get the order of the disks right, but since I created it instead, I essentially forced a change in layout. My next planned step is to recreate the array using the initial ordering

```
mdadm --create -l 5 -n 3 /dev/md1 /dev/sdc1 /dev/sdd1 /dev/sdb1
```

and attempt to mount my file system using an alternate superblock. My main question is: can I start the array and force it not to recover/rebuild/resync, i.e. something like read-only mode? I don't mean the file system; I mean stopping the kernel from writing to the underlying disks if this is the wrong thing to do.
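For what it's worth, the device-order worry can be made concrete with a small sketch of the left-symmetric layout mdadm defaults to (an illustration of the parity rotation, not mdadm's actual code). With the disks listed in a different order at creation time, the same logical chunk lands on a different physical device, so the filesystem reads back scrambled:

```python
def stripe_layout(num_disks, stripe):
    """Left-symmetric RAID5: parity rotates from the last disk downward,
    and data chunks start on the disk after parity, wrapping around."""
    parity = (num_disks - 1) - (stripe % num_disks)
    data = [(parity + 1 + i) % num_disks for i in range(num_disks - 1)]
    return parity, data

def chunk_owner(devices, stripe, chunk_in_stripe):
    """Which named device holds a given data chunk of a stripe."""
    _, data = stripe_layout(len(devices), stripe)
    return devices[data[chunk_in_stripe]]

original  = ["sdc1", "sdd1", "sdb1"]   # order used at first creation
recreated = ["sdb1", "sdc1", "sdd1"]   # alphabetical order used later

# The very first data chunk already lives on a different device
assert chunk_owner(original, 0, 0) != chunk_owner(recreated, 0, 0)
```

This is why `--assemble` (which reads the device roles from the superblocks) is safe where `--create` (which assigns roles in the order given on the command line) is not.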

----------

## sirlark

It turns out that the missing-feature error really was because the RAID was screwed up, just not as badly as I thought. Clearly the bits of the superblock that ext4 was reading were way out of whack, probably because they were parity bits or something.

Anyway, it turns out that when I created the array the second time, after the reboot, I listed the devices in alphabetical order, not the order I originally used. I assumed that mdadm would use the superblocks of the devices to figure out the layout for itself, but it didn't, possibly because I used create instead of assemble.

I created the array again, this time using only the first two devices, in the original order, and voilà! there was my file system. I nuked the third device from orbit (dd if=/dev/zero of=/dev/sdc), repartitioned it, added it back, set up my mdadm.conf correctly, and everything is now hunky dory again. Thanks for the help everyone.
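For anyone finding this thread later, a minimal /etc/mdadm.conf entry for the rebuilt array might look like the fragment below. The UUID shown is the Array UUID from the `mdadm -E` output earlier in the thread, purely as an example; the UUID changes every time an array is created, so generate your own line with `mdadm --detail --scan` after the final create:

```
# Hypothetical /etc/mdadm.conf fragment -- substitute your own Array UUID
DEVICE /dev/sd[bcd]1
ARRAY /dev/md0 metadata=1.2 UUID=51c1a2d8:f1bf241e:53c73de4:20967e3f
```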

----------

