# [solved] recover existing raid1 after mdadm --create?

## njuk-njuk

during a recent upgrade to my gentoo system, i had an issue with the kernel not booting and needing to be rebuilt.  i booted to a livecd and was planning to chroot into my system but ran into a problem...

i have a basic raid1 setup: two identical disks, each with a /boot, swap, and / partition.

i had to mount my raid1 partitions and, since it had been many, many moons since the last time i had to chroot from a livecd, i used mdadm incorrectly.

instead of executing "mdadm --assemble" (for existing partitions) i mistakenly executed "mdadm --create" (for new partitions).  unfortunately, too, i pointed at both disks, so the "create" ran on both raid partitions; otherwise, i guess, i could have rebuilt the screwed one with the pristine one.

therefore, it appears i've hosed my system.  before i go through the arduous process of rebuilding the server from scratch, however, i wanted to check if there might be a means of recovering the partitions.

thanks for any help!

----------

## BradN

If you haven't written to the raid, your data should still be intact (at least as far as it was intact when these problems first started).

Try creating a new raid-1 with the exact same settings as you originally created them (drive order shouldn't be important on raid-1 but try to get that the same if at all convenient).  Make it using the original superblock type (I'm not sure if it matters but just in case).

If that doesn't work, shut down that raid, and create a new raid-1 targeted to two devices like before but only add one of the drives.  Use the word "missing" in place of the other drive.  See if you can mount that or get any useful data out.

If that still doesn't work, try it with the other drive.  Still no luck?  Well, one would have to do a little investigating to see how much data is still present and if it seems in the correct place or not.
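The steps above, sketched as commands --- every device name here (md0, sda1, sdb1) is an example, not necessarily the right one for this system:

```shell
# stop whatever half-assembled array may exist before each attempt
mdadm --stop /dev/md0

# attempt 1: recreate with both drives, same settings and order as originally
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# attempt 2: a degraded array from one drive only; the word "missing"
# holds the second slot open so nothing is written to the other drive
mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing
```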

----------

## prairiecity

As a fallback position, before you go any further, you can just make sure that the RAID array isn't running (mdadm -S /dev/md0) where md0 = the RAID array to stop.  Cat /proc/mdstat to verify that the array isn't active.  Then you can mount either half of the RAID1 array as a regular filesystem (read-only is recommended) as long as you explicitly specify the filesystem type (e.g., mount -t ext2 -o ro /dev/hda1 /mnt) where hda1 = the raw partition, and access the files that way.  It doesn't work with RAID geometries other than RAID1 for obvious reasons, but it works with RAID1.
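That fallback as a command sequence, using the example names from this post (md0, hda1):

```shell
mdadm -S /dev/md0                    # stop the array if it is running
cat /proc/mdstat                     # verify no array is active
mount -t ext2 -o ro /dev/hda1 /mnt   # mount one half read-only, type given explicitly
```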

Under RAID1, the only difference between the underlying filesystem and the RAIDed version is that the RAIDed version has a RAID superblock tacked on at the end.  You can see the sector offset to the RAID superblock with mdadm --examine /dev/hda1.

----------

## NeddySeagoon

njuk-njuk,

The default raid superblock type changed around May last year from 0.90 to 1.2.

As well as containing slightly different data, the newer superblock goes on a different part of the drive.  Further, the default chunk size has changed from 64k to 512k.

I don't know what else has changed.

Don't do any more with both drives, just in case you make things worse.

mdadm --create <chunk> <superblock> /dev/md0 /dev/sda1 missing is worth a try.

You may have destroyed a piece of the filesystem by using a different raid superblock version, so you may need to tell mount to use an alternate filesystem superblock.

As I don't know the layout of your data on the drive, I can't predict what the damage might be.

----------

## njuk-njuk

 *prairiecity wrote:*   

> ... you can mount either half of the RAID1 array as a regular filesystem (read-only is recommended) as long as you explicitly specify the filesystem type (e.g., mount -t ext2 -o ro /dev/hda1 /mnt) where hda1 = the raw partition, and access the files that way.  It doesn't work with RAID geometries other than RAID1 for obvious reasons, but it works with RAID1 ...

 

thanks so much to everyone who replied.  to start out with, i tried prairiecity's approach, since this jibed with what i knew and seemed the quickest one to try.

all the partitions on my system are ext3.

i was able to mount my /boot partition without issue; however, i ran across the following issue when mounting my root (/) partition...(the results in dmesg)...

```
EXT3-fs error (device sda6): ext3_check_descriptors: Block bitmap for group 0 not in group (block 2838187772)!

EXT3-fs (sda6): error: group descriptors corrupted
```

is this recoverable?

----------

## NeddySeagoon

njuk-njuk,

Please post the fdisk -l for one of the drives and explain which partitions are donated to raid sets and for what purpose.

Knowing the file systems in use will help later.

If you have been trying mdadm --create again, please post the exact command you used.

----------

## njuk-njuk

sorry.  here is the output of fdisk -l /dev/sda, which is one of two identical drives on my raid1.  /dev/sdb is the other drive, which has identical partition table settings.

```
Disk /dev/sda: 750.2 GB, 750156374016 bytes

255 heads, 63 sectors/track, 91201 cylinders, total 1465149168 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xc8dec8de

   Device Boot      Start         End      Blocks   Id  System

/dev/sda1               1      192779       96389+  fd  Linux raid autodetect

/dev/sda2   *      192780    78316874    39062047+   7  HPFS/NTFS/exFAT

/dev/sda3        78316875    97851914     9767520    c  W95 FAT32 (LBA)

/dev/sda4        97851915  1465144064   683646075    5  Extended

/dev/sda5        97851916   113483159     7815622   fd  Linux raid autodetect

/dev/sda6       113483161  1465144064   675830452   fd  Linux raid autodetect
```

/dev/sda1 is /boot

/dev/sda5 is swap

/dev/sda6 is /

as i said in my previous reply, /boot appears intact as it can be mounted fine, but / produces the errors reported when i try to mount it.

i have not yet tried the mdadm --create again, as i first wanted to check if i could simply mount the partitions; this seemed the least intrusive.

----------

## NeddySeagoon

njuk-njuk,

You should not try to mount the underlying partitions that are donated to a raid set.  If you must, use the -o ro option to mount, as you really don't want any writes to a partition that the kernel md system is not aware of.

It may be useful to post the output of 

```
mdadm -E /dev/sda[156]
```

This will tell us what mdadm sees now.  If /dev/sdb is still connected to the system, repeat the command for sdb too.

That you can mount /dev/sda1 points to it having a version 0.9 raid superblock.  Were it version 1.2, the superblock would be where mount expects the filesystem to start. mount wouldn't like that at all.

If /dev/sda6 has a version 1.2 raid superblock, you must assemble the raid, then mount the raid device, even if the raid device only has a single partition in it. mount then looks at the right place to find the filesystem.

```
mdadm -A /dev/mdX /dev/sda6

mount -o ro /dev/mdX /mnt/gentoo
```

might work.  You need to fill in the X.

You may get errors about mdadm assembling but not running mdX.  If so, you need to tell mdadm to run the degraded raid.

----------

## njuk-njuk

thanks, NeddySeagoon, for your clear explanations.

first, i am fairly certain that i instantiated this raid configuration several years ago.  this would suggest version 0.90 and, as you say, confirmed by the fact that i was able to mount /boot.  (btw: i did mount read-only to avoid inadvertent writes to the partition.)  however, as you'll see in the output below, it says v1.2, which is probably due to the mistaken "mdadm --create" which caused the problem i'm currently in.

here is the output of "mdadm --examine", first for /dev/sda[156], followed by /dev/sdb[156] ...

```
/dev/sda1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : 2fc35efa:9b61837e:1677a10f:0a734169

           Name : Gentoo-11:1  (local to host Gentoo-11)

  Creation Time : Tue Jan 22 11:31:04 2013

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 192755 (94.13 MiB 98.69 MB)

     Array Size : 192754 (94.13 MiB 98.69 MB)

  Used Dev Size : 192754 (94.13 MiB 98.69 MB)

    Data Offset : 24 sectors

   Super Offset : 8 sectors

          State : clean

    Device UUID : d3aa5dd1:04a59c66:de5c70d4:d7c86e3a

    Update Time : Tue Jan 22 11:31:07 2013

       Checksum : 28dbb62d - correct

         Events : 17

   Device Role : Active device 0

   Array State : AA ('A' == active, '.' == missing)

/dev/sda5:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : 3291288c:70348fee:bd1bc3cf:139e4f5f

           Name : Gentoo-11:5  (local to host Gentoo-11)

  Creation Time : Tue Jan 22 11:31:17 2013

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 15629196 (7.45 GiB 8.00 GB)

     Array Size : 15629172 (7.45 GiB 8.00 GB)

  Used Dev Size : 15629172 (7.45 GiB 8.00 GB)

    Data Offset : 2048 sectors

   Super Offset : 8 sectors

          State : clean

    Device UUID : dac1dd92:3d6b52dd:a25ebced:ce108093

    Update Time : Tue Jan 22 11:32:46 2013

       Checksum : 8d4f6b24 - correct

         Events : 17

   Device Role : Active device 0

   Array State : AA ('A' == active, '.' == missing)

/dev/sda6:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : 68b773d0:e319c231:316a70b0:88b94362

           Name : Gentoo-11:6  (local to host Gentoo-11)

  Creation Time : Tue Jan 22 11:31:30 2013

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 1351658856 (644.52 GiB 692.05 GB)

     Array Size : 1351658584 (644.52 GiB 692.05 GB)

  Used Dev Size : 1351658584 (644.52 GiB 692.05 GB)

    Data Offset : 2048 sectors

   Super Offset : 8 sectors

          State : clean

    Device UUID : 1aa3c455:38f19b3f:6e146af7:e4fd0196

    Update Time : Tue Jan 22 11:44:43 2013

       Checksum : c9115061 - correct

         Events : 5

   Device Role : Active device 0

   Array State : A. ('A' == active, '.' == missing)
```

```
/dev/sdb1:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : 2fc35efa:9b61837e:1677a10f:0a734169

           Name : Gentoo-11:1  (local to host Gentoo-11)

  Creation Time : Tue Jan 22 11:31:04 2013

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 192755 (94.13 MiB 98.69 MB)

     Array Size : 192754 (94.13 MiB 98.69 MB)

  Used Dev Size : 192754 (94.13 MiB 98.69 MB)

    Data Offset : 24 sectors

   Super Offset : 8 sectors

          State : clean

    Device UUID : cf9b3a0b:0466158e:be792d52:d5d7494c

    Update Time : Tue Jan 22 11:31:07 2013

       Checksum : 19c99407 - correct

         Events : 17

   Device Role : Active device 1

   Array State : AA ('A' == active, '.' == missing)

/dev/sdb5:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : 3291288c:70348fee:bd1bc3cf:139e4f5f

           Name : Gentoo-11:5  (local to host Gentoo-11)

  Creation Time : Tue Jan 22 11:31:17 2013

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 15629196 (7.45 GiB 8.00 GB)

     Array Size : 15629172 (7.45 GiB 8.00 GB)

  Used Dev Size : 15629172 (7.45 GiB 8.00 GB)

    Data Offset : 2048 sectors

   Super Offset : 8 sectors

          State : clean

    Device UUID : 4ff67c50:c0cdb014:c997ad7d:fe177b38

    Update Time : Tue Jan 22 11:32:46 2013

       Checksum : b7394272 - correct

         Events : 17

   Device Role : Active device 1

   Array State : AA ('A' == active, '.' == missing)

/dev/sdb6:

          Magic : a92b4efc

        Version : 1.2

    Feature Map : 0x0

     Array UUID : 68b773d0:e319c231:316a70b0:88b94362

           Name : Gentoo-11:6  (local to host Gentoo-11)

  Creation Time : Tue Jan 22 11:31:30 2013

     Raid Level : raid1

   Raid Devices : 2

 Avail Dev Size : 1351658856 (644.52 GiB 692.05 GB)

     Array Size : 1351658584 (644.52 GiB 692.05 GB)

  Used Dev Size : 1351658584 (644.52 GiB 692.05 GB)

    Data Offset : 2048 sectors

   Super Offset : 8 sectors

          State : active

    Device UUID : 208fc758:a40139b2:8358f965:d817d7be

    Update Time : Tue Jan 22 11:40:50 2013

       Checksum : db66ae71 - correct

         Events : 3

   Device Role : Active device 1

   Array State : AA ('A' == active, '.' == missing)
```

----------

## NeddySeagoon

njuk-njuk,

The raid entries for sd[ab][15] agree.  Look at the 

```
    Update Time : Tue Jan 22 11:32:46 2013

       Checksum : b7394272 - correct

         Events : 17 
```

entries.

for the third raid 

```
    Update Time : Tue Jan 22 11:40:50 2013

       Checksum : db66ae71 - correct

         Events : 3

    Update Time : Tue Jan 22 11:44:43 2013

       Checksum : c9115061 - correct

         Events : 5 
```

The update times and event counts differ.  If you try to assemble this raid, the out-of-date drive will be overwritten with the later information.

/dev/sda6 is the newer of the two.

How did you assemble the raid before you used --create ?

This may help us determine the correct superblock.  Did the kernel do it, because you have 

```
[*]     Autodetect RAID arrays during kernel boot
```

 and the partitions marked as type

```
fd  Linux raid autodetect
```

in the partition table or did you use an initrd to assemble the raid before you mounted root ?

It matters because kernel raid auto assembly only works with version 0.90 superblocks. 

This wiki page is worth a read.  Other than needing to be updated with the default raid superblock, it contains a lot of useful information.  In particular, it shows that the ver 0.9 superblock is at the end of the disk and the ver 1.2 is 4k from the beginning of the disk.

This probably means that your ver 0.9 superblocks are still there as they are outside of the filesystem you made on the array but that  the ver 1.2 superblock you added trampled 260 bytes starting 4k from the start of the device.

This wiki shows that your unwanted ver 1.2 raid superblock has been written in the middle of the Group Descriptors of the first block group.

This is a bad thing but fsck may be able to fix it.  However, DO NOT just run fsck, at least, not yet anyway.  As your new raid superblock is inside your filesystem, if fsck works, your raid superblock will be gone.

If your ver 0.9 superblock was present when you ran mdadm --create, mdadm should have warned you.  Did it?
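A read-only peek at that 4k offset is harmless and shows what is sitting there now (sda6 as in the earlier posts; dd here only reads from the device):

```shell
# dump the 4096-byte region starting 4k into the partition; a version 1.2
# raid superblock stores its magic a92b4efc little-endian, so the bytes
# fc 4e 2b a9 should appear at the start of the dump
dd if=/dev/sda6 bs=4096 skip=1 count=1 2>/dev/null | hexdump -C | head -n 4
```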

----------

## njuk-njuk

sorry for the belated reply, NeddySeagoon.  for some reason, i didn't catch that you had posted more information.  thanks, again, for your help.

 *Quote:*   

> The raid entries for sd[ab][15] agree ... [but not] the third raid.

 

i'm not entirely sure why the update times of the 3rd raid array --- sd[ab]6, my root partition --- are not in full agreement, especially why sda6 is slightly newer than sdb6.  what i'm guessing, though, is that i may have attempted to work with one of the arrays without the other.

 *Quote:*   

> How did you assemble the raid before you used --create ?

 

the initial configuration of the raids was done by following the Gentoo Wiki instructions: Software RAID Install and RAID/Software.  i followed the fairly standard setup.

upon skimming these wiki pages now, it appears that they have been updated to cover the 1.2 vs. 0.9 metadata versions --- the initial configuration was several years ago, though, and i don't recall this versioning being so prominently mentioned.

 *Quote:*   

> How did you assemble the raid before you used --create ? ... It matters because kernel raid auto assembly only works with version 0.90 superblocks. 

 

i am fairly certain i am not using 'initrd' to assemble the raids --- i don't use genkernel.

 *Quote:*   

> If your ver 0.9 superblock was present when your ran mdadm --create, mdadm should have warned you. Did it?

 

i did receive some notice when i ran the "mdadm --create", though don't recall exactly what it said.  all i can recollect was that the message spanned several lines, and i am pretty sure i had to say "yes" to proceed.  (i recognize that i should have been more careful but was somewhat distracted at the time.)

----------

## NeddySeagoon

njuk-njuk,

OK, let's assume you followed the defaults when you made the raid originally, so you have a version 0.9 superblock.

Hmm ... with version 0.9 raid superblocks being at the end of the volume, the filesystem starts in the normal unraided place, so even without recreating the version 0.9 raid superblock, mount may work with the use of an alternate filesystem superblock.  Try that first, before you write anything anywhere.  Do not forget the use of -o ro.

Your filesystem probably uses 4k blocks, but trial and error with alternate filesystem superblock numbers is safe.

There are two options.

a) you use dd to fetch the last 256Kb of /dev/sda6 to a file and email me the file.  I'll look at it for the remains of your original superblock.

b) as you accepted the default originally we assume it was version 0.9 and recreate the raid on that basis, but only use one drive meanwhile.  

```
mdadm --create /dev/mdX --metadata=0.90  --raid-devices=2 --level=1 /dev/sda6 missing
```

Will recreate your raid1 set in degraded mode with /dev/sda6 in the first slot and the second slot missing.

As you are using raid1, the changed --chunk size does not apply to you.

You need to choose X ... you may also need to use --force, as mdadm will spot the raid ver 1.2 superblock.

Provided the raid set runs, check

```
cat /proc/mdstat
```

then you can try mounting it read only.  The -o ro is important.  We know it's broken.

If the normal mount fails, try feeding mount an alternate superblock.

Read up on the sb= option in man mount, under ext2 options.

Do not run fsck.

If this fails, there is one more desperate measure to try - it would be good if you can image one of your mirrors first, if you have the space.

----------

## njuk-njuk

as always, thanks NeddySeagoon  for putting in the effort to help me on this.

i decided to start out with option (b).

to test things, i tried this on my /boot partition (/dev/sda1), which we already knew was mountable.

```
mdadm --create /dev/md1 --metadata=0.90 --raid-devices=2 --level=1 /dev/sda1 missing
```

the command informed me that (1) there was already a filesystem on the partition and (2) it appeared to be part of a raid.  i was required to enter "yes" to complete the operation.  "cat /proc/mdstat" confirmed the creation of the degraded array.  furthermore, i was able to mount the partition read-only --- i'm on a flashkey version of a live-install, so created /mnt/boot as a mountpoint.

```
mount -o ro /dev/md1 /mnt/boot
```

i then did the same procedure with /dev/sda6, my root (/) partition.  [had to say "yes" to prompt warning me of existing filesystem and raid setup.]

```
mdadm --create /dev/md6 --metadata=0.90 --raid-devices=2 --level=1 /dev/sda6 missing
```

verified creation of degraded raid.

```
cat /proc/mdstat
```

however, when i tried to mount the new device...

```
mount -o ro /dev/md6 /mnt/gentoo
```

...it failed (as expected) with the following error message (which was the same as when i earlier tried to mount /dev/sda6 directly).

```
mount: wrong fs type, bad option, bad superblock on /dev/md6,

       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try

       dmesg | tail  or so
```

so, this led me to try mounting with an alternate superblock, as suggested.

to determine alternate superblocks, i ran mkfs.ext3 with the -n option, so that a dry run was performed rather than actual filesystem creation.  for those following along: make sure to include the -n switch, otherwise your filesystem will be erased.  note, also, from what i understand, this procedure will only be useful if you use the same command that was used when initially creating the filesystem.  in my case, i didn't recall doing anything fancy when setting up the raid.

```
mkfs.ext3 -n /dev/md6
```

sorry that i didn't capture the output of this dry run.  however, the two important parts of the output are the "Block size" and "Superblock backups stored on blocks".  in my case, block size was 4096 (4k), which meant i needed to multiply the reported alternate superblocks by 4 to get a value appropriate for mount's "sb=" value (which assumes 1k units).
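the conversion is just arithmetic on the two numbers reported by the dry run (block size and backup superblock location):

```shell
# mount's sb= option counts in 1k units, while mkfs.ext3 reports backup
# superblocks in filesystem blocks (4096 bytes here), so scale by 4096/1024
blocksize=4096
backup_block=32768
echo $((backup_block * blocksize / 1024))   # prints 131072
```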

my first alternate superblock was 32768, which is 131072 when multiplied by 4.  i then used this for my "sb" value in mount.

```
mount -o ro,sb=131072 /dev/md6 /mnt/gentoo
```

voila! i was able to mount without issue and can now see the underlying contents of the filesystem.

so...what should i do next with this setup?  currently, only /dev/sda has been reverted back to 0.90 metadata, so /dev/sdb needs similar modifications.  what other steps must i take?

thanks!

----------

## NeddySeagoon

njuk-njuk,

The next step is to run fsck on the degraded /dev/md6.  It should fix your trashed first superblock by copying it from a backup.

Hopefully that's all it will do.  Make sure fsck prompts you to agree changes.  Say Y when it wants to fix the superblock.

With your filesystem superblock fixed, try mount again, without the alternate superblock, still read only and have a good look around to make sure your data is intact.
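A sketch of that fsck step --- the -b number is the backup superblock found earlier with mkfs.ext3 -n (e2fsck takes it in filesystem blocks, not mount's 1k units), and is only needed if a plain fsck.ext3 /dev/md6 still trips over the primary superblock:

```shell
# interactive check of the degraded array; -b points at a backup
# superblock, -B gives the filesystem block size
fsck.ext3 -b 32768 -B 4096 /dev/md6
```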

----------

## njuk-njuk

 *NeddySeagoon wrote:*   

> ... With your filesystem superblock fixed, try mount again, without the alternate superblock, still read only and have a good look around to make sure your data is intact.

 

in the process of doing this now.

assuming everything goes well, then do i simply add the other device to the currently degraded raid and assume the new, clean device will be used to update the older one?

----------

## NeddySeagoon

njuk-njuk,

That's exactly right.

mdadm will sync the added partition to the existing partition, in place of the missing element.

Watch /proc/mdstat before and during the process to see what's going on.
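One common way to do that step, sketched for the root array of this thread (repeat with the matching partitions for the other arrays):

```shell
mdadm /dev/md6 --add /dev/sdb6   # hot-add the out-of-date partition
cat /proc/mdstat                 # the resync should appear here
```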

----------

## njuk-njuk

ok.  happy to report that it seems like everything is working fine now.  i got stuck for half of a day with a kernel panic on boot-up after i got the raids seemingly back in order.  i first thought it might have to do somehow with the initial raid problems, but i ended up tracking it to a disconnect on my part.

i couldn't remember what /dev/mdX values i used on the system.  when i fixed the raid setup, i was using the partition number to correspond with the md number.  for example, my boot partition is /dev/sd[ab]1, so i assembled the corresponding raid as /dev/md1.  turns out, though, that on my system i was using /dev/md0 (boot), /dev/md1 (swap), and /dev/md2 (root).  my boot configuration in grub pointed root to /dev/md2 but it was actually now /dev/md6.  i had trouble catching this because the error output (before the kernel panic) had already scrolled off the screen.

anyway, with lots of mucking about, i was able to figure out the problem.  i could have reassembled the arrays to match what i had before, but decided to leave them as-is --- /dev/md1 (boot), /dev/md5 (swap), and /dev/md6 (root) --- and update grub and my system configurations accordingly.  this meant i had to fix /etc/fstab, /etc/mdadm.conf, and a weekly cron script i have to do data scrubbing.

now, everything seems to be fine.  i finally got back to the business of fixing my misconfigured kernel --- the problem that existed prior to my raid debacle --- and am in the finishing touches of my system update.

thanks so much, NeddySeagoon, for walking me through things!  while it took me a bit of time to get through everything, it saved me from reinstalling the system.

for those following along, this is what i did for the final step in getting the second partition assembled into the raid --- i wasn't sure the metadata flag was necessary, but figured it couldn't hurt.  as you may recall from a previous post, i had already fixed /dev/sda[156] by assembling them in their respective (degraded) arrays.  the following was how i brought in /dev/sdb[156]; the older, now out-of-date partitions were sync'd from the data on the corresponding /dev/sda partitions.

```
mdadm --assemble --metadata=0.90 /dev/md1 /dev/sd[ab]1

mdadm --assemble --metadata=0.90 /dev/md5 /dev/sd[ab]5

mdadm --assemble --metadata=0.90 /dev/md6 /dev/sd[ab]6
```

----------

