# Failed RAID1 device, can't login

## njcwotx

I posted earlier about mounting a RAID1 with the live CD.  I managed to get the mirror to start up with

```
mdadm --assemble /dev/hda3 /dev/hdb3
```

However, it appears that the hda3 partition has failed and hdb3 is listed as good; the only problem is we can't seem to get it to mount, and we get lots of superblock errors on hda3.  I don't have much experience with software mirroring and could use some help.

If I use the live CD, I can actually see files when I mount the drive, but I can't chroot to it as it says the drive is degraded.

----------

## NeddySeagoon

njcwotx,

When you assemble the raid set, you should get a /dev/mdX, which is what you mount.

You may mount one part of a raid1 set as its underlying partition as long as you make it read only.

A read/write mount will get the two parts out of sync, so next time you try to assemble the raid set, it will start in degraded mode.

I'm not sure which part wins, the old original raid part or the altered part.

You should never operate on the underlying partitions of a kernel raid set, even if you can.
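To make that concrete, inspecting one half of the mirror from a rescue environment without getting the halves out of sync might look like this (a sketch only; /mnt/rescue is a hypothetical mountpoint):

```
# Mount one RAID1 component read-only so its superblock is not modified
mount -o ro /dev/hdb3 /mnt/rescue
# ... look around ...
umount /mnt/rescue
```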

----------

## njcwotx

Ok, here is what I have so far.  While some of it is straightforward, it's hard for me to tell the big picture here.

```
mdadm -A --update=resync --run /dev/hda3 /dev/hdb3
```

```
livecd / # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid1 hdb3[1]
      77587200 blocks [2/1] [_U]
unused devices: <none>
```

```
livecd / # mdadm --detail --scan /dev/md127
/dev/md127:
        Version : 00.90.03
  Creation Time : Fri May 28 11:56:18 2004
     Raid Level : raid1
     Array Size : 77587200 (73.99 GiB 79.45 GB)
  Used Dev Size : 77587200 (73.99 GiB 79.45 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 127
    Persistence : Superblock is persistent
    Update Time : Fri Aug 22 18:40:05 2008
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : 24df6bde:40d81b6a:b55d906c:1196c4e1
         Events : 0.192
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       3       67        1      active sync   /dev/hdb3
```

I tried to mount /dev/md127 and got this

```
md: md127 stopped.
md: bind<hdb3>
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 mirrors
md: md127 stopped.
md: unbind<hdb3>
md: export_rdev(hdb3)
md: md127 stopped.
md: bind<hdb3>
md: md127: raid array is not clean -- starting background reconstruction
raid1: raid set md127 active with 1 out of 2 mirrors
XFS: bad magic number
XFS: SB validate failed
```

----------

## NeddySeagoon

njcwotx,

Let it complete the reconstruction. When that's done, it should be running on both drives again.

While the two halves are not synchronised, the raid is in degraded mode. You can still use it that way and the kernel will sort it out.

It may take several hours to rebuild as one drive has to be copied to the other and the bandwidth used for this process is deliberately limited, or you would not be able to use the volume while it was running.
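That throttle is visible (and tunable) through the md sysctls; on a reasonably modern kernel it looks something like this (values are KiB/s per device, and the "typically" figures are just the usual defaults):

```
# Rebuild bandwidth limits enforced by the md driver
cat /proc/sys/dev/raid/speed_limit_min   # typically 1000
cat /proc/sys/dev/raid/speed_limit_max   # typically 200000
```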

----------

## njcwotx

OK cool, is there any way I can check its rebuild progress?

----------

## NeddySeagoon

njcwotx,

I think it appears in /proc/mdstat

----------

## njcwotx

I posted the output of mdstat and it shows no progress... take a look at it and tell me what you think.

mdadm --detail shows it's "removed"

From the output, which drive has failed?  When I run mdadm --detail /dev/hda3 it gives info; if I run mdadm --detail /dev/hdb3 it says it's not an md device...

I am not sure it's actually rebuilding; I do mdadm --monitor /dev/md127 and see no action.

PS, let's assume that hda is physically failed; when we pull the plug it won't activate the mirror.  If we install a blank disk, can we rebuild it normally?  I am not sure how this will work.

----------

## NeddySeagoon

njcwotx,

Your dmesg says (or said)

```
md: md127: raid array is not clean -- starting background reconstruction
```

I'm still on raidtools. I will need to update to mdadm one day, raidtools is long gone from portage.

From reading the mdadm man page, it has a command (near the bottom) that returns how far reconstruction has got.

----------

## Mad Merlin

```
cat /proc/mdstat
```

 will indeed show the progress of the reconstruction, if it's taking place.
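If you just want the figure itself, the recovery line can be pulled out with grep; here is a sketch run against a made-up sample of mdstat output (the printf stands in for a real /proc/mdstat):

```
# Extract the recovery progress from mdstat-style output.
# The printf lines are a fabricated sample, not this system's state.
printf '%s\n' \
  'md0 : active raid1 hdb3[1] hda3[0]' \
  '      [>....................]  recovery =  0.3% (273728/77587200) finish=32.9min' \
  | grep -o 'recovery = *[0-9.]*%'
```

On a live system you would pipe `cat /proc/mdstat` instead, or just run `watch cat /proc/mdstat` and see the percentage tick up.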

----------

## njcwotx

In that case, I don't think it's reconstructing.  Currently, I am cloning the mirrors into VMware ESX Server so I can keep working with this even if I kill it.  We have backups, but the original developers who made the application on it are no longer available :)  so I would prefer to get this mirror back.

some more questions:

1.  Ok, so from the output, which mirror has failed?  It looks like /dev/hda3 from the output for /dev/md127, but when I use mdadm --scan /dev/hda3 I get info on a mirror, and with mdadm --scan /dev/hdb3 I get info that it's not part of a mirror set.

2.  This is an old install of Gentoo, done by some developers who are long gone.  A replacement app has been on the wish list of my development group, but a new one has not materialized yet.  The original tools were raidtools and I don't have the raidstart command.  Could this be why the mirror won't mount up?

3.  I have tried to remove the physical hda drive, but I get a kernel panic; maybe I'm looking at the wrong drive...

The server is remote to me.  I will be going out tomorrow; in approx 18 hrs I might be physically at it, and sometimes that is easier.  Thanks for the input.

----------

## NeddySeagoon

njcwotx,

You posted

```
           UUID : 24df6bde:40d81b6a:b55d906c:1196c4e1
         Events : 0.192
    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       3       67        1      active sync   /dev/hdb3
```

which shows that hdb3 is active.

Provided fdisk shows the partition types as fd, the kernel should start the raid set on boot. Your raid superblocks are persistent.
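Checking that from the live CD could look like this (an illustrative sketch, not output from this system):

```
# Confirm the partition type is fd (Linux raid autodetect)
fdisk -l /dev/hda | grep hda3
```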

dmesg will show something like

```
[    2.559974] md: considering sdb1 ...
[    2.561614] md:  adding sdb1 ...
[    2.563265] md:  adding sda1 ...
[    2.564896] md: created md0
[    2.566479] md: bind<sda1>
[    2.568023] md: bind<sdb1>
[    2.569521] md: running: <sdb1><sda1>
[    2.571078] raid1: raid set md0 active with 2 out of 2 mirrors
[    2.572665] md: ... autorun DONE.
```

which is my raid1 /boot being started. I guess you will have an error message there.

raidtools and mdadm should be interchangeable, so missing raidtools is probably not the issue.

----------

## njcwotx

Stranger now....

Ok, I finally got the system to boot up; now when I log in, I see 2 separate raid sets!

The hda3 set is md0, which is actually the correct version, and hdb3 now shows up as raid set md127.  md127 is the label that came up when I tried to mount the raid from the boot CD; now it seems to think it's supposed to stay that way.  The mdadm tool is not on the original install, so how do I go about telling it, in raidtools terms, that I want to make the hdb3 partition forget about md127?  The raidtab is still the original... when I boot to the CD and do mdadm --detail /dev/hdb3, it says its preferred mirror is 127...

PS, I tried to see if mkraid was there, and it's not there either.

Is it just easier to wipe the hdb drive and let the mirror set fix itself?  I really have pucker factor wiping one side of the mirror!

=====the stuff I see===========

Where the heck does it get the md127 from?  I can't find any /etc config file with that in it; it must be stored somehow in the partition table?  There are no executable commands I can find to modify this.  Maybe something can be done from the boot disk side?

```
cat /proc/mdstat
Personalities : [raid1] [multipath]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part3[0]
      77587200 blocks [2/1] [U_]
md127 : active raid1 ide/host0/bus0/target1/lun0/part3[1]
      77587200 blocks [2/1] [_U]
unused devices: <none>
```

```
cat /etc/raidtab
raiddev /dev/md0
raid-level 1
persistent-superblock 1
nr-raid-disks 2
chunk-size 32
device /dev/hda3
raid-disk 0
device /dev/hdb3
raid-disk 1
```

```
dmesg |grep md
Kernel command line: root=/dev/md0
md: raid1 personality registered as nr 3
md: multipath personality registered as nr 7
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: considering ide/host0/bus0/target1/lun0/part3 ...
md:  adding ide/host0/bus0/target1/lun0/part3 ...
md: ide/host0/bus0/target0/lun0/part3 has same UUID as ide/host0/bus0/target1/lun0/part3, but superblocks differ ...
md: created md127
md: bind<ide/host0/bus0/target1/lun0/part3,1>
md: running: <ide/host0/bus0/target1/lun0/part3>
md: ide/host0/bus0/target1/lun0/part3's event counter: 000000d9
md: RAID level 1 does not need chunksize! Continuing anyway.
md127: max total readahead window set to 124k
md127: 1 data-disks, max readahead per data-disk: 124k
raid1: md127, not all disks are operational -- trying to recover array
raid1: raid set md127 active with 1 out of 2 mirrors
md: recovery thread got woken up ...
md127: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: updating md127 RAID superblock on device
md: ide/host0/bus0/target1/lun0/part3 [events: 000000da]<6>(write) ide/host0/bus0/target1/lun0/part3's sb offset: 77587200
md: considering ide/host0/bus0/target0/lun0/part3 ...
md:  adding ide/host0/bus0/target0/lun0/part3 ...
md: created md0
md: bind<ide/host0/bus0/target0/lun0/part3,1>
md: running: <ide/host0/bus0/target0/lun0/part3>
md: ide/host0/bus0/target0/lun0/part3's event counter: 00000072
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md127: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: updating md0 RAID superblock on device
md: ide/host0/bus0/target0/lun0/part3 [events: 00000073]<6>(write) ide/host0/bus0/target0/lun0/part3's sb offset: 77587200
md: ... autorun DONE.
md: swapper(pid 1) used obsolete MD ioctl, upgrade your software to use new ictls.
reiserfs: checking transaction log (device md(9,0)) ...
for (md(9,0))
md(9,0):Using r5 hash to sort names
```
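For what it's worth, md127 does not come from any file in /etc: the 0.90 md superblock written at the end of each member partition records a "Preferred Minor" field, and the kernel's autodetect uses it when assembling.  It can be inspected from the live CD with something like:

```
# The preferred minor lives in the md superblock on the partition itself
mdadm --examine /dev/hdb3 | grep -i 'preferred minor'
```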

----------

## njcwotx

OK, I emerged mdadm and can now use those tools on the box... now to clear the md127 stuff, but how... off to rtfm land.

----------

## njcwotx

Hot Diggity!!!!!!!!!!!!!!!!!!!!!!!!!!!!

I had to emerge mdadm, and after a lot of manpage reading and prayer I managed to get rid of md127.  Here are a few of my commands, pulled from history.  Thanks for the help/confidence, guys.

```
  548  mdadm --stop /dev/md127
  554  mdadm /dev/md0 -f /dev/hdb3
  556  mdadm --manage -r /dev/md127
  561  mdadm /dev/md0 --add /dev/hdb
```
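For anyone following along later: a tidier way to make a partition forget a stale array is mdadm's --zero-superblock, which was not used above but would look something like this (a sketch with this thread's device names):

```
# Hypothetical cleanup of a stray array
mdadm --stop /dev/md127              # stop the stray array
mdadm --zero-superblock /dev/hdb3    # wipe the stale member superblock
mdadm /dev/md0 --add /dev/hdb3       # re-add the partition to the real array
```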

```
cat /proc/mdstat
Personalities : [raid1] [multipath]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target1/lun0/disc[2] ide/host0/bus0/target0/lun0/part3[0]
      77587200 blocks [2/1] [U_]
      [>....................]  recovery =  0.3% (273728/77587200) finish=32.9min speed=39104K/sec
unused devices: <none>
```

----------

