# fixing my mdadm RAID6 array

## russK

My array went screwy

It has 9 devices: 8 active and 1 spare.

The devices, in order of roles 0-7 plus the spare: /dev/sdj1 /dev/sdp1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdo1 /dev/sdn1 /dev/sdh1

'mdadm -E'  | grep "Array State" reports this:

```
# for sd in /dev/sdj1 /dev/sdp1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdo1 /dev/sdn1 /dev/sdh1 ; do mdadm -E $sd ; done | grep "Array State"
   Array State : A.AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
   Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
   Array State : A.AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
   Array State : A.AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
   Array State : A.AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
   Array State : A.AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
   Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
   Array State : AAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
   Array State : A.AAAA.A ('A' == active, '.' == missing, 'R' == replacing)
```

This is also interesting: the 3 devices that have AAAAAAAA state also have an older "Update Time" and "Events" count (sdp1, sdo1, sdn1).

I don't know what hiccup caused this.  All drives appear healthy with smartctl.

I have enough other storage to copy all devices if necessary.

Note I subsequently stopped the array.  One attempt to reassemble said something like "not enough to start" (I wish I'd captured the exact message).

Any suggestions for how to proceed?  

Thanks in advance   :Very Happy: 

----------

## eccerr0r

This is weird; it looks like a few disks dropped.  It looks like sdp1, sdo1, and sdn1 spontaneously disappeared from the array, which is why their superblocks were left with the full array state...  It seems one of those disks got dropped first, your sdh1 got drafted into service and became a bona fide member, and then the other two disks, perhaps one after another, got dropped for whatever reason.

Theoretically you should be able to reassemble the array, as you have 6 of 8 devices working and valid.  The AAAAAAAA state means those disks were spontaneously dropped from the array and their superblocks never got updated to record the drop - from a power outage or cable disconnect, I don't know what happened.

If you have the disk space, you should copy the disks that are "A.AAAA.A".  The AAAAAAAA disks are "old" and you should keep them out for now.  After getting a backup copy, you should be able to reassemble the array using just the "A.AAAA.A" disks, if they have the same timestamp on them, with two "missing" members.  Once that is done, you can reintroduce the other three disks - since they now hold "old" data you probably don't care about their contents, but if you wish you can make copies of them first (probably of little value, however).  Zero the superblock once you're sure you don't care about the contents of each of those disks (mdadm --zero-superblock /dev/diskyoudontcareaboutcontents1), and then re-add them to the array.  It should then rebuild the array when it notices it has unused spares.
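A sketch of that sequence, using the device names from this thread - treat it as illustration only, the backup destination path is a placeholder, and (as the thread later shows) sdh1's spare role can still complicate the assemble step:

```shell
# 1. Back up each current ("A.AAAA.A") member before touching anything.
#    Destination path is a placeholder; repeat per device.
dd if=/dev/sdj1 of=/mnt/backup/sdj1.img bs=1M conv=sync,noerror

# 2. Try to assemble from the six current members only; the two kicked
#    slots stay missing.  RAID6 can run with two members absent.
mdadm --assemble /dev/md127 /dev/sdj1 /dev/sdb1 /dev/sdc1 \
      /dev/sdl1 /dev/sdk1 /dev/sdh1

# 3. Only once the array is up, and you are sure you don't care about
#    the stale disks' contents, wipe their superblocks and re-add them.
mdadm --zero-superblock /dev/sdp1
mdadm /dev/md127 --add /dev/sdp1
```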

Sigh... I just got a disk dropped from my raid5 array today, found out about it about an hour after it got dropped.  Same troubled disk slot, then again I knew I added a flaky disk.  No hot spares in my array.  Currently reintroducing the flaky disk to the array for the heck of it... didn't stop the array, just removed (mdadm --remove) and readded the disk (mdadm --add).

----------

## russK

eccerr0r,

Thanks so much for the reply.  This agrees with what I was thinking; I'm going to back up the devices showing A.AAAA.A and proceed basically like that.

Here goes nothing   :Very Happy: 

Thanks!

----------

## eccerr0r

If you already have a backup, as you should, I think it's generally safe to try to assemble as long as you don't --force.  In any case, more backups are always good.  To be honest, never --force and don't overwrite any disk by mistake... these are the data killers if you're not careful.

I suspect your array should reassemble fine once you get the incantation with the missing disks right; just don't try to assemble with the kicked disks, it'll only cause more pain.

Be especially careful when zeroing the superblock of the kicked disks.  Do NOT attempt to zero them until you are able to assemble the 6 good disks of the array.  You may not even need to zero them to be honest, just be especially careful with the commands.

----------

## russK

It took some time to copy all the devices, and now I'm back to it.

I'm beginning to understand that the spare was probably not promoted to an active role.  When I try to assemble:

```
# mdadm --assemble /dev/md127 /dev/sdj1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdh1
mdadm: /dev/md127 assembled from 5 drives and 1 spare - not enough to start the array.
```

If I assemble one at a time incrementally, the last two produce the same message:

```
# mdadm --incremental /dev/sdk1
mdadm: /dev/sdk1 attached to /dev/md127, not enough to start (5).
# mdadm --incremental /dev/sdh1
mdadm: /dev/sdh1 attached to /dev/md127, not enough to start (5).
```

The Device Role for /dev/sdh1 does indeed say spare.  So unless there's a way to promote sdh1 now, I might be stuck trying to salvage data from the other 3.
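For reference, this is roughly how I'm checking what role each member believes it holds (a read-only diagnostic sketch, nothing is written):

```shell
# Show each member's claimed role, event count, and last update
for sd in /dev/sdj1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdh1; do
    echo "== $sd"
    mdadm -E "$sd" | grep -E 'Device Role|Events|Update Time'
done
```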

Thoughts?

----------

## NeddySeagoon

russK

What does 

```
mdadm -E /dev/sdj1 /dev/sdp1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdo1 /dev/sdn1 /dev/sdh1
```

tell?

All of it please. A pastebin may be better than posting. There will be quite a lot of output.

There is a risk that the spare did not complete being brought online before it was needed, so, as you say, it's not a member of the raid set.

Do you have a backup, or space to make images of the raid set members?

----------

## russK

NeddySeagoon,

Any backup that I have of the filesystems is, uh, old   :Embarassed: 

But I do have plenty of other storage and already made images of the devices, so even if I experiment and make a mistake I should at least be able to get back to this state.

Here is the result of

```
mdadm -E /dev/sdj1 /dev/sdp1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdo1 /dev/sdn1 /dev/sdh1
```

http://dpaste.com/EKZGP29QR

Thanks for your attention

----------

## NeddySeagoon

russK,

I've downloaded your pastebin to grep it.

The distillation looks like ...

```
$ grep -i  -e event -e \/sd -e update russK_raid6.txt
/dev/sdj1:
    Update Time : Sat Dec  4 19:54:11 2021
         Events : 37020
/dev/sdp1:
    Update Time : Sat Dec  4 13:35:15 2021
         Events : 36606
/dev/sdb1:
    Update Time : Sat Dec  4 19:54:11 2021
         Events : 37020
/dev/sdc1:
    Update Time : Sat Dec  4 19:54:11 2021
         Events : 37020
/dev/sdl1:
    Update Time : Sat Dec  4 19:54:11 2021
         Events : 37020
/dev/sdk1:
    Update Time : Sat Dec  4 19:54:11 2021
         Events : 37020
/dev/sdo1:
    Update Time : Sat Dec  4 13:35:15 2021
         Events : 36606
/dev/sdn1:
    Update Time : Sat Dec  4 13:35:15 2021
         Events : 36606
/dev/sdh1:
    Update Time : Sat Dec  4 19:54:11 2021
         Events : 37020
```

We also know that  /dev/sdh1 was the spare.

It looks like at Sat Dec  4 13:35:15 2021 something horrible happened and your raid split into two.

The spare may or may not have already been in use at the time. 

Then at

```
Sat Dec  4 19:54:11 2021
```

about six hours later, it gave up.

Lets assume that the spare was up to date and in use.

Any writes since the earlier time will be incomplete ... and there have been about 400.

Do you know what was written in that timeframe?

Anything that has been damaged is in steps of the

```
Chunk Size : 512K
```

multiplied by the number of data drives in the array.
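As a rough sketch of that arithmetic, assuming the reported 512K chunk and six data members (the 8-device RAID6 minus two parity):

```shell
chunk_kib=512     # "Chunk Size : 512K" from mdadm -E
data_members=6    # 8 RAID6 members minus 2 parity
echo "$((chunk_kib * data_members)) KiB per full stripe"
# prints "3072 KiB per full stripe"
```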

Think very carefully about the next steps. If you can make partition images, you can practice on the images.

Files are fine, as loop devices can be assembled into raid sets.

Files/partitions on the same spindle work too. If mdadm detects that, it will shout at you.
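A sketch of the loop-device approach, assuming the images were made with dd; file names are placeholders and the loop numbers depend on what is free:

```shell
# Attach each partition image to a free loop device
for img in sdj1.img sdb1.img sdc1.img sdl1.img sdk1.img sdh1.img; do
    losetup -f --show "$img"    # prints the /dev/loopN it picked
done

# Then assemble the copies under a scratch array name, e.g.:
mdadm --assemble --readonly /dev/md100 /dev/loop0 /dev/loop1 \
      /dev/loop2 /dev/loop3 /dev/loop4 /dev/loop5
```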

When you force assemble the set with the minimum number of elements, mdadm updates the event count on the odd drive(s).

You get to look around but you might not like what you see. As there are no redundant elements, nothing gets resynched.

That's good.  From memory, mdadm also has a read only option. That's good too. It will stop any journal replays on the underlying filesystem.

However, if mount wants to do a journal replay, it may fail. Mounting with the read only option does not prevent journal replays from changing the filesystem.
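On ext4 the journal replay can be skipped explicitly (a sketch; the mount point is a placeholder, and XFS has the equivalent norecovery option):

```shell
# mount -o ro alone may still replay the journal; noload skips it (ext4)
mount -o ro,noload /dev/md127 /mnt/recovery
```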

Are the drives themselves OK?

If not things get a lot harder.

What does 

```
smartctl -x /dev/sdj /dev/sdp /dev/sdb /dev/sdc /dev/sdl /dev/sdk /dev/sdo /dev/sdn /dev/sdh
```

tell?

That's one for a pastebin; it's going to be big. It's the extended internal drive status and error log for each drive.

I have had two drives drop out of a raid5 15 min apart due to bad sectors. I was lucky. I lost 4kb in the middle of my media collection.

The backup is to rip all the DVDs again.

The authoritative guide is compulsory reading.

The main takeaways are "Don't Panic" and "don't write anything". Right now, we are still in the understanding-what-went-wrong-and-what-we-have-to-work-with phase.

Right now, write nothing and make some images to play with.

----------

## eccerr0r

Still trying to wrap my head around what actually happened.  If three drives in an 8+1 RAID6 suddenly drop, unless one of those that dropped was the +1 spare, it should have been game over, do not pass go, do not collect $200.  Apparently the spare was still in the array, but since we lost too many non-spare drives there was no way for it to rebuild - it should have immediately stopped writing to any disk and gone into limp mode.

How another 400+ transactions still made it to the remaining disks baffles me...  did RAID6 do something bad and try to recover from a 3-disk loss perhaps?

Now I think a --force may be needed to reassemble, and yes, you'll have the "fun" task of finding out what got corrupted - a lot of stuff may be corrupted at this point.

TBH single bad sectors are the better way to drop a disk, at least there's a way to know what the corrupted data is (the bad sector!)... having good disks drop due to someone tripping over some cords... now who knows what was being written at the time.

RAID is not backup!

----------

## NeddySeagoon

eccerr0r,

That's why I want to see the SMART logs ... before a --force.

----------

## eccerr0r

russK did mention:

 *russK wrote:*   

> I don't know what hiccup caused this.  All drives appear healthy with smartctl.

 

so I'm not sure if there were any bad sectors, unfortunately; it's leaning towards an "oops" deal... :(

----------

## NeddySeagoon

eccerr0r,

Trust but verify :)

But yeah, its heading that way.

----------

## russK

Thanks guys,

Here is the smartctl output from:

```
# for d in /dev/sdj /dev/sdp /dev/sdb /dev/sdc /dev/sdl /dev/sdk /dev/sdo /dev/sdn /dev/sdh ; do printf "==================== %s ===============\n" $d && smartctl -x $d ; done
```

http://dpaste.com/9PV5774KJ

I am suspecting the incident may have been due to an ESD event, I'm taking measures to reduce static in this work area.

Yes RAID is not backup  :Smile: 

I am lax with my backups; this is not critical data, although it may have sentimental and otherwise useful value.  I am going to take the recovery process slowly to avoid more damage.

I'm also kicking myself for not configuring my log rotation better; because I rebooted the machine a few times, the logs from the mdadm monitor are likely gone.  So many lessons learned here.

NeddySeagoon, the link you provided suggests it is outdated, but no matter.

I once had a similar issue and successfully used a read-only approach with overlays, so I intend to try that when diving in to recover, since I have plenty of other storage to work with.  Unless you have other suggestions.
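The overlay trick is commonly done with device-mapper snapshots, so any writes land in a throwaway copy-on-write file instead of on the real disk. A sketch of one such recipe - the COW size and names are illustrative, and this would be repeated per member:

```shell
DEV=/dev/sdj1
NAME=$(basename "$DEV")

# Sparse COW file to absorb writes; size is illustrative, it only
# needs to hold whatever gets written during the experiment
truncate -s 4G "overlay-$NAME"
LOOP=$(losetup -f --show "overlay-$NAME")

# Overlay device: reads hit $DEV, writes go to the loop-backed COW
# (table format: start length snapshot origin cow persistent chunksize)
SECTORS=$(blockdev --getsz "$DEV")
dmsetup create "ov_$NAME" --table "0 $SECTORS snapshot $DEV $LOOP P 8"

# Repeat for each member, then assemble from /dev/mapper/ov_* instead
# of the real partitions.
```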

Regards

----------

## eccerr0r

Nice hodgepodge of disks, including laptop SMR disks; hoping those didn't get dropped due to write speed.  Shouldn't be, as that only becomes an issue on rebuild -- and the rebuild couldn't have happened due to the loss of 3 disks?

----------

## NeddySeagoon

russK,

Yes, the link is outdated but it's good reading. It also links to a more current document.

The drives look healthy

```
$ grep -i -e "dev/sd" -e Reallocated -e Current_Pending_Sector -e Reallocated_Event_Count russK_smart.txt
==================== /dev/sdj ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdp ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdb ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdc ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdl ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdk ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdo ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
==================== /dev/sdn ===============
  5 Reallocated_Sector_Ct   PO--CK   100   100   050    -    0
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--CK   100   100   000    -    0
==================== /dev/sdh ===============
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
```

The interfaces look OKish from inside the drive too.

```
$ grep -i -e "dev/sd" -e UDMA_CRC_Error russK_smart.txt
==================== /dev/sdj ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
==================== /dev/sdp ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
==================== /dev/sdb ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
==================== /dev/sdc ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    1
==================== /dev/sdl ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
==================== /dev/sdk ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
==================== /dev/sdo ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    1
==================== /dev/sdn ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    1
==================== /dev/sdh ===============
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
```

Three of them have an interface error, but we can't tell when. Two of those three errored drives fell out of the array.

/dev/sdc1 has an interface error but is still current. It's probably not cause and effect.

Interface errors can be the controller, data cable or drive.

sdc is getting a bit long in the tooth.

```
$ grep -i -e "dev/sd" -e Power_On_Hours russK_smart.txt

==================== /dev/sdc ===============

  9 Power_On_Hours          -O--CK   001   001   000    -    72857
```

Personally, I would not use SMR drives in a raid set. When they decide to reshingle a zone due to a write, which takes a long time, the kernel may kick them out of the raid.

The missing drives do not have SMR in common either, and you would be really unlucky to have three drives want to reshingle due to the same write.

I think all that's left is a forced RO assembly with the minimum number of drives so that you can have a look round.

The mess due to the 400 writes can range from a lot (think root directory chunks being inconsistent across drives) to 400 separate files.

The most important thing now is to not do anything that cannot be undone, hence play with images.
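The forced read-only assembly would look roughly like this - a sketch only, best run against the loop or overlay copies first, and note that if sdh1's superblock still says "spare", even --force may refuse to count it as a member:

```shell
# Force-assemble the six newest members, read-only, two slots missing
mdadm --assemble --force --readonly /dev/md127 \
      /dev/sdj1 /dev/sdb1 /dev/sdc1 /dev/sdl1 /dev/sdk1 /dev/sdh1

cat /proc/mdstat    # check whether it came up degraded
```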

----------

## eccerr0r

no updates in a while, any luck on recovery?

Ugh, have a related dilemma myself.  RAID5 (3-disk), one disk is throwing bad sectors; I've been able to rebuild onto the failing disk and it would return no bad sectors (all remapped and fine ... for a while).

Wonder if I should buy another used CMR to replace it or a brand new (closeout) SMR disk at twice the cost...  Cost being a severe factor here unless I can find a new CMR disk at the same price as the SMR...

----------

## NeddySeagoon

eccerr0r,

SMR in raid?

It will work until a write forces reshingling ...

----------

## eccerr0r

So a used 50K POH CMR disk instead of a brand new SMR?

There's no question reshingling will need to occur; the question is whether DM/MDRAID can tolerate it on rebuild.  I'd suspect in all other situations it should not be an issue.

---

I ended up getting the 50K POH disk :(

Well, it works, and it's currently rebuilding the array onto it, so I can pull the sick drive out after it's done rebuilding.  Now how long this previously used disk will last is the question...

Fortunately it's getting sequential read speeds of 140MB/sec, unlike the failing disk's 100MB/sec, though unfortunately it already has SMART fields down to 001, indicating it thinks it's reached old age...  it seems enterprise disks expect to be swapped out well before how long I expect my drives to last...

----------

