# mdadm RAID 5 - replacing a drive?

## PraetorZero

I built an array a year or two ago with 5 1TB drives.   I apparently never tagged one of them as a spare, which may be coming back around to bite me.

Logwatch has been reporting errors on one of the drives and I'm seeing hiccups when accessing a few of the videos I have stored there.

```
 Currently unreadable (pending) sectors detected:
       /dev/sdc [SAT] - 48 Time(s)
       1128 unreadable sectors detected

 Offline uncorrectable sectors detected:
       /dev/sdc [SAT] - 48 Time(s)
       45 offline uncorrectable sectors detected
```

/proc/mdstat reports

```
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sde1[2] sdf1[3] sdc1[0] sdd1[1] sdb1[4]
      3907039744 blocks level 5, 32k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>
```

I imagine the array as a whole is fine and is working around this. If I'm reading this correctly, the array isn't reporting any errors. I have another drive I'd like to swap in, but how do I go about doing it?

Given that the failing drive is sdc1, would it be a matter of

```
mdadm /dev/md0 -r /dev/sdc1
# power down
# physically replace the faulty drive
# boot and partition the new drive
mdadm /dev/md0 -a /dev/sdc1
```

My apologies if this is a simple matter; I'd rather be positive about the process before I lose my library.

----------

## linuxtuxhellsinki

 *PraetorZero wrote:*   

>  I have another drive that I'd like to replace, but how do I go about doing it?
> 
> Given that the failed drive is sdc1, would it be a matter of
> 
> ```
> ...

 

Just mark the drive as failed first...

```
mdadm /dev/md0 -f /dev/sdc1
```

...and if you want to partition the new drive the same way as the others, you can do it with:

```
sfdisk -d /dev/sdb | sfdisk /dev/sdc
```
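One extra sanity check before re-adding the drive: dump both partition tables and compare them with the device names masked out, so only the geometry is diffed. This is a sketch using the device names from this thread (sdb as the healthy member, sdc as the new drive); adjust for your system. The process substitution requires bash.

```shell
# Mask the device names so identical layouts diff clean,
# then compare the healthy member against the freshly cloned drive.
diff <(sfdisk -d /dev/sdb | sed 's|/dev/sd[a-z]|/dev/sdX|g') \
     <(sfdisk -d /dev/sdc | sed 's|/dev/sd[a-z]|/dev/sdX|g') \
  && echo "partition tables match"
```

If `diff` prints nothing and the match message appears, the clone took.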

----------

## PraetorZero

From my reading, there shouldn't be any further commands to enter, correct? Once the new drive is added, the array will repopulate it with the data that was on the original drive?

If it isn't obvious, this is my first failed disk. :)

edit - That does appear to be it. cat /proc/mdstat shows the array is already rebuilding. Very cool.
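For anyone else watching their first rebuild, the recovery can be followed like this (md0 is the array name from this thread; adjust as needed):

```shell
# Refresh /proc/mdstat every 5 seconds; while rebuilding it shows a
# progress line such as:
#   [==>..........]  recovery = 12.6% (123456/976759936) finish=80.3min
watch -n 5 cat /proc/mdstat

# Or query mdadm directly for the array state and rebuild status:
mdadm --detail /dev/md0
```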

----------

## Mad Merlin

FWIW, powering down and booting back up isn't strictly necessary. SATA drives can be hotplugged, I've done this several times before.

----------

## Veldrin

That would depend on the controller/driver....

----------

## PraetorZero

 *Mad Merlin wrote:*   

> FWIW, powering down and booting back up isn't strictly necessary. SATA drives can be hotplugged, I've done this several times before.

 

In my case, it was. The drives aren't on rails or in a case designed for hot swapping. Maybe someday, when funds permit, I'll upgrade.

----------

## eccerr0r

I find it odd that md hasn't failed the disk yet; perhaps it simply hasn't needed to read those sectors (they may sit in not-yet-used blocks).

But the suggestion to fail the disk before removing it is the correct sequence given the current status: first fail the disk with mdadm --fail, then --remove it. Then physically remove the drive, and --add the new disk back in.

Normally, if md detects a bad disk it will automatically --fail it and you'd see [UU_UU]... Then you can just --remove the disk and proceed with the physical swap.
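Put together, the whole swap looks something like this. A sketch using the device and array names from this thread (md0, sdc1); double-check the device names on your own system before running anything.

```shell
mdadm /dev/md0 --fail /dev/sdc1      # mark the failing member faulty
mdadm /dev/md0 --remove /dev/sdc1    # drop it from the array
# ...power down, physically swap the drive, boot, partition the new disk...
mdadm /dev/md0 --add /dev/sdc1       # re-add; recovery starts automatically
cat /proc/mdstat                     # while degraded, expect e.g. [5/4] [UUUU_]
```

The underscore in the [UUUU_] status marks the missing/rebuilding member.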

If you haven't swapped the disk yet, one thing you could do (though risky) is a RAID surface scrub. I run one on my main array every month to make sure there are no undetected unreadable regions on the disks; basically, I just force a repair:

```
echo repair > /sys/block/md0/md/sync_action
```

This rewrites all the parity blocks on a 'UUUUU' RAID by rereading everything and recalculating parity. (If one of the drives held an undetected error, this would make it hopelessly unrecoverable, however. It's a risk I have to take, as I probably couldn't detect such an error anyway, depending on the disk's ECC.)
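Once that repair pass finishes, its outcome can be read back from the same sysfs directory (a sketch; paths assume the array is md0 as above):

```shell
# "repair" while the scrub is running, "idle" once it has finished:
cat /sys/block/md0/md/sync_action

# Number of sectors found inconsistent and rewritten; 0 means the
# parity was already fully consistent:
count=$(cat /sys/block/md0/md/mismatch_cnt)
if [ "$count" -eq 0 ]; then
    echo "parity consistent"
else
    echo "$count sectors rewritten"
fi
```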

In any case, remember: RAID is NOT backup. Make sure you have a backup of the system before swapping disks... heck, before even having a disk failure. That is exactly how you avoid the fears people have when working with RAID: losing data by swapping the wrong disk, and so on.

----------

