# mdadm raid5 disk failure? remove spares? [SOLVED]

## PietdeBoer

Hi there,

i've got 2 arrays of 4 sata disks in raid5

one 4x 400GB 

one 4x 300GB

the 400gb array works like a charm, but the 300gb array gives me headaches..

yesterday my server crashed; at bootup i saw my 300gb array wasn't started, so i took a look at the messages:

```
fileserver ~ # mdadm --assemble /dev/md0

mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.
```

```
fileserver ~ # cat /proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]

md1 : active raid5 sdb1[0] sde1[3] sdd1[2] sdc1[1]

      1172126208 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md0 : inactive sdg1[1](S) sdf1[4](S) sdi1[3](S) sdh1[2](S)

      1172198400 blocks

unused devices: <none>
```

```
fileserver ~ # mdadm --examine /dev/sdf1

/dev/sdf1:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : bee0557b:82595891:05cb0d31:ecdeebd9

  Creation Time : Tue Mar 20 15:20:46 2007

     Raid Level : raid5

  Used Dev Size : 293049600 (279.47 GiB 300.08 GB)

     Array Size : 879148800 (838.42 GiB 900.25 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0

    Update Time : Wed Nov 28 23:45:42 2007

          State : clean

 Active Devices : 2

Working Devices : 3

 Failed Devices : 1

  Spare Devices : 1

       Checksum : 7bda2617 - correct

         Events : 0.151390

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     4       8       81        4      spare   /dev/sdf1

   0     0       0        0        0      removed

   1     1       8       97        1      active sync   /dev/sdg1

   2     2       0        0        2      faulty removed

   3     3       8      129        3      active sync   /dev/sdi1

   4     4       8       81        4      spare   /dev/sdf1
```

as you can see i have these devices tagged as removed; how can i "completely" remove them from my array?

why is my sdf1 set as spare? it should be a normal, actively synced disk. how can i reset its state back to a normal raid disk instead of a spare?

and what is the best way to check a disk (not the filesystem) for errors without taking the server down?

thx in advance!

----------

## merlijn

First off, you need to determine whether it really is one of your disks giving errors, or whether one of the cables isn't plugged in properly. The best way to check is the disk's SMART capability: you will need sys-apps/smartmontools to view your disks' health data, and perhaps to run a long self-test.
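Something along these lines (the device name /dev/sdh is just an example — substitute whichever disk you suspect):

```shell
# Show SMART health status, the attribute table, and the drive's error log
smartctl -a /dev/sdh

# Start a long (full-surface) self-test; it runs on the drive in the background
smartctl -t long /dev/sdh

# After the duration smartctl announces, read back the self-test results
smartctl -l selftest /dev/sdh
```

Note that a drive can pass the overall health check and still log reallocated or pending sectors, so look at the attribute table as well.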

When mdadm creates a RAID5 array, it actually starts it degraded with the last drive marked as a 'spare' and rebuilds parity onto that drive; once the initial resync completes, all drives become regular active members.

To remove the failed drive from the array you may have to use the --fail, --remove and --force flags. But first you need to figure out where the problem is: if it was just a hiccup in the controller or a bad cable, you can simply re-add the drive. If one of your disks has really died, you might as well take it out, put a new drive in, and run `mdadm /dev/md0 --add /dev/sdX`.

Cheers,

----------

## HeissFuss

It looks like one of your disks (/dev/sdh1, it seems) was kicked out as faulty.  It's a good idea to check the cables, and to check dmesg for I/O errors related to that device.  Also, run badblocks on /dev/sdh to see if there are any bad blocks that may have caused the disk to be marked faulty.  In your array, none of the drives would normally be marked spare, since you are dedicating them all to the raid (rather than leaving one as a hot spare).  The parity in RAID 5 is spread across all of the disks, taking up 1/n of each of the n disks.
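For instance (again assuming the suspect disk is /dev/sdh):

```shell
# Any kernel-logged I/O errors mentioning the suspect disk?
dmesg | grep -i 'sdh'

# Read-only surface scan; badblocks does not write unless explicitly told to.
# -s shows progress, -v reports each bad block it finds.
badblocks -sv /dev/sdh
```

Running badblocks on the whole disk (/dev/sdh rather than /dev/sdh1) also covers any area outside the raid partition.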

If the disk does come out as faulty, fail it and then remove it from the RAID.  You can then bring the array up in a degraded state with 3 disks (no redundancy left, effectively a RAID 0) until you have a disk to replace the failed one.
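A sketch of that, assuming the dead member is /dev/sdh1 and the other three partitions are intact:

```shell
# Start the array with three of four members; --run accepts a degraded start
mdadm --assemble --run /dev/md0 /dev/sdf1 /dev/sdg1 /dev/sdi1

# If the superblocks disagree after a crash, --force makes mdadm trust the
# freshest ones -- a last resort, so back up as soon as the array is up
mdadm --assemble --force --run /dev/md0 /dev/sdf1 /dev/sdg1 /dev/sdi1
```

With a replacement disk installed and partitioned, `mdadm /dev/md0 --add /dev/sdh1` then kicks off the rebuild.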

----------

## PietdeBoer

after a reboot, my array turned out to have only one drive active..

```
fileserver ~ # mdadm --examine --scan /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1

ARRAY /dev/md0 level=raid5 num-devices=4 UUID=bee0557b:82595891:05cb0d31:ecdeebd9
```

```
fileserver ~ # cat /proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]

md1 : active raid5 sdc1[1] sde1[3] sdd1[2]

      1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

md0 : inactive sdg1[1](S)

      293049600 blocks

unused devices: <none>
```

```
fileserver ~ # mdadm --examine /dev/sdg1

/dev/sdg1:

          Magic : a92b4efc

        Version : 00.90.00

           UUID : bee0557b:82595891:05cb0d31:ecdeebd9

  Creation Time : Tue Mar 20 15:20:46 2007

     Raid Level : raid5

  Used Dev Size : 293049600 (279.47 GiB 300.08 GB)

     Array Size : 879148800 (838.42 GiB 900.25 GB)

   Raid Devices : 4

  Total Devices : 4

Preferred Minor : 0

    Update Time : Sat Dec  1 00:54:54 2007

          State : clean

 Active Devices : 2

Working Devices : 3

 Failed Devices : 1

  Spare Devices : 1

       Checksum : 7bdcd98b - correct

         Events : 0.151412

         Layout : left-symmetric

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State

this     1       8       97        1      active sync   /dev/sdg1

   0     0       0        0        0      removed

   1     1       8       97        1      active sync   /dev/sdg1

   2     2       0        0        2      faulty removed

   3     3       8      129        3      active sync

   4     4       8       81        4      spare
```

what could be the cause of this? i checked the power cables, even replaced them, and dmesg didn't give any io errors about those drives

the disks are still listed in `fdisk -l`; i want to add them back to the array if that's what's needed to get it back up

i also bought some new spare disks, but first i have to get the array up again to get at the data

----------

## HeissFuss

Your sdb1 device is now missing from your md1 raid.

For md0, assemble the raid.  Remove any devices marked spare or faulty.  Then --re-add them.

```
mdadm -A /dev/md0 /dev/sdg1 /dev/sdf1 /dev/sdi1 /dev/sdh1
```

```
mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX --re-add /dev/sdX
```

However, I still think that one of your drives probably has physical issues.  Please run badblocks (a read-only test by default) on that device before attempting to add it back into your raid.
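Once a device is re-added you can follow the resync in /proc/mdstat (e.g. `watch cat /proc/mdstat`). The [m/n] [UUUU] member summary is also easy to pull out with a small helper (the function name is made up):

```shell
# Print the member-status summary (e.g. "[4/3] [_UUU]") for each array;
# an underscore marks a missing or rebuilding slot.
mdstat_status() {
    grep -o '\[[0-9]*/[0-9]*\] \[[U_]*\]' "$1"
}

# Demo against a saved snippet; on a live box point it at /proc/mdstat itself
cat > /tmp/mdstat.sample <<'EOF'
md1 : active raid5 sdc1[1] sde1[3] sdd1[2]
      1172126208 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
EOF
mdstat_status /tmp/mdstat.sample   # prints: [4/3] [_UUU]
```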

----------

## PietdeBoer

that did the trick

i re-added the faulty drive and it fully recovered (took one day)

i was able to mount it in read-only mode (writes could cause issues on a broken drive i guess) and back up all my data.. the array has since been replaced with new and bigger disks and the data has been restored to the new array
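for anyone in the same boat, the mount and copy went roughly like this (mount points are just examples):

```shell
# mount the degraded array read-only so the flaky drive is never written to
mount -o ro /dev/md0 /mnt/recovery

# copy everything off, preserving permissions and timestamps
rsync -a /mnt/recovery/ /mnt/backup/
```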

thx for your help!

----------

