# Faulty disk in raid, can badblocks make it usable?

## zotalore

I recently experienced my first software RAID (RAID5) drive failure. Is there any way to mark the bad blocks and re-add the drive to the array?

Here are some more details. When I boot the system, md reports the following:

```
md: Autodetecting RAID arrays.
md: Scanned 4 and added 4 devices.
md: autorun ...
md: considering hdd2 ...
md:  adding hdd2 ...
md:  adding hdc2 ...
md:  adding hdb2 ...
md:  adding hda2 ...
md: created md0
md: bind<hda2>
md: bind<hdb2>
md: bind<hdc2>
md: bind<hdd2>
md: running: <hdd2><hdc2><hdb2><hda2>
md: kicking non-fresh hdc2 from array!
md: unbind<hdc2>
md: export_rdev(hdc2)
raid5: device hdd2 operational as raid disk 3
raid5: device hdb2 operational as raid disk 2
raid5: device hda2 operational as raid disk 0
raid5: allocated 4274kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3
 disk 0, o:1, dev:hda2
 disk 2, o:1, dev:hdb2
 disk 3, o:1, dev:hdd2
md0: bitmap initialized from disk: read 11/11 pages, set 22589 bits
created bitmap (174 pages) for device md0
md: ... autorun DONE.
```

So it appears that hdc is faulty, even though mdstat does not label the drive with 'F'. Or do I have to mark it faulty myself before that flag appears?

```
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 hdd2[3] hdb2[2] hda2[0]
      2185980288 blocks level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
      bitmap: 164/174 pages [656KB], 2048KB chunk
unused devices: <none>
```
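For reference, the `[4/3]` field means four devices are configured but only three are running, and each character in `[U_UU]` is one RAID slot: `U` for up, `_` for missing. A small sketch that maps each `_` back to its 0-based slot number (using the status string from the output above):

```shell
# Each character of the status string from /proc/mdstat is one RAID slot:
# 'U' = up, '_' = missing. Print the 0-based slot number of each '_'.
status='[U_UU]'   # taken from the mdstat output above
echo "$status" | tr -d '[]' | awk '{
  for (i = 1; i <= length($0); i++)
    if (substr($0, i, 1) == "_") print "slot " (i - 1) " is missing"
}'
# prints: slot 1 is missing
```

Slot 1 here matches the `removed` entry (RaidDevice 1) in the `mdadm --detail` output below.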

mdadm reports my array as degraded with one drive removed:

```
# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Sat Feb 23 20:22:25 2008
     Raid Level : raid5
     Array Size : 2185980288 (2084.71 GiB 2238.44 GB)
  Used Dev Size : 728660096 (694.90 GiB 746.15 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Sun Mar 29 09:51:54 2009
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           UUID : bb305d69:b11c41b3:725dd80f:10527e77
         Events : 0.1340266

    Number   Major   Minor   RaidDevice State
       0       3        2        0      active sync   /dev/hda2
       1       0        0        1      removed
       2       3       66        2      active sync   /dev/hdb2
       3      22       66        3      active sync   /dev/hdd2
```

I ran badblocks (it had been running for 24 hours, but it looks like it will need 2-3 days to complete) and got the following messages in my kernel log:

```
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdc: task_in_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: task_in_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
```

The smartctl self-test detects no errors:

```
# smartctl -l selftest /dev/hdc
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7574         -
```

So is there any hope for this drive, or should I just replace it?

I also have an issue with my current kernel (I suspect it's the old IDE driver), since the drives appear as hd rather than sd. But I would like to get my full array back before I start building a new kernel.

----------

## zotalore

I forgot to include the `mdadm --examine` output for the supposedly faulty drive, which does not indicate any faults:

```
# mdadm --examine /dev/hdc2
/dev/hdc2:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : bb305d69:b11c41b3:725dd80f:10527e77
  Creation Time : Sat Feb 23 20:22:25 2008
     Raid Level : raid5
  Used Dev Size : 728660096 (694.90 GiB 746.15 GB)
     Array Size : 2185980288 (2084.71 GiB 2238.44 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Update Time : Thu Jan  1 03:24:13 2009
          State : clean
Internal Bitmap : present
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 54b45b6b - correct
         Events : 14
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1      22        2        1      active sync   /dev/hdc2

   0     0       3        2        0      active sync   /dev/hda2
   1     1      22        2        1      active sync   /dev/hdc2
   2     2       3       66        2      active sync   /dev/hdb2
   3     3      22       66        3      active sync   /dev/hdd2
```

----------

## Dairinin

Did you try re-adding the drive to the array?

```
mdadm --manage /dev/md0 --add /dev/hdc2
```

BTW, CRC errors in dmesg are generally caused by a faulty cable, or by a 40-wire IDE cable being used for transfer modes faster than UDMA2 (which need an 80-wire cable).

Bad sectors on modern drives are remapped by the drive itself. If you can see sectors that you cannot read or write, the drive is almost dead: it has already used up all the spare sectors in its reallocation area.
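To check whether a drive has started eating into its spare-sector pool, look at the raw values of the reallocation-related SMART attributes. A sketch using a hypothetical two-line sample of `smartctl -A` output (on the real system, pipe `smartctl -A /dev/hdc` into the awk filter instead):

```shell
# Hypothetical sample of `smartctl -A /dev/hdc` output. Non-zero raw values
# for these attributes mean the drive is already consuming spare sectors.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0'
echo "$sample" | awk '$2 ~ /Reallocated_Sector_Ct|Current_Pending_Sector/ {
  print $2, "raw =", $NF
}'
# prints:
# Reallocated_Sector_Ct raw = 0
# Current_Pending_Sector raw = 0
```

Zero raw values, as in this sample, would support the cable theory rather than a dying platter.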

----------

## John R. Graham

I concur with Dairinin.  The other symptom you reported is seek failures, which indicates either that the servo platter is damaged or that some sort of electromechanical failure is looming.  Leaving that drive in the array is like playing with fire:  there's a good chance you'll get burned.

- John

----------

## zotalore

I've replaced the drive today. It looks like the reconstruction will take several days to complete. I'll inspect the old drive on a different machine using the vendor's diagnostic tools.
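For anyone following along: the rebuild shows up as a `recovery` line in `/proc/mdstat`. A sketch that pulls the percentage and ETA out of such a line (the sample line here is made up; on the live system you would use `grep recovery /proc/mdstat`):

```shell
# Hypothetical recovery line as it appears in /proc/mdstat during a rebuild.
line='      [=>...................]  recovery =  7.4% (54332416/728660096) finish=2882.9min speed=3898K/sec'
# Extract the percent done and the estimated time remaining.
echo "$line" | sed -n 's/.*recovery = *\([0-9.]*%\).*finish=\([0-9.]*min\).*/\1 done, \2 remaining/p'
# prints: 7.4% done, 2882.9min remaining
```

`watch cat /proc/mdstat` or `mdadm --detail /dev/md0` will show the same numbers updating live.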

----------

