# RAID5 stuck in recovery

## robak

Hi all,

I'm quite lost at the moment and I don't even know what to google.

This is my situation: I have a RAID5 running with three hard drives. One drive had a write error, so the array was set to degraded and the failed drive was automatically removed. I checked the drive with a long self-test using smartmontools and ran a badblocks check. The drive is fine, no errors were reported. So I added it back into the RAID array, but now the array is stuck in the recovery state.
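For reference, the checks I ran were roughly these (from memory; the device name is just an example):

```shell
smartctl -t long /dev/sdd      # start the long SMART self-test
smartctl -l selftest /dev/sdd  # read the result once it has finished
badblocks -sv /dev/sdd         # read-only surface scan with progress output
```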

cat /proc/mdstat output:

```
md0: active raid5 dm-2[4](S) sdd1[1] sdc1[0] sdb1[3]
    3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
    [>........................] recovery = 0.0% (43136/1953382400) finish=234631.1min speed=138k/sec
    bitmap: 15/15 pages [60KB], 65536KB chunk
unused devices: <none>
```

cat /sys/block/md0/md/array_state:

```
write-pending
```

And I can't write to /sys/block/md0/md/sync_action; it says the resource is busy (which makes sense to me, since the recovery is still running, but it is stuck).

Can someone please point me in the right direction? Any help is highly appreciated.

greetings

robak

----------

## eccerr0r

Well, according to the mdstat output, the recovery is running at a whole 138 KB/sec. It seems that drive is still having issues writing what md asks of it.
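A quick sanity check with the numbers from the mdstat output (rough arithmetic of mine, not exact since the speed shown is an average):

```shell
# (total - done) KiB remaining at 138 KiB/s, converted to minutes
mins=$(( (1953382400 - 43136) / 138 / 60 ))
echo "$mins min remaining"   # same ballpark as the reported finish=234631.1min
```

That's about 160 days, i.e. the estimate in mdstat is consistent with the drive crawling along at floppy-disk speed.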

----------

## robak

Sorry, I forgot to mention that the write rate decreases towards 0, and the amount of recovered data until it gets stuck is random after every reboot.

----------

## NeddySeagoon

robak,

As the drive is fine, it must be the interface.

Either the data cable is faulty, the SATA port is faulty, or the data cable connection is faulty.

dmesg will show lots of errors and interface resets.

The output of 

```
smartctl -a /dev...
```

would be good.

----------

## robak

dmesg is also clear. No errors whatsoever.

----------

## robak

I don't know if this is related, but with all three RAID drives plugged in, LVM stops responding when I run

```
vgdisplay
```

for example.

----------

## NeddySeagoon

robak,

As dmesg is clear, it has to be interface errors.

Please post

```
smartctl -a /dev...
```

for the drive.

vgdisplay isn't getting any IO time.

----------

## robak

What does

> vgdisplay isn't getting any IO time.

mean?

Here's the smartctl output. I checked the drive after the two logged errors appeared.

```

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.10-200.fc31.x86_64] (local build)

Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Purple

Device Model:     WDC WD20PURX-64P6ZY0

Serial Number:    WD-WCC4M2YC2XH3

LU WWN Device Id: 5 0014ee 26070c82a

Firmware Version: 80.00A80

User Capacity:    2.000.398.934.016 bytes [2,00 TB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Rotation Rate:    5400 rpm

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ACS-2 (minor revision not indicated)

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:    Mon Jun  8 22:15:22 2020 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                (27840) seconds.

Offline data collection

capabilities:                    (0x7b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (   2) minutes.

Extended self-test routine

recommended polling time:        ( 281) minutes.

Conveyance self-test routine

recommended polling time:        (   5) minutes.

SCT capabilities:              (0x703d) SCT Status supported.

                                        SCT Error Recovery Control supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0027   176   169   021    Pre-fail  Always       -       4158

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       124

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   042   041   000    Old_age   Always       -       42889

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       123

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       99

193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       66

194 Temperature_Celsius     0x0022   112   096   000    Old_age   Always       -       35

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1

ATA Error Count: 2

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 42285 hours (1761 days + 21 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 40 00 38 0a e0  Error: UNC 64 sectors at LBA = 0x000a3800 = 669696

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 40 00 38 0a e0 08   2d+05:25:42.597  READ DMA

  ca 00 d0 00 30 0a e0 08   2d+05:25:42.595  WRITE DMA

Error 1 occurred at disk power-on lifetime: 42285 hours (1761 days + 21 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 38 58 e4 06 e0  Error: UNC 56 sectors at LBA = 0x0006e458 = 451672

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 38 58 e4 06 e0 08   2d+05:25:21.801  READ DMA

  ca 00 40 00 e8 06 e0 08   2d+05:25:21.727  WRITE DMA

  ca 00 40 00 dc 06 e0 08   2d+05:25:21.676  WRITE DMA

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%     42841         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

----------

## NeddySeagoon

robak,

That smartctl output is contradictory.

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  9 Power_On_Hours          0x0032   042   041   000    Old_age   Always       -       42889

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
```

That says that there are no errors, nor are there any errors on the interface; that's the UDMA_CRC_Error_Count.

The long test completed with no errors too.

```
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%     42841 
```

The detailed log shows two errors,

```
  40 51 40 00 38 0a e0  Error: UNC 64 sectors at LBA = 0x000a3800 = 669696

  40 51 38 58 e4 06 e0  Error: UNC 56 sectors at LBA = 0x0006e458 = 451672 
```

UNC means UNCorrectable.

When an uncorrectable read error happens, Current_Pending_Sector is incremented. That appears not to have happened.

Either the drive was able to read those sectors within the retry count limit, so they appeared to be good, or the drive is not remapping failed sectors as it should.

That says that the drive is good ... but it's not behaving that way.

----------

## robak

Could it be a software problem?

----------

## NeddySeagoon

robak,

Anything is possible, but a software problem would not be unique to you, so it's unlikely.

Can you swap the drive to a different SATA port and/or try a replacement SATA data cable?

Only change one thing at a time or you won't know what the problem was.

----------

## robak

I did. I changed all the cables (one by one) and finally replaced the drive that had the error. Without success.

With the new drive, I added it to the array with

```
mdadm /dev/md0 --add /dev/sdd1
```

The command didn't return (I waited a couple of minutes), so I rebooted. After the reboot the RAID started the recovery but got stuck after some MB, the same behaviour as before.

----------

## NeddySeagoon

robak,

I hate to suggest this, but maybe it's a problem with one of the remaining drives in the array?

What does their 

```
smartctl -a
```

say?

Before you can write the redundancy data, you have to read the remaining drives ...

----------

## robak

/dev/sdb:

```

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.10-200.fc31.x86_64] (local build)

Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Purple

Device Model:     WDC WD20PURX-64P6ZY0

Serial Number:    WD-WCC4M2YC2H5H

LU WWN Device Id: 5 0014ee 20b1b9a9a

Firmware Version: 80.00A80

User Capacity:    2.000.398.934.016 bytes [2,00 TB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Rotation Rate:    5400 rpm

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ACS-2 (minor revision not indicated)

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:    Tue Jun  9 00:19:05 2020 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                (26580) seconds.

Offline data collection

capabilities:                    (0x7b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (   2) minutes.

Extended self-test routine

recommended polling time:        ( 268) minutes.

Conveyance self-test routine

recommended polling time:        (   5) minutes.

SCT capabilities:              (0x703d) SCT Status supported.

                                        SCT Error Recovery Control supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0027   177   172   021    Pre-fail  Always       -       4116

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       57

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   040   040   000    Old_age   Always       -       44088

 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       56

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31

193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       67

194 Temperature_Celsius     0x0022   106   095   000    Old_age   Always       -       41

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

/dev/sdc:

```

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.5.10-200.fc31.x86_64] (local build)

Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Caviar Green (AF)

Device Model:     WDC WD20EARS-00MVWB1

Serial Number:    WD-WCAZA9229234

LU WWN Device Id: 5 0014ee 2b0d45d5b

Firmware Version: 51.0AB51

User Capacity:    2.000.398.934.016 bytes [2,00 TB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS (minor revision not indicated)

SATA Version is:  SATA 2.6, 3.0 Gb/s

Local Time is:    Tue Jun  9 00:20:49 2020 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

                                        was completed without error.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                (38580) seconds.

Offline data collection

capabilities:                    (0x7b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (   2) minutes.

Extended self-test routine

recommended polling time:        ( 372) minutes.

Conveyance self-test routine

recommended polling time:        (   5) minutes.

SCT capabilities:              (0x3035) SCT Status supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       162

  3 Spin_Up_Time            0x0027   173   168   021    Pre-fail  Always       -       6333

  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1645

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   025   025   000    Old_age   Always       -       55084

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       99

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       62

193 Load_Cycle_Count        0x0032   019   019   000    Old_age   Always       -       543958

194 Temperature_Celsius     0x0022   113   097   000    Old_age   Always       -       37

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       79

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%      8543         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

Is the second one broken? I just noticed the values for Multi_Zone_Error_Rate, Current_Pending_Sector and Raw_Read_Error_Rate.

----------

## NeddySeagoon

robak,

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1 
```

That's a bad sign. The drive knows about one sector that it would relocate, if only it could read it.

There may be more. Any unreadable sectors that the drive has not yet tried to read are not counted.

If you don't have a backup of the raid set, your next step is to use ddrescue to make an image of /dev/sdc.

ddrescue is dd with error handling. It will try very hard to coax one last read from the problem sector(s).

We don't know what is in that unreadable sector. If it's a block of a file, the file cannot be read.

If it's a block of a directory, that directory cannot be read and any child directories cannot be accessed. If it's a metadata block ... it gets worse, maybe. There are backups of some metadata.

When you use ddrescue, you must create the log file. That's required so that you can rerun ddrescue several times with different options.

It looks at the log to know what data has already been recovered.

If ddrescue reads that faulty block, the drive may look good again, as the block could be relocated, but would you trust a drive that cannot read its own writing?

Once you have an image, use the image in place of the current /dev/sdc and go back to trying to rebuild the raid set. 

Put the original /dev/sdc away and keep it safe.

Unfortunately, I've been through this with a five drive raid5 set. Two drives got errors only 15 minutes apart and were kicked out of the raid set.

-- edit --

Stop the raid set before you run ddrescue.
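Roughly, the whole procedure looks like this (a sketch only: /dev/sdX stands for the destination disk for the image, double-check source and destination before running anything):

```shell
# stop the array first so nothing else touches the failing drive
mdadm --stop /dev/md0

# first pass: copy everything that reads cleanly, recording progress in the mapfile
ddrescue -f /dev/sdc /dev/sdX sdc.map

# further passes: retry only the bad areas recorded in the mapfile
ddrescue -f -r3 /dev/sdc /dev/sdX sdc.map
```

The mapfile (sdc.map) is what lets the second invocation resume instead of re-reading the good areas.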

----------

## robak

Thank you!

Is it also possible to replace the drive and let the raid5 recover the data?

--edit--

The raid is fully encrypted.

----------

## NeddySeagoon

robak,

After the new sdc is in place, you should be able to go back to where you left off with rebuilding the raid set.

You will probably need to coax ddrescue quite a bit.

Once you have run ddrescue with the default settings, post the log, so we can see what remains to be recovered.

I do not expect it to get everything at the first attempt.

This step will give you sdb unchanged, a new sdc with the same data as the old sdc (if we get it all), and sda? that needs to be rebuilt.

-- edit --

ddrescue is a low level block copy. It neither knows nor cares what is in the blocks it copies.

----------

## eccerr0r

My currently healthy 0.9 RAID5 looks like this in /proc/mdstat:

```
md1 : active raid5 sdc2[0] sdb2[2] sda2[1]
      3901440000 blocks level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 3/15 pages [12KB], 65536KB chunk
```

Yours looks like:

```
md0: active raid5 dm-2[4](S) sdd1[1] sdc1[0] sdb1[3]
    3906764800 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
    [>........................] recovery = 0.0% (43136/1953382400) finish=234631.1min speed=138k/sec
    bitmap: 15/15 pages [60KB], 65536KB chunk
```

How did you get dm-2 as a (S)pare member 4 of the RAID?

Probably not a big issue, but I'm curious what happened here...

----------

## robak

NeddySeagoon,

ddrescue could read all the data, no bad blocks were found. After a reboot I now have an active, degraded raid running (that's what mdadm -D /dev/md0 says). What's the next step?

eccerr0r,

True, I can't tell what happened here. Maybe it's a weird Fedora thing.

----------

## NeddySeagoon

robak,

Put the copy in place of the failing sdc and attempt to rebuild the raid. Put the old sdc away safely.

That dm-2 in the raid as a spare is scary. You mentioned that it's a crypto raid set, so device-mapper will be used for LUKS.

You should remove it from the raid set before you do much more.

----------

## robak

I can't add or remove any drives from the raid. mdadm is not returning. Same as vgdisplay.

Is there something wrong with my lvm setup?

--edit--

I can't even stop the raid to reassemble it.

```
[root@localhost robak]# mdadm --stop /dev/md0
mdadm: Cannot get exclusive access to /dev/md0:Perhaps a running process, mounted filesystem or active volume group?
[root@localhost robak]# lsof | grep md0
md0_raid5  739                    root  cwd       DIR              253,0       224        128 /
md0_raid5  739                    root  rtd       DIR              253,0       224        128 /
md0_raid5  739                    root  txt   unknown                                         /proc/739/exe
[root@localhost robak]# ps aux | grep 739
root         739  0.0  0.0      0     0 ?        D    12:23   0:00 [md0_raid5]
```

----------

## NeddySeagoon

robak,

If the volumes in the raid set are in use, you need to stop the entire stack in the reverse order to building it.

Unmount any filesystems,

vgchange the volume group to inactive,

then you can manipulate the raid set.

That's not required for all raid set operations, but it's safe.

If you have root on that raid set, you need to boot some other way as root is always in use.

----------

## robak

Sorry, but I don't understand why the raid is in use. Nothing is mounted from the drives. Even if I boot via a live-cd I get the same messages.

Besides, if I boot a live-cd I get an LVM timeout error.

----------

## eccerr0r

IIRC with spares you still have to mdadm --fail them first before removing, but I'm not sure if this is the issue at hand.

I don't currently see how it could affect md, because if it's not the right size, md shouldn't actually use the disk despite it being part of the array...

I just hope there's no funny race condition being lost each time when you have a möbius loop here...
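For the record, the usual removal sequence is something like this (a sketch only; the dm-2 member path is as shown in the mdstat output earlier in the thread):

```shell
# mark the member as failed, then remove it from the array
mdadm /dev/md0 --fail /dev/dm-2
mdadm /dev/md0 --remove /dev/dm-2
```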

----------

## robak

I think I understand now.

```
root@solus /home/live # vgdisplay -v
  --- Volume group ---
  VG Name               vg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <3.64 TiB
  PE Size               4.00 MiB
  Total PE              953799
  Alloc PE / Size       953799 / <3.64 TiB
  Free  PE / Size       0 / 0
  VG UUID               pMxYHd-JvDx-w1gB-y5XS-1QHd-092s-6nm3Bl

  --- Logical volume ---
  LV Path                /dev/vg/multimedia
  LV Name                multimedia
  VG Name                vg
  LV UUID                Xc67Fi-OB4n-Fber-LTxG-v9a4-Ygtl-kHif2U
  LV Write Access        read/write
  LV Creation host, time localhost.localdomain, 2015-03-08 22:28:21 +0000
  LV Status              available
  # open                 2
  LV Size                <3.64 TiB
  Current LE             953799
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Physical volumes ---
  PV Name               /dev/md127
  PV UUID               ymcgRq-bv2O-Pkbz-jYKM-kwAX-Mwb2-bU4ez8
  PV Status             allocatable
  Total PE / Free PE    953799 / 0
```

The raid (md127 in my live-cd environment) is the physical volume of a volume group, which in turn contains a logical volume. Therefore I can't change anything.

NeddySeagoon: can you please guide me step by step through changing the volume group?

----------

## NeddySeagoon

robak,

Your 'stack' starts off with the three partitions donated to /dev/md127 to make the raid set.

/dev/md127 is then donated to LVM as a physical volume.

That physical volume is donated to the Volume Group called vg

Inside that volume group is the logical volume  /dev/vg/multimedia 

You may not stop the raid until everything that uses it is no longer active.

You also have to stop the layers in the right order.

First unmount  /dev/vg/multimedia.

Then make the Volume Group vg inactive.

Now you can stop the raid.
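In commands, that order is roughly (a sketch; the names are taken from the vgdisplay output earlier in the thread):

```shell
umount /dev/vg/multimedia   # 1. unmount the filesystem on the logical volume
vgchange -an vg             # 2. deactivate the volume group
mdadm --stop /dev/md127     # 3. now the raid set can be stopped
```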

----------

## robak

So, I have my raid running again. I had to reinstall my system, I can't tell why. For some reason the password of my LUKS drive inside the raid isn't working anymore.

TL;DR

For some reason, LVM was in a deadlock, even when I booted a live-cd. I couldn't make any changes to the volume group or the logical volume, and therefore I couldn't make any changes to the raid either. After the reinstall of Fedora two things worked:

the suspicious spare dm-2 was gone. 

```
mdadm /dev/md0 --add
```

worked immediately and the raid started the recovery, which succeeded after about 8 h. One last thing I needed to do was update the PV header; when I tried to add a new volume group for my raid I got the message

```
WARNING: PV /dev/md0 in VG vg is using an old PV header, modify the VG to update
```

 which I fixed by running 

```
vgck --updatemetadata vg
```

To all people reading this with the same problem I had: sorry I couldn't find a better solution. I think I ran into this by upgrading fedora from version to version over the last few years.

Thank you for all the help NeddySeagoon and eccerr0r !

----------

## NeddySeagoon

robak,

There is one more thing you can do for us here.

Post the smartctl -a output for the apparently failed drive that ddrescue recovered all the data from.

----------

## robak

That is this one:

 *robak wrote:*   

> 
> 
> /dev/sdc:
> 
> ```
> ...

 

----------

## robak

One last thing:

The dm-2 spare returned after vgck --updatemetadata

So I think it's correct in some way.

----------

## NeddySeagoon

robak,

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 

  9 Power_On_Hours          0x0032   025   025   000    Old_age   Always       -       55084 

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
```

That looks unlikely.  A 7200 RPM drive will average about 80MB/sec over a full surface read; that's capacity independent, as it's the head/platter data rate limit.  Your 5400 RPM drive will be a little slower.

2TB at 80MB/sec is just under 7 hours. Both your posts for /dev/sdc have 55084 power on hours.

After the full surface read performed by ddrescue, the power on hours should have increased by about 7 hours.
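The back-of-envelope arithmetic checks out:

```
# Full-surface read time: 2 TB at ~80 MB/s, converted to hours.
awk 'BEGIN { printf "%.1f hours\n", 2e12 / 80e6 / 3600 }'   # prints "6.9 hours"
```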

The interesting point is what the drive did when it encountered the Current_Pending_Sector, and maybe other sectors it can't read.

That ddrescue recovered all the data is good. You did look in the log?

The changes in the drives internal error state caused by ddrescue are of interest here.
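If the map file from the run is still around, ddrescuelog can summarise it ("rescue.map" here is a hypothetical name; use whatever map file was passed to ddrescue):

```
# Check the ddrescue map file (the "log" in older ddrescue terminology)
# for bad blocks. "rescue.map" is a placeholder name.
map=rescue.map
if command -v ddrescuelog >/dev/null && [ -f "$map" ]; then
    ddrescuelog -t "$map"    # byte counts: rescued, bad-sector, non-tried
    ddrescuelog -D "$map" && echo "rescue finished: no bad blocks in the map"
else
    echo "ddrescuelog or $map not available here"
fi
```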

----------

## robak

I reposted the old log of smartctl. I'll do a new read and post it.

The ddrescue log reported no bad blocks and no read errors. It ran over night so I can't tell if the drive did any sounds.

----------

## eccerr0r

Something is fishy. vgck should only be mucking with the volumes/volume groups, but md is one layer below it - and lvm should not be touching md...

Somehow you got the metadata block that md uses to identify the RAID to think that dm-2 is part of the raid when it's not.  What do you have on dm-2? Is it valuable information, and how big is it?
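One way to answer that question is to ask device-mapper directly what dm-2 is (this only does anything on the affected machine, where /dev/dm-2 exists):

```
# Inspect what dm-2 actually is before trusting the raid metadata.
dev=/dev/dm-2
if [ -e "$dev" ]; then
    dmsetup info "$dev"              # name, UUID, open count
    dmsetup table "$dev"             # target type and backing devices
    blockdev --getsize64 "$dev"      # size in bytes
else
    echo "no $dev on this machine"
fi
```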

----------

## robak

This is the current smart status of the drive:

```

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       167

  3 Spin_Up_Time            0x0027   171   168   021    Pre-fail  Always       -       6425

  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1648

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   025   025   000    Old_age   Always       -       55093

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       102

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       64

193 Load_Cycle_Count        0x0032   019   019   000    Old_age   Always       -       543973

194 Temperature_Celsius     0x0022   125   097   000    Old_age   Always       -       25

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       79

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%      8543         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

----------

## NeddySeagoon

robak,

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0 

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1 
```

It looks like ddrescue got the data but the drive did not reallocate the failing sector.

We don't know that it's the same sector being flagged before and after ddrescue.

The key takeaway is that the drive cannot read its own writing, so it is not fit for further use.

It's unlikely that, with 55093 running hours, the drive is still under warranty, but if it is, that smartctl output is grounds for a warranty replacement.

----------

## robak

Warranty is gone, the drive is simply too old.

So you think the luks key is a victim of the failing drive?

----------

## NeddySeagoon

robak,

In a raid5 set, you can get your data back if any one drive fails ... almost.

There are a few transient problems with software raid5, such as the write hole.

----------

## steve_v

 *NeddySeagoon wrote:*   

> It looks like ddrescue got the data but the drive did not reallocate the failing sector.

 

IME, often a drive won't reallocate a pending sector until an attempt is made to write to it.

Some will do it during an offline (smartctl -t offline /dev/sdx) self-test, others need more encouragement.
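That "encouragement" usually means writing to the failing LBA. A sketch of one way to do it, using hdparm (the LBA and /dev/sdX are placeholders; get the real LBA from the SMART self-test log or kernel messages, and note this DESTROYS the sector's contents, so only do it on a drive whose data is already rescued; DRY_RUN=1 prints instead of executing):

```
# Coax the drive into reallocating a pending sector by overwriting it.
# LBA and /dev/sdX are hypothetical placeholders.
DRY_RUN=1
LBA=123456789
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

run hdparm --read-sector "$LBA" /dev/sdX   # confirm it's unreadable first
run hdparm --yes-i-know-what-i-am-doing --write-sector "$LBA" /dev/sdX
run smartctl -A /dev/sdX                   # Current_Pending_Sector should drop
```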

 *NeddySeagoon wrote:*   

> The key take away is that the drive cannot read its own writing so is not fit for further use.

 

Indeed. It might be an isolated media defect, or it might be the first sign of a progressive failure. One could get anything from days to years out of it, but it's certainly not fit for storing anything important. 

That WD20EARS is not really an appropriate drive for a RAID anyway. "Green" drives are universally horrible.

I have some personal experience with that particular WD model, and while not quite as bad as Seagate's "Barracuda LP" series, none of it is remotely good. They're slow, they're unreliable, and their error reporting is substandard. The only redeeming feature is that they're cheap, but then you only get a 3 year warranty vs the 5 years on a RAID/NAS rated drive.

----------

## robak

steve_v

yeah I agree. The EARS drive was left here so I used it. I was hoping that at least smartmon would report early if the drive was dying so I could replace it. And here comes my fault: smartd was running and detected the drive failure, I just failed to configure it to send emails :/
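For anyone else who lands here: a minimal smartd.conf line for that looks roughly like the following (the address and self-test schedule are examples, and -m only works if local mail delivery is set up):

```
# /etc/smartd.conf -- monitor all drives, run a short self-test Mondays
# at 02:00, and mail root on failures or new bad sectors. "-M test"
# sends one test mail at startup so you know delivery actually works.
DEVICESCAN -a -o on -S on -s (S/../../1/02) -m root -M test
```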

Anyway, I use WD Purple drives now.

Besides: I know this is off-topic, but do I NEED LVM for a raid? I mean, I don't need to shrink or extend the partition in any way, so I don't see any advantages in using LVM.

----------

## NeddySeagoon

robak,

mdadm can do raid on its own.

LVM can do raid on its own.

I'm a great believer in do one thing and do it well, so I use LVM (without raid) on top of mdadm raid5 because I like growing 'partitions'. 

Once upon a time, it was not possible to partition a raid set.

So you made partitions and several raid sets.

Partitioning /dev/md* has been supported for a few years now. 

That's answered the question the way you asked it.

You mentioned LUKS further up the thread. 

mdadm will not do encryption, though; LUKS (cryptsetup) is its own layer and can sit directly on /dev/md0, with or without LVM in between.

----------

## robak

Perfect answer. Thank you!

----------

## C5ace

robak:

My choice for hard drives is Hitachi/HGST. Have 4 x 500GB running as RAID 5 since 2007 in my desktop. 

No LVM. Just boot, root, home and VirtualBox partitions, plus a WD Black 2TB for backups and videos.

https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-sata-series/data-sheet-ultrastar-7k4000.pdf

Use the best quality SATA cables you can get.

----------

