# [SOLVED] SATA hard drive dying?

## walkingcorpse

Here is some terminal output:

```
[ 316.274677] ata1: link is slow to respond, please be patient (ready=0)
[ 320.957670] ata1: COMRESET failed (errno=-16)
[ 323.974683] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 329.335678] ata1.00: qc timeout (cmd 0xef)
[ 329.335686] ata1.00: failed to enable AA (error_mask=0x4)
[ 329.335688] ata1.00: revalidation failed (errno=-5)
[ 334.694674] ata1: link is slow to respond, please be patient (ready=0)
[ 339.374662] ata1: COMRESET failed (errno=-16)
[ 344.733674] ata1: link is slow to respond, please be patient (ready=0)
[ 345.201675] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 360.568421] ata1.00: qc timeout (cmd 0xef)
[ 360.568428] ata1.00: failed to set xfermode (err_mask=0x4)
[ 360.568431] ata1: limiting SATA link speed to 1.5 Gbps
[ 360.568432] ata1.00: limiting speed to UDMA/133:PIO3
[ 365.882365] ata1: link is slow to respond, please be patient (ready=0)
[ 366.922357] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 366.923516] ata1.00: configured for UDMA/133
```

S.M.A.R.T. data (`smartctl --all`):

```
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-gentoo] (local build)

Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family: Seagate Barracuda 3.5

Device Model: ST1000DM010-2EP102

Serial Number: Z9A9RCFT

LU WWN Device Id: 5 000c50 0a1cd0b66

Firmware Version: CC43

User Capacity: 1,000,204,886,016 bytes [1.00 TB]

Sector Sizes: 512 bytes logical, 4096 bytes physical

Rotation Rate: 7200 rpm

Form Factor: 3.5 inches

Device is: In smartctl database [for details use: -P show]

ATA Version is: ATA8-ACS T13/1699-D revision 4

SATA Version is: SATA 3.0, 6.0 Gb/s (current: 1.5 Gb/s)

Local Time is: Thu Nov 8 19:38:41 2018 CET

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00)   Offline data collection activity

               was never started.

               Auto Offline Data Collection: Disabled.

Self-test execution status:      (  33)   The self-test routine was interrupted

               by the host with a hard or soft reset.

Total time to complete Offline

data collection:       (    0) seconds.

Offline data collection

capabilities:           (0x73) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               No Offline surface scan supported.

               Self-test supported.

               Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine

recommended polling time:     (   1) minutes.

Extended self-test routine

recommended polling time:     ( 109) minutes.

Conveyance self-test routine

recommended polling time:     (   2) minutes.

SCT capabilities:           (0x1085)   SCT Status supported.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   081   039   006    Pre-fail  Always       -       166371877

  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0

  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2435

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   079   060   045    Pre-fail  Always       -       87721798

  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7172

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       757

183 Runtime_Bad_Block       0x0032   097   097   000    Old_age   Always       -       3

184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0

187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       8676

188 Command_Timeout         0x0032   099   098   000    Old_age   Always       -       16 22 29

189 High_Fly_Writes         0x003a   015   015   000    Old_age   Always       -       85

190 Airflow_Temperature_Cel 0x0022   060   053   040    Old_age   Always       -       40 (Min/Max 30/40)

193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2443

194 Temperature_Celsius     0x0022   040   012   000    Old_age   Always       -       40 (0 12 0 0 0)

195 Hardware_ECC_Recovered  0x001a   003   001   000    Old_age   Always       -       166371877

197 Current_Pending_Sector  0x0012   100   097   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   097   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   178   000    Old_age   Always       -       105

240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       6042h+04m+55.947s

241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       71943152182

242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       85390698458

SMART Error Log Version: 1

ATA Error Count: 8676 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 8676 occurred at disk power-on lifetime: 6869 hours (286 days + 5 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

60 00 40 ff ff ff 4f 00 05:38:28.156 READ FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:27.216 SET FEATURES [Enable write cache]

00 00 00 00 00 00 00 ff 05:38:27.156 NOP [Abort queued commands]

60 00 40 ff ff ff 4f 00 05:38:24.463 READ FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:23.524 SET FEATURES [Enable write cache]

Error 8675 occurred at disk power-on lifetime: 6869 hours (286 days + 5 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

60 00 40 ff ff ff 4f 00 05:38:24.463 READ FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:23.524 SET FEATURES [Enable write cache]

00 00 00 00 00 00 00 ff 05:38:23.463 NOP [Abort queued commands]

60 00 40 ff ff ff 4f 00 05:38:20.812 READ FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:19.873 SET FEATURES [Enable write cache]

Error 8674 occurred at disk power-on lifetime: 6869 hours (286 days + 5 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

60 00 40 ff ff ff 4f 00 05:38:20.812 READ FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:19.873 SET FEATURES [Enable write cache]

00 00 00 00 00 00 00 ff 05:38:19.812 NOP [Abort queued commands]

60 00 40 ff ff ff 4f 00 05:38:17.163 READ FPDMA QUEUED

61 00 08 ff ff ff 4f 00 05:38:16.217 WRITE FPDMA QUEUED

Error 8673 occurred at disk power-on lifetime: 6869 hours (286 days + 5 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

60 00 40 ff ff ff 4f 00 05:38:17.163 READ FPDMA QUEUED

61 00 08 ff ff ff 4f 00 05:38:16.217 WRITE FPDMA QUEUED

ea 00 00 00 00 00 a0 00 05:38:16.204 FLUSH CACHE EXT

61 00 08 ff ff ff 4f 00 05:38:16.203 WRITE FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:16.203 SET FEATURES [Enable write cache]

Error 8672 occurred at disk power-on lifetime: 6869 hours (286 days + 5 hours)

When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

ER ST SC SN CL CH DH

-- -- -- -- -- -- --

40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:

CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name

-- -- -- -- -- -- -- -- ---------------- --------------------

60 00 40 ff ff ff 4f 00 05:38:13.594 READ FPDMA QUEUED

ef 02 00 00 00 00 a0 00 05:38:13.570 SET FEATURES [Enable write cache]

00 00 00 00 00 00 00 ff 05:38:13.509 NOP [Abort queued commands]

60 00 58 ff ff ff 4f 00 05:38:10.827 READ FPDMA QUEUED

61 00 08 ff ff ff 4f 00 05:38:10.329 WRITE FPDMA QUEUED

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Interrupted (host reset)      00%      7171         -

# 2  Short offline       Completed: read failure       90%      4554         1405442960

# 3  Short offline       Completed: read failure       90%      4554         1405442960

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
```

----------

## NeddySeagoon

walkingcorpse,

Filling in the Seagate warranty page, using the UK (my location), it tells me:

> In Warranty  
> 
> Expiration 25-Mar-2019  

 

Fill in that page with your location (I have no idea where you are). If you get the same response as me, return the drive for a warranty replacement.

```
Error 8676 occurred at disk power-on lifetime: 6869 hours (286 days + 5 hours)
# 2 Short offline Completed: read failure 90% 4554 1405442960
# 3 Short offline Completed: read failure 90% 4554 1405442960
```

is fairly damning.
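As a reading aid for those self-test lines: in smartctl's self-test log, "Completed: read failure" with 90% remaining suggests the short test hit an unreadable sector roughly 10% of the way in, and the last column is the first LBA it could not read. The hypothetical helper below (`failed_selftests` is an illustrative name, not part of smartmontools) is a minimal sketch that pulls those rows out of pasted `smartctl -a` text, assuming the row layout shown in this thread.

```python
from typing import List, Tuple


def failed_selftests(log_text: str) -> List[Tuple[int, int, int]]:
    """Return (test number, lifetime hours, LBA_of_first_error) for every
    self-test log row whose status is a read failure."""
    failures = []
    for line in log_text.splitlines():
        fields = line.split()
        # Rows look like: "# 2 Short offline Completed: read failure 90% 4554 1405442960"
        # The last three columns are Remaining (with %), LifeTime(hours) and the LBA.
        if len(fields) < 6 or fields[0] != "#" or not fields[-3].endswith("%"):
            continue
        if "read failure" in line and fields[-1].isdigit():
            failures.append((int(fields[1]), int(fields[-2]), int(fields[-1])))
    return failures


if __name__ == "__main__":
    # Self-test rows from the Seagate log earlier in this thread.
    log = """
# 1 Short offline Interrupted (host reset) 00% 7171 -
# 2 Short offline Completed: read failure 90% 4554 1405442960
# 3 Short offline Completed: read failure 90% 4554 1405442960
"""
    for num, hours, lba in failed_selftests(log):
        print(f"test #{num}: read failure at LBA {lba} after {hours} power-on hours")
```

Both failing tests report the same LBA, which is why the result looks like a persistent defect rather than a one-off glitch.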

Some HDD vendors will ship your warranty replacement before they receive your faulty unit, so you get some time with both drives, which gives you a chance to save your data.

----------

## walkingcorpse

NeddySeagoon,

Thanks for responding; it is an honor to have you in my thread. Your reputation precedes you.

Back to the problem: I re-plugged the SATA and power cables on the drive, but it behaved the same way, so I tried the drive in another machine, where it worked perfectly. Then I tried it again in my main machine, and this time it works and passes tests:

```
# 1  Short offline       Completed without error       00%      7176         -
```

It's weird. I have had similar problems in the past and always fixed them by re-plugging the cables, but this time the problem didn't go away after the first replug.

Do you think this issue could have other consequences in the future? I don't know if I can return the drive because I didn't buy it.

----------

## walkingcorpse

Interestingly, I have found some weird things in the S.M.A.R.T. data of my second drive (the one that is working):

```
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-gentoo] (local build)

Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Blue

Device Model:     WDC WD5000AAKX-001CA0

Serial Number:    WD-WCAYUJ751015

LU WWN Device Id: 5 0014ee 103cf8319

Firmware Version: 15.01H15

User Capacity:    500,107,862,016 bytes [500 GB]

Sector Size:      512 bytes logical/physical

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS (minor revision not indicated)

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:    Fri Nov  9 01:06:51 2018 CET

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x82)   Offline data collection activity

               was completed without error.

               Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 121)   The previous self-test completed having

               the read element of the test failed.

Total time to complete Offline 

data collection:       ( 8160) seconds.

Offline data collection

capabilities:           (0x7b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine 

recommended polling time:     (   2) minutes.

Extended self-test routine

recommended polling time:     (  83) minutes.

Conveyance self-test routine

recommended polling time:     (   5) minutes.

SCT capabilities:           (0x3037)   SCT Status supported.

               SCT Feature Control supported.

               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       4937

  3 Spin_Up_Time            0x0027   142   139   021    Pre-fail  Always       -       3900

  4 Start_Stop_Count        0x0032   095   095   000    Old_age   Always       -       5516

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6816

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       4760

192 Power-Off_Retract_Count 0x0032   197   197   000    Old_age   Always       -       2930

193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2585

194 Temperature_Celsius     0x0022   104   091   000    Old_age   Always       -       39

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       27

198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       27

199 UDMA_CRC_Error_Count    0x0032   200   196   000    Old_age   Always       -       9629

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       25

SMART Error Log Version: 1

ATA Error Count: 686 (device log contains only the most recent five errors)

   CR = Command Register [HEX]

   FR = Features Register [HEX]

   SC = Sector Count Register [HEX]

   SN = Sector Number Register [HEX]

   CL = Cylinder Low Register [HEX]

   CH = Cylinder High Register [HEX]

   DH = Device/Head Register [HEX]

   DC = Device Command Register [HEX]

   ER = Error register [HEX]

   ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 686 occurred at disk power-on lifetime: 1473 hours (61 days + 9 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 c4 04 00 e0  Error: UNC at LBA = 0x000004c4 = 1220

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 00 e0 03 00 e0 08      00:19:01.065  READ DMA

  c8 00 00 e0 02 00 e0 08      00:19:01.065  READ DMA

  c8 00 00 e0 01 00 e0 08      00:19:01.064  READ DMA

  c8 00 00 e0 00 00 e0 08      00:19:01.063  READ DMA

  c8 00 58 88 00 00 e0 08      00:19:01.063  READ DMA

Error 685 occurred at disk power-on lifetime: 1473 hours (61 days + 9 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 08 61 01 00 e0  Error: UNC 8 sectors at LBA = 0x00000161 = 353

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 60 01 00 e0 08      00:15:50.833  READ DMA

  ec 00 00 00 00 00 a0 08      00:15:50.831  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08      00:15:50.828  SET FEATURES [Set transfer mode]

Error 684 occurred at disk power-on lifetime: 1473 hours (61 days + 9 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 08 61 01 00 e0  Error: UNC 8 sectors at LBA = 0x00000161 = 353

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 60 01 00 e0 08      00:15:49.185  READ DMA

  c8 00 08 58 01 00 e0 08      00:15:49.185  READ DMA

  c8 00 08 50 01 00 e0 08      00:15:49.185  READ DMA

  c8 00 08 48 01 00 e0 08      00:15:49.185  READ DMA

  c8 00 08 40 01 00 e0 08      00:15:49.185  READ DMA

Error 683 occurred at disk power-on lifetime: 1473 hours (61 days + 9 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 f0 61 01 00 e0  Error: UNC 240 sectors at LBA = 0x00000161 = 353

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 f0 10 01 00 e0 08      00:15:47.536  READ DMA

  c8 00 90 80 00 00 e0 08      00:15:47.535  READ DMA

  c8 00 38 40 00 00 e0 08      00:15:47.535  READ DMA

  c8 00 08 10 00 00 e0 08      00:15:47.535  READ DMA

  c8 00 18 20 00 00 e0 08      00:15:47.535  READ DMA

Error 682 occurred at disk power-on lifetime: 1473 hours (61 days + 9 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 08 61 01 00 e0  Error: UNC 8 sectors at LBA = 0x00000161 = 353

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 08 60 01 00 e0 08      00:15:45.880  READ DMA

  ec 00 00 00 00 00 a0 08      00:15:45.878  IDENTIFY DEVICE

  ef 03 46 00 00 00 a0 08      00:15:45.875  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed: read failure       90%      6816         1220

# 2  Short offline       Completed: read failure       90%      6816         1220

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
```

Swapped quote tags for code tags to preserve the monospace layout -- NeddySeagoon

----------

## NeddySeagoon

walkingcorpse,

Those recorded errors are all internal to the drive. Seagate won't try to argue that the drive is not faulty with

```
Error 8676 occurred
```

in the log.

As plugging the data and power cables in appears to fix it, that suggests a faulty solder joint on the electronics board that the cables plug into.

Every time you disturb the faulty joint (by plugging in the data and power cables), it works for a while.

That would explain the zero pending sectors and reallocated sector count too.

It will get worse.

Follow the Seagate warranty process. If you want your replacement before you return your faulty drive, they will want a credit card number, just so that they can bill the card if the drive is not returned.

You will not get a new drive, nor will you get your drive back. You will get somebody else's reconditioned drive that was a previous warranty failure.

I have never dealt with Seagate, but it's typically a fully automated process.

You fill in the web form, get an RMA (Return Material Authorisation) and give a credit card number.

Your replacement is shipped to you.

You move your data off the failed drive and return the drive. You pay postage and insurance.

Your credit card is never charged.

-- edit --

After fixing your code tags.

The second drive is in a bad way.

```
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       27
```

That's 27 sectors it knows it can no longer read, and none of them have been relocated.
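To check the same thing on other drives, a minimal sketch follows, assuming the attribute-table layout smartctl prints in this thread. Conventionally, attribute 5 counts sectors already remapped, while 197 and 198 count sectors the drive could not read and has not remapped yet; the helper names `parse_attributes` and `pending_not_reallocated` are hypothetical, and the "pending but not reallocated" test is only a rough heuristic matching the reading above, not a vendor rule.

```python
from typing import Dict

# Attributes discussed in this thread (IDs and names as printed by smartctl).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_attributes(text: str) -> Dict[int, int]:
    """Map attribute ID -> integer RAW_VALUE for rows of a smartctl attribute table.

    Rows whose RAW_VALUE does not start with a plain integer
    (e.g. '6042h+04m+55.947s') are skipped.
    """
    raw: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: numeric ID, name, then a FLAG column such as 0x0032.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        try:
            raw[int(fields[0])] = int(fields[9])
        except ValueError:
            continue
    return raw


def pending_not_reallocated(raw: Dict[int, int]) -> bool:
    """True when the drive reports unreadable sectors but no remapped ones."""
    return raw.get(197, 0) > 0 and raw.get(5, 0) == 0


if __name__ == "__main__":
    # Rows copied from the WD5000AAKX output earlier in this thread.
    sample = """
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   198   000    Old_age   Always       -       27
198 Offline_Uncorrectable   0x0030   200   198   000    Old_age   Offline      -       27
"""
    raw = parse_attributes(sample)
    for attr_id, name in WATCHED.items():
        print(f"{attr_id:>3} {name:<23} {raw.get(attr_id)}")
    print("pending but not reallocated:", pending_not_reallocated(raw))
```

The FLAG-column check keeps error-log rows such as `40 51 08 61 01 00 e0  Error: UNC 8 sectors ...` from being mistaken for attributes when a whole `smartctl -a` dump is fed in.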

If it's still under warranty, get it replaced.

The error log suggests it has a run of bad sectors.
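The LBAs in that error log can be rebuilt from the register columns. In the summary error log smartctl prints here ("SMART Error Log Version: 1"), the address appears to be stored 28-bit style: bits 0-7 in SN, 8-15 in CL, 16-23 in CH and 24-27 in the low nibble of DH. The sketch below (function names are illustrative) decodes errors 683-685 to LBA 353 and error 686 to LBA 1220, the same LBA the short self-tests report, so the bad sectors sit near the start of the disk. The five Seagate entries all decode to 268435455 = 2^28 - 1, while its self-tests put the bad sector at 1405442960, which does not fit in 28 bits; presumably those entries only mean "beyond the 28-bit limit". `smartctl -l xerror` reads the extended error log with 48-bit addresses, where the drive supports it.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN/CL/CH/DH columns of a smartctl error-log entry."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_line(line: str) -> int:
    """Take an 'ER ST SC SN CL CH DH' row (hex bytes, extra text ignored) and return the LBA."""
    fields = line.split()
    _er, _st, _sc, sn, cl, ch, dh = (int(f, 16) for f in fields[:7])
    return lba28_from_registers(sn, cl, ch, dh)


if __name__ == "__main__":
    rows = [
        "40 51 00 c4 04 00 e0",  # WD error 686
        "40 51 08 61 01 00 e0",  # WD error 685
        "40 51 00 ff ff ff 0f",  # Seagate error 8676
    ]
    for row in rows:
        print(row, "->", parse_register_line(row))
```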

I have had WD drives replaced. The process is as I described above.

----------

## The Main Man

I have had similar problems on and off for over a year now, and I still don't know where the problem is.

I suspect the motherboard or the cables, though I don't think it's the cables. It could be the connectors on the drives, but I doubt it; both of them going bad at the same time is hard to believe.

For one drive I had these errors:

```
Good    C5 current-pending-sector           200   200         0 000000000000 |
Good    C6 offline-uncorrectable            200   200         0 000000000000 |
```

They are good now, but those were the ones that showed errors until I deleted a few files, at which point the errors went away.

The second drive still has one error, three weeks after I "solved" the problem by re-attaching the cables.

```
Caution 05 reallocated-sector-count         100   100        36 000000000038
```

By the way, I use:

```
sys-apps/crazydiskinfo
```
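crazydiskinfo appears to print attribute IDs and raw values in hexadecimal: C5 and C6 are 197 and 198, and 000000000038 would then be 0x38 = 56, which matches the Reallocated_Sector_Ct raw value (with threshold 36) that smartctl reports for the Seagate sda posted later in this thread. That hex reading is an assumption inferred from these matches, not something taken from crazydiskinfo's documentation; the hypothetical `decode_row` below is a minimal sketch of it.

```python
from typing import Tuple


def decode_row(line: str) -> Tuple[str, int, str, int]:
    """Return (status, attribute ID, name, raw value) for one crazydiskinfo row,
    assuming both the ID and the raw column are hexadecimal."""
    # Drop the trailing '|' border some rows carry, then split on whitespace.
    fields = line.replace("|", " ").split()
    status, attr_hex, name, raw_hex = fields[0], fields[1], fields[2], fields[6]
    return status, int(attr_hex, 16), name, int(raw_hex, 16)


if __name__ == "__main__":
    rows = [
        "Good    C5 current-pending-sector           200   200         0 000000000000 |",
        "Caution 05 reallocated-sector-count         100   100        36 000000000038",
    ]
    for row in rows:
        print(decode_row(row))
```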

----------

## NeddySeagoon

kajzer,

The errors went away because the drive is no longer trying to read those sectors.

Please post the entire SMART log. `smartctl -a` is good.
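If you want to capture that output from a script, a minimal sketch follows, assuming `smartctl` is on `PATH` and the script runs with enough privileges to open the device (usually root). smartctl's exit status is a bit mask whose bits can be set simply because the drive has logged errors, so the sketch returns the text regardless of a non-zero status. `smartctl_all` is a hypothetical helper name.

```python
import subprocess


def smartctl_all(device: str) -> str:
    """Run 'smartctl -a DEVICE' and return its text output.

    A non-zero exit status is not treated as fatal: smartctl sets status
    bits for conditions such as logged errors, and the report is still useful.
    """
    result = subprocess.run(
        ["smartctl", "-a", device],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `print(smartctl_all("/dev/sda"))` run as root, with the result pasted inside code tags so the columns stay aligned.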

I don't know crazydiskinfo.

----------

## The Main Man

NeddySeagoon,

sda

```
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.15.14-gentoo] (local build)

Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Seagate Barracuda 7200.14 (AF)

Device Model:     ST500DM002-1BD142

Serial Number:    Z2A9SWVN

LU WWN Device Id: 5 000c50 035ceb743

Firmware Version: KC43

User Capacity:    500,107,862,016 bytes [500 GB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Rotation Rate:    7200 rpm

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS T13/1699-D revision 4

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)

Local Time is:    Sat Nov 10 13:32:37 2018 CET

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x82)   Offline data collection activity

               was completed without error.

               Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0)   The previous self-test routine completed

               without error or no self-test has ever 

               been run.

Total time to complete Offline 

data collection:       (  592) seconds.

Offline data collection

capabilities:           (0x7b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine 

recommended polling time:     (   1) minutes.

Extended self-test routine

recommended polling time:     (  75) minutes.

Conveyance self-test routine

recommended polling time:     (   2) minutes.

SCT capabilities:           (0x303f)   SCT Status supported.

               SCT Error Recovery Control supported.

               SCT Feature Control supported.

               SCT Data Table supported.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       74351920

  3 Spin_Up_Time            0x0003   100   099   000    Pre-fail  Always       -       0

  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       509

  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       56

  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       604630450

  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       61816

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       165

183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0

184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       3 3 155

189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0

190 Airflow_Temperature_Cel 0x0022   070   056   045    Old_age   Always       -       30 (Min/Max 27/33)

194 Temperature_Celsius     0x0022   030   044   000    Old_age   Always       -       30 (0 19 0 0 0)

195 Hardware_ECC_Recovered  0x001a   049   021   000    Old_age   Always       -       74351920

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       61816h+09m+48.898s

241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       689434060

242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3275501626

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%     56452         -

# 2  Short offline       Completed without error       00%     56451         -

# 3  Short offline       Completed without error       00%     50987         -

# 4  Short offline       Completed without error       00%     50670         -

# 5  Short offline       Aborted by host               90%     50670         -

# 6  Short offline       Completed without error       00%     50669         -

# 7  Short offline       Completed without error       00%     50659         -

# 8  Short offline       Completed without error       00%     50659         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

sdb

```
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.15.14-gentoo] (local build)

Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Caviar Blue (SATA)

Device Model:     WDC WD3200AAKS-00B3A0

Serial Number:    WD-WMAT10341971

LU WWN Device Id: 5 0014ee 00078c882

Firmware Version: 01.03A01

User Capacity:    320,072,933,376 bytes [320 GB]

Sector Size:      512 bytes logical/physical

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS (minor revision not indicated)

SATA Version is:  SATA 2.5, 3.0 Gb/s

Local Time is:    Sat Nov 10 13:32:19 2018 CET

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       ( 6000) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  73) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.
SCT capabilities:           (0x303f)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       10
  3 Spin_Up_Time            0x0003   189   189   021    Pre-fail  Always       -       1525
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       597
 10 Spin_Retry_Count        0x0012   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       49
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       48
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       49
194 Temperature_Celsius     0x0022   109   106   000    Old_age   Always       -       34
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 10 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 10 occurred at disk power-on lifetime: 128 hours (5 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 20 89 9e 0e 08   1d+08:15:46.072  READ DMA
  ea 00 00 b7 28 fd 18 08   1d+08:15:43.791  FLUSH CACHE EXT
  35 00 08 b0 28 fd 18 08   1d+08:15:43.791  WRITE DMA EXT
  ea 00 00 af 28 fd 18 08   1d+08:15:43.781  FLUSH CACHE EXT
  35 00 10 a0 28 fd 18 08   1d+08:15:43.781  WRITE DMA EXT

Error 9 occurred at disk power-on lifetime: 128 hours (5 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 20 89 9e 0e 08   1d+08:00:01.152  READ DMA
  27 00 00 00 00 00 00 08   1d+08:00:01.152  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 08   1d+08:00:01.149  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 08   1d+08:00:01.149  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   1d+08:00:01.149  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 8 occurred at disk power-on lifetime: 128 hours (5 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 20 89 9e 0e 08   1d+07:59:57.996  READ DMA
  27 00 00 00 00 00 00 08   1d+07:59:57.986  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 08   1d+07:59:57.983  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 08   1d+07:59:57.983  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   1d+07:59:57.983  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 7 occurred at disk power-on lifetime: 128 hours (5 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 20 89 9e 0e 08   1d+07:59:54.992  READ DMA
  27 00 00 00 00 00 00 08   1d+07:59:54.992  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 08   1d+07:59:54.989  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 08   1d+07:59:54.989  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   1d+07:59:54.989  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

Error 6 occurred at disk power-on lifetime: 128 hours (5 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 20 89 9e 0e 08   1d+07:59:51.676  READ DMA
  27 00 00 00 00 00 00 08   1d+07:59:51.676  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 08   1d+07:59:51.672  IDENTIFY DEVICE
  ef 03 46 00 00 00 00 08   1d+07:59:51.672  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 08   1d+07:59:51.672  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

----------

## NeddySeagoon

kajzer,

from sda 

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       56
  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       61816
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged
```

That drive is getting old at nearly 62,000 operating hours. It has 56 reallocated sectors and zero pending sectors, so sector reallocation is working as intended.
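To make the reallocated-versus-pending reasoning concrete, here is a minimal Python sketch. It pulls raw values out of attribute rows like the ones quoted above and applies the same rule of thumb: pending sectors are the worrying ones, while sectors that are already reallocated have been dealt with. The function names and status strings are hypothetical. The regex assumes the column layout of the smartctl 6.6 output in this thread.

```python
import re

# One attribute row as printed by smartctl 6.6:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(table_text):
    """Map ATTRIBUTE_NAME -> leading integer of RAW_VALUE for each row found."""
    values = {}
    for line in table_text.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(6))
    return values


def remapping_status(values):
    """Rule of thumb: pending sectors are worse than already remapped ones."""
    reallocated = values.get("Reallocated_Sector_Ct", 0)
    pending = values.get("Current_Pending_Sector", 0)
    if pending > 0:
        return "sectors pending, not yet remapped"
    if reallocated > 0:
        return "bad sectors found and remapped"
    return "no reallocated or pending sectors"


# Rows copied from the sda output quoted above.
SDA = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       56
  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       61816
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    sda = raw_values(SDA)
    print(sda["Power_On_Hours"], remapping_status(sda))
    # prints: 61816 bad sectors found and remapped
```

Only the leading integer of RAW_VALUE is kept. Some attributes (temperature, for example) print extra text after the number on other drives.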

The short offline test tells you very little. It's worth running the long test.
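Here is a minimal sketch of that workflow. It assumes smartmontools is installed and the commands run as root. `smartctl -t long` starts the extended self-test inside the drive and returns straight away. After the test finishes, `smartctl -l selftest` shows the result.

The drive's own estimate of how long to wait is in the capabilities section (`smartctl -c`). The helper names below are hypothetical, and the sample text is the sdb excerpt from this thread.

```python
import re
import subprocess

# The polling time is printed on the line after the routine's name.
EXTENDED = re.compile(
    r"Extended self-test routine\s*\n\s*recommended polling time:\s*\(\s*(\d+)\)"
)


def long_test_commands(device):
    """argv for starting the extended test and, later, reading its log."""
    start = ["smartctl", "-t", "long", device]
    read_log = ["smartctl", "-l", "selftest", device]
    return start, read_log


def extended_test_minutes(capabilities_text):
    """Drive's own estimate for the extended test, or None if not found."""
    m = EXTENDED.search(capabilities_text)
    return int(m.group(1)) if m else None


def start_long_test(device):
    """Start the extended test; smartctl exits while the drive keeps testing."""
    start, _ = long_test_commands(device)
    # smartctl's exit status is a bit mask, so hand it back for inspection
    # instead of treating every nonzero value as a failure.
    return subprocess.run(start, capture_output=True, text=True)


# Lines from the sdb capabilities output above.
SDB_CAPS = """\
Extended self-test routine
recommended polling time:     (  73) minutes.
"""
```

For sdb that estimate is 73 minutes. After that, run the `read_log` command (`smartctl -l selftest /dev/sdb`) to see whether the test completed without error.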

That drive appears to be ageing but healthy.

sdb

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       597
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
```

That's a new drive that appears to be OK, but...

```
Error 10 occurred at disk power-on lifetime: 128 hours (5 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819
  40 51 00 23 89 9e ee  Error: UNC at LBA = 0x0e9e8923 = 245270819
```

The drive is struggling to read LBA = 0x0e9e8923.
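The LBA that smartctl prints is already encoded in the register dump `40 51 00 23 89 9e ee` (ER ST SC SN CL CH DH). This sketch assumes 28-bit LBA addressing, which READ DMA (command c8) uses. Under that assumption, the address is the low nibble of DH followed by CH, CL and SN.

The sketch also decodes the UNC bit (bit 6 of the error register) and the ERR bit (bit 0 of the status register). The function name is hypothetical. 48-bit commands would need extra register bytes that this summary log does not show.

```python
def decode_error_registers(dump):
    """Decode an 'ER ST SC SN CL CH DH' hex dump from the SMART error log."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in dump.split())
    # 28-bit LBA: bits 27-24 in DH, 23-16 in CH, 15-8 in CL, 7-0 in SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "lba": lba,
        "lba_mode": bool(dh & 0x40),       # DH bit 6 set: LBA addressing
        "uncorrectable": bool(er & 0x40),  # ER bit 6: UNC (uncorrectable data)
        "status_err": bool(st & 0x01),     # ST bit 0: ERR
        "sector_count": sc,
    }


if __name__ == "__main__":
    info = decode_error_registers("40 51 00 23 89 9e ee")
    print(info["lba"], hex(info["lba"]), info["uncorrectable"])
    # prints: 245270819 0xe9e8923 True
```

All five logged errors carry the same register bytes. They therefore point at the same sector.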

It reads OK sometimes, or the Current_Pending_Sector count would be non-zero.

Even worse, when it does read correctly, it does not trigger the sector remapping mechanism, so sometimes it works and sometimes it doesn't.

That's a warranty return. I did not check your warranty status, but at only 597 running hours the drive looks fairly new.
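One way to see the intermittent behaviour for yourself is to read that one sector several times and count the failures. This is a minimal read-only sketch. It assumes Linux, root access to the raw device and the 512-byte logical sector size that sdb reports.

Before each read, `os.posix_fadvise(..., POSIX_FADV_DONTNEED)` asks the kernel to drop any cached copy so the read goes back to the disk. That is a hint rather than a guarantee. `hdparm --read-sector` is a lower-level alternative. The function names are hypothetical.

```python
import os

SECTOR_SIZE = 512  # sdb reports 512-byte logical sectors


def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one logical sector; raises OSError (often EIO) if the read fails."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = lba * sector_size
        # Hint to drop cached pages so the next read hits the drive again.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, sector_size, os.POSIX_FADV_DONTNEED)
        return os.pread(fd, sector_size, offset)
    finally:
        os.close(fd)


def probe(path, lba, attempts=10):
    """Return (successes, failures) over repeated reads of the same sector."""
    ok = failed = 0
    for _ in range(attempts):
        try:
            data = read_sector(path, lba)
        except OSError:
            failed += 1
            continue
        if len(data) == SECTOR_SIZE:
            ok += 1
        else:
            failed += 1  # short read, e.g. LBA past the end of the device
    return ok, failed
```

For example, `probe("/dev/sdb", 245270819)` would return a mix of successes and failures if the sector really is marginal. The drive's internal retries can make each failing attempt slow.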

----------

## The Main Man

Thank you very much, NeddySeagoon.

For some reason sdb reports only 597 running hours; in fact it's older than sda.

That started recently (it began showing a lower number of running hours). I'm not sure what it was before that, but it was certainly higher than 62,000; not by much though, around 70,000 maybe.

The disks work fine until, after some time, I see an error message for one of them in dmesg. Then I open the case, clear the dust or check the cables, and it's fine again. It goes like that in cycles...
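When errors show up in dmesg only now and then, it can help to count them per ATA port so you know which disk and cable to look at. This minimal sketch assumes the kernel message format shown at the top of this thread (`[ timestamp] ataN: ...` or `ataN.00: ...`). The keyword list is a hypothetical choice, not an official classification.

Mapping `ata1` to a `/dev/sdX` name is a separate step and is not shown. The target of the `/sys/block/sdX` symlink typically contains the ataN component.

```python
import re
import subprocess
from collections import Counter

# "[ 320.957670] ata1: COMRESET failed (errno=-16)" -> ("ata1", "COMRESET failed ...")
LINE = re.compile(r"^\[\s*[\d.]+\]\s+(ata\d+)(?:\.\d+)?:\s+(.*)$")

# Hypothetical keywords treated as trouble; lines such as "SATA link up" are ignored.
TROUBLE = ("failed", "timeout", "slow to respond", "limiting")


def ata_trouble_by_port(dmesg_text):
    """Count trouble messages per ATA port (ata1, ata2, ...)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = LINE.match(line.strip())
        if m and any(word in m.group(2) for word in TROUBLE):
            counts[m.group(1)] += 1
    return counts


def current_dmesg():
    """Kernel ring buffer as text (may need root on some systems)."""
    return subprocess.run(["dmesg"], capture_output=True, text=True).stdout


# Lines taken from the dmesg output at the start of this thread.
SAMPLE = """\
[ 316.274677] ata1: link is slow to respond, please be patient (ready=0)
[ 320.957670] ata1: COMRESET failed (errno=-16)
[ 323.974683] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 329.335678] ata1.00: qc timeout (cmd 0xef)
[ 360.568431] ata1: limiting SATA link speed to 1.5 Gbps
[ 366.923516] ata1.00: configured for UDMA/133
"""
```

Running `ata_trouble_by_port(current_dmesg())` after each episode shows whether the errors always come from the same port.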

----------

## eccerr0r

Uh-oh... it wrapped around? Probably 65536 hours + 597.
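Here is eccerr0r's arithmetic as a sketch. It assumes, without confirmation for this WD model, that the drive keeps power-on hours in a 16-bit field. Such a counter rolls over after 65535, so every true value `597 + k * 65536` would display as 597.

With a plausibility cap (a hypothetical 100,000 hours here), that leaves 597 and 66,133. The second value fits the "around 70,000" estimate above. sda's 61,816 is still below the rollover point.

```python
WRAP = 1 << 16  # a 16-bit counter can hold 0..65535


def candidate_hours(reported, max_plausible=100_000):
    """True power-on hours that a wrapped 16-bit counter would show as `reported`."""
    return list(range(reported, max_plausible + 1, WRAP))


if __name__ == "__main__":
    print(candidate_hours(597))    # prints: [597, 66133]
    print(candidate_hours(61816))  # prints: [61816]
    print(66133 % WRAP)            # prints: 597
```

In calendar terms, 66,133 hours is roughly 7.5 years of continuous running.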

I had a drive that wrapped around, and it looked something like that, where the VALUE was no longer 100 but much less than 50...

I suspect hard drive manufacturers do not expect people to run their disks for longer than 65535 hours, probably much less than that, if they planned it that way :(

----------

## The Main Man

Yeah, you're probably right about that, makes sense.

Edit:

I forgot to mention why I'm using crazydiskinfo: I don't understand SMART information, and this app simplifies it for me and tells me whether the disk is good or not.
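For comparison, here is a minimal sketch of a good/bad summary built only from the normalized columns. It uses the rule smartctl applies when filling WHEN_FAILED:

- an attribute is failing now when VALUE <= THRESH;
- it failed in the past when WORST <= THRESH;
- a THRESH of 000 never counts as failing.

This is not necessarily how crazydiskinfo decides, and the function names are hypothetical. Run on sdb's table it says "good", even though the error log above shows repeated UNC errors at one LBA.

```python
import re

# ID, ATTRIBUTE_NAME, FLAG, then the normalized VALUE, WORST, THRESH columns.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s")


def attribute_states(table_text):
    """Map ATTRIBUTE_NAME -> 'ok' / 'failing_now' / 'failed_in_past'."""
    states = {}
    for line in table_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        value, worst, thresh = (int(g) for g in m.group(3, 4, 5))
        if thresh == 0:
            state = "ok"  # threshold 0 is treated as "always passing"
        elif value <= thresh:
            state = "failing_now"
        elif worst <= thresh:
            state = "failed_in_past"
        else:
            state = "ok"
        states[m.group(2)] = state
    return states


def verdict(states):
    return "good" if all(s == "ok" for s in states.values()) else "bad"


# A few rows from the sdb attribute table above.
SDB = """\
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0012   100   253   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(verdict(attribute_states(SDB)))  # prints: good
```

A summary like this is a reasonable first look, but it cannot see the error log. That is why reading the full `smartctl --all` output still matters.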

Screenshot

----------

