# [SOLVED] smartd errors every two weeks

## cfgauss

smartd is monitoring my hard disk and every two weeks emails me to tell me that the ATA error count increased. Here's the last error:

```
Error 14 occurred at disk power-on lifetime: 10068 hours (419 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 01 cf c0 cd 0a  Error: ICRC, ABRT at LBA = 0x0acdc0cf = 181256399

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 c8 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 08 90 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 10 60 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 08 48 c0 cd ea 08      20:04:14.370  WRITE DMA
  ca 00 08 f0 bf cd ea 08      20:04:14.370  WRITE DMA
```

However, the long self-test (smartctl -t long /dev/sda, which takes about 10 hours) always returns "Completed without error".

Here are the attributes from smartctl -A /dev/sda:

```
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       79
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       600 (Average 603)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       582
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       34
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       10203
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       582
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1066
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1066
194 Temperature_Celsius     0x0002   111   111   000    Old_age   Always       -       54 (Min/Max 15/59)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14
```

Can I safely ignore smartd's email and only replace the hard disk when the long test shows problems?

Thanks for any help in interpreting smartmontools.

[SOLVED] tholin, below, suggested that the problem might be a SATA cable. I replaced it and have had no errors since. [/SOLVED]

Last edited by cfgauss on Tue Jun 06, 2017 2:46 pm; edited 2 times in total

----------

## Jaglover

Hard drives can fail in many different ways. Yours looks OK, though. Get ready to buy a new drive when the reallocated or pending sector count goes up, or when the self-test does not finish at 100%.
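
That rule of thumb can be checked mechanically. Here is a rough sketch in Python, assuming the `smartctl -A` table layout shown in the first post (ten whitespace-separated columns, with the raw value last). The function names `parse_smart_attributes` and `worrying_attributes` are made up for illustration and are not part of smartmontools.

```python
# Attributes named in the rule of thumb above (reallocated / pending sectors).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output.

    Only the first integer of the RAW_VALUE column is kept, so
    "600 (Average 603)" becomes 600.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs

def worrying_attributes(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}

# A few rows copied from the output in the first post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       600 (Average 603)
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14
"""

attrs = parse_smart_attributes(SAMPLE)
print(worrying_attributes(attrs))      # empty dict: no reallocated or pending sectors
print(attrs["UDMA_CRC_Error_Count"])   # 14
```

With the posted values nothing in the watched set is above zero, which matches the "looks OK" reading; the only nonzero error counter is UDMA_CRC_Error_Count.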

----------

## tholin

 *cfgauss wrote:*   

> Error: ICRC
> 
> ...
> 
> 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14
> ...

 

Looks like you have transfer errors between the controller and the disk. Try changing the SATA cable and make sure the connectors are clean. Most SATA cables are unshielded and run at high speeds, so transfer errors are common. The data is checksummed, so the controller retries the transfer when that happens. That's why you don't notice any other problems.
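
For anyone who wants to read the register dump by hand, here is a small sketch in Python that decodes the logged line `84 51 01 cf c0 cd 0a`. It assumes the usual ATA error-register bit layout (bit 7 = ICRC, bit 6 = UNC, bit 4 = IDNF, bit 2 = ABRT) and 28-bit LBA addressing, where the low nibble of DH holds the top four LBA bits. The function names are illustrative only, and bits not listed in the table are printed generically rather than guessed.

```python
# Error-register bits named here are an assumption based on the common ATA layout.
KNOWN_ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}

def decode_error(er):
    """List the flags set in the ER byte, highest bit first."""
    names = []
    for bit in range(7, -1, -1):
        if er & (1 << bit):
            names.append(KNOWN_ERROR_BITS.get(bit, f"bit{bit}"))
    return names

def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the SN, CL, CH and DH registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# Register values from the error log in the first post: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in "84 51 01 cf c0 cd 0a".split())

print(decode_error(er))           # ['ICRC', 'ABRT']
print(hex(lba28(sn, cl, ch, dh))) # 0xacdc0cf
print(lba28(sn, cl, ch, dh))      # 181256399
```

The result agrees with what smartctl printed: an interface CRC error followed by an abort at LBA 181256399, with the same count (14) showing up in attribute 199.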

----------

## cfgauss

 *tholin wrote:*   

>  *cfgauss wrote:*   
>
> > Error: ICRC
> >
> > ...
> >
> > 199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14
> > ...

 

Thanks for pointing this out. I've changed the cable and will watch for future ICRC errors.

----------

