# [SOLVED] ATA bus error

## someone12345

Hi!

My machine seems to freeze from time to time, and while it does I see this in my log:

```
Aug 16 11:15:26 server ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 16 11:15:26 server ata2.00: configured for UDMA/33
Aug 16 11:15:26 server ata2: EH complete
Aug 16 11:15:27 server ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x6
Aug 16 11:15:27 server ata2.00: BMDMA stat 0x4
Aug 16 11:15:27 server ata2: SError: { 10B8B Dispar }
Aug 16 11:15:27 server ata2.00: cmd c8/00:08:6d:34:bd/00:00:00:00:00/e2 tag 0 dma 4096 in
Aug 16 11:15:27 server res 51/84:00:74:34:bd/84:01:0e:00:00/e2 Emask 0x10 (ATA bus error)
Aug 16 11:15:27 server ata2.00: status: { DRDY ERR }
Aug 16 11:15:27 server ata2.00: error: { ICRC ABRT }
Aug 16 11:15:27 server ata2: hard resetting link
Aug 16 11:15:27 server ata2: SATA link down (SStatus 100 SControl 310)
```
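For readers who want to see where the bracketed names in the log come from, here is a minimal sketch that decodes the three hex values the kernel reports: `SErr 0x180000`, and the status/error pair `51/84` at the start of the `res` line. The bit labels are the ones I recall the Linux libata error handler printing; treat the tables as an assumption and not as an authoritative copy of the kernel source. The helper name `decode` is made up for this example.

```python
# Bit labels as printed by the Linux libata error handler (recalled from the
# kernel source; an assumption, not an authoritative table).
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake",
    19: "10B8B",    # 10b-to-8b decode error on the serial link
    20: "Dispar",   # running disparity error on the serial link
    21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}

# ATA Status register bits. Bit 4 (0x10) is left out on purpose, because the
# log shows 0x51 rendered as "{ DRDY ERR }" only.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}

# Subset of ATA Error register bits that matter for this thread.
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def decode(value, names, high_first=True):
    """Return the labels of all bits set in value, in the requested order."""
    bits = sorted(names, reverse=high_first)
    return [names[b] for b in bits if value & (1 << b)]


if __name__ == "__main__":
    # Values copied from the log lines above.
    print("SError:", decode(0x180000, SERROR_BITS, high_first=False))
    print("status:", decode(0x51, STATUS_BITS))
    print("error: ", decode(0x84, ERROR_BITS))
```

Both SError bits and the ICRC error bit describe corruption on the wire between controller and drive, not on the platters.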

So, what exactly does this mean? Broken hardware?

Background: I always have SATA problems shortly after I have booted Windows (from another SATA/PATA drive). After a while everything is fine again and my Linux runs rock solid for weeks and months. This is really strange.

Last edited by someone12345 on Sun Aug 23, 2009 9:42 am; edited 1 time in total

----------

## NeddySeagoon

someone12345,

Poor quality SATA cables possibly. The connectors are supposed to snap together but often they don't.

Check the data cables are connected properly at both ends.

Less likely is a 3 Gbps (300 MB/s) drive on a 1.5 Gbps (150 MB/s) controller. The drives are supposed to fall back to the lower data speed but most don't.

They are provided with a tiny jumper to force the link speed. This sort of thing is not normally intermittent.

We need to know the drive and the motherboard SATA controller to check.

----------

## someone12345

Thanks for the reply! 

Asus A8N5X and 2 SAMSUNG HD160JJ (RAID0). I actually already tried different cables in the past (those which came with my Asus board).

----------

## someone12345

I am not completely sure, but I think back then I did actually set the 1.5 Gbps (150 MB/s) jumper on the hard drives.

But it's quite reproducible: start up Windows once and I run into this problem for quite some time. Most of the time it starts off like this: GRUB does not show up (my boot partition is RAID1; when I force booting from the second drive, GRUB shows up and the kernel boots, but mounting the RAID0 fails). This is completely non-deterministic. I reboot 10-20 times and switch SATA ports on the board; sometimes the system boots up completely and runs for a couple of minutes before it crashes. After 1-2 days everything's fine and the machine runs and runs... until I start up Windows again (even booting a Windows installer CD triggers this).

----------

## Sysa

What power supply do you have?

Try replacing it with a more powerful one.

BTW: what does smartctl say?

 *someone12345 wrote:*   

> Hi!
> 
> My machine seems to stand still from time to time and during that time I do see this in my log:
> 
> Aug 16 11:15:26 server ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ...

 

----------

## someone12345

The power supply is more than sufficient. And as I mentioned, this happens only after booting up Windows. As soon as the system is stable again it runs for weeks and months.

```
smartctl -H /dev/sd[ab]
```

reports no problems on either of the two drives.

----------

## Sysa

 *someone12345 wrote:*   

> Power supply is more than sufficient. And as I mentioned this happens only after booting up Windows. As soon as the system is stable again it runs for weeks and month.
> 
> ```
> smartctl -H /dev/sd[ab]
> ```
> ...

 

I suggest looking at

```
smartctl -a /dev/sd[ab]
```

after you have run a self-test with `-t short` or `-t long`.

Also, please check that your HDD jumpers are set correctly for 1.5/3 Gbps and that the setting matches the one in the BIOS.

----------

## someone12345

```
# smartctl -a /dev/sda

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Model Family:     SAMSUNG SpinPoint P80 SD series

Device Model:     SAMSUNG HD160JJ

Serial Number:    S08HJ10YB33273

Firmware Version: ZM100-33

User Capacity:    160,041,885,696 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   7

ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a

Local Time is:    Mon Aug 17 11:43:51 2009 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity

                                        was suspended by an interrupting command from host.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                 (3654) seconds.

Offline data collection

capabilities:                    (0x5b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (   1) minutes.

Extended self-test routine

recommended polling time:        (  60) minutes.

SCT capabilities:              (0x003f) SCT Status supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       1

  3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       6208

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       520

  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0

  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0

  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       30817

 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0

 11 Calibration_Retry_Count 0x0012   253   002   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       388

190 Airflow_Temperature_Cel 0x0022   100   058   000    Old_age   Always       -       46

194 Temperature_Celsius     0x0022   100   058   000    Old_age   Always       -       46

195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       67415327

196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1723

200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0

201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

202 TA_Increase_Count       0x0032   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1

ATA Error Count: 1707 (device log contains only the most recent five errors)

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1707 occurred at disk power-on lifetime: 30794 hours (1283 days + 2 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 80 cd 74 1d e5  Error: ICRC, ABRT 128 sectors at LBA = 0x051d74cd = 85816525

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 80 cd 74 1d e5 00   4d+18:27:26.625  READ DMA

  c8 00 38 95 74 1d e5 00   4d+18:27:26.625  READ DMA

  c8 00 00 cd 70 1d e5 00   4d+18:27:26.625  READ DMA

  c8 00 48 4d 76 1d e5 00   4d+18:27:26.625  READ DMA

  c8 00 58 f5 75 1d e5 00   4d+18:27:26.563  READ DMA

Error 1706 occurred at disk power-on lifetime: 30794 hours (1283 days + 2 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 58 f5 75 1d e5  Error: ICRC, ABRT 88 sectors at LBA = 0x051d75f5 = 85816821

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 58 f5 75 1d e5 00   4d+18:27:25.625  READ DMA

  c8 00 48 05 ea 1d e5 00   4d+18:27:25.625  READ DMA

  c8 00 10 ed e9 1d e5 00   4d+18:27:25.625  READ DMA

  c8 00 08 dd ea 1d e5 00   4d+18:27:25.625  READ DMA

  c8 00 08 cd ea 1d e5 00   4d+18:27:25.625  READ DMA

Error 1705 occurred at disk power-on lifetime: 30794 hours (1283 days + 2 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 70 de be 3b e7  Error: ICRC, ABRT 112 sectors at LBA = 0x073bbede = 121355998

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 70 de be 3b e7 00   4d+18:23:10.625  READ DMA

  c8 00 90 4e be 3b e7 00   4d+18:23:10.625  READ DMA

  c8 00 40 e6 72 3b e7 00   4d+18:23:10.625  READ DMA

  c8 00 08 de 72 3b e7 00   4d+18:23:10.625  READ DMA

  c8 00 10 06 e1 3a e7 00   4d+18:23:10.313  READ DMA

Error 1704 occurred at disk power-on lifetime: 30794 hours (1283 days + 2 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 80 4d d5 f7 e5  Error: ICRC, ABRT 128 sectors at LBA = 0x05f7d54d = 100128077

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 80 4d d5 f7 e5 00   4d+18:12:22.250  READ DMA

  c8 00 80 4d d4 f7 e5 00   4d+18:12:22.188  READ DMA

  ca 00 28 5e c3 4d e6 00   4d+18:12:21.063  WRITE DMA

  ca 00 f8 e6 fb 3d e7 00   4d+18:12:20.438  WRITE DMA

  ca 00 08 de fb 3d e7 00   4d+18:12:20.438  WRITE DMA

Error 1703 occurred at disk power-on lifetime: 30794 hours (1283 days + 2 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 18 46 80 3b e7  Error: ICRC, ABRT 24 sectors at LBA = 0x073b8046 = 121339974

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 18 46 80 3b e7 00   4d+18:11:59.250  READ DMA

  c8 00 30 ae 6a 3b e7 00   4d+18:11:59.188  READ DMA

  c8 00 08 7e 6a 3b e7 00   4d+18:11:59.188  READ DMA

  c8 00 38 ee 69 3b e7 00   4d+18:11:59.188  READ DMA

  c8 00 10 de 69 3b e7 00   4d+18:11:59.125  READ DMA

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%     30805         -

# 2  Short offline       Completed without error       00%     30804         -

# 3  Extended offline    Completed without error       00%     26054         -

# 4  Extended offline    Interrupted (host reset)      60%     26052         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1

SMART Selective self-test log data structure revision number 0

Warning: ATA Specification requires selective self-test log data structure revision number = 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

# smartctl -a /dev/sdb

smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Model Family:     SAMSUNG SpinPoint P80 SD series

Device Model:     SAMSUNG HD160JJ

Serial Number:    S08HJ10YB33277

Firmware Version: ZM100-33

User Capacity:    160,041,885,696 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   7

ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a

Local Time is:    Mon Aug 17 11:44:29 2009 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x84) Offline data collection activity

                                        was suspended by an interrupting command from host.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                 (3561) seconds.

Offline data collection

capabilities:                    (0x5b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine

recommended polling time:        (   1) minutes.

Extended self-test routine

recommended polling time:        (  59) minutes.

SCT capabilities:              (0x003f) SCT Status supported.

                                        SCT Feature Control supported.

                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0007   100   100   025    Pre-fail  Always       -       6144

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       524

  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0

  8 Seek_Time_Performance   0x0025   253   253   015    Pre-fail  Offline      -       0

  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       30817

 10 Spin_Retry_Count        0x0033   253   253   051    Pre-fail  Always       -       0

 11 Calibration_Retry_Count 0x0012   253   002   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       386

190 Airflow_Temperature_Cel 0x0022   094   055   000    Old_age   Always       -       48

194 Temperature_Celsius     0x0022   094   055   000    Old_age   Always       -       48

195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       58540580

196 Reallocated_Event_Count 0x0032   253   253   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0012   253   253   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   253   253   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1

200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0

201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

202 TA_Increase_Count       0x0032   100   100   000    Old_age   Always       -       262

SMART Error Log Version: 1

ATA Error Count: 357 (device log contains only the most recent five errors)

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 357 occurred at disk power-on lifetime: 27619 hours (1150 days + 19 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 08 35 c3 57 e4  Error: ICRC, ABRT at LBA = 0x0457c335 = 72860469

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  ca 00 08 35 c3 57 e4 00      04:55:42.000  WRITE DMA

  ec 00 00 00 00 00 a0 00      04:55:41.938  IDENTIFY DEVICE

  ef 03 42 00 00 00 a0 00      04:55:41.938  SET FEATURES [Set transfer mode]

  ec 00 00 00 00 00 a0 00      04:55:41.938  IDENTIFY DEVICE

  00 00 01 01 00 00 a0 00      04:55:41.563  NOP [Abort queued commands]

Error 356 occurred at disk power-on lifetime: 27619 hours (1150 days + 19 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 28 de e8 e3 e6  Error: ICRC, ABRT at LBA = 0x06e3e8de = 115599582

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  ca 00 28 de e8 e3 e6 00      04:55:41.500  WRITE DMA

  ec 00 00 00 00 00 a0 00      04:55:41.500  IDENTIFY DEVICE

  ef 03 42 00 00 00 a0 00      04:55:41.500  SET FEATURES [Set transfer mode]

  ec 00 00 00 00 00 a0 00      04:55:41.500  IDENTIFY DEVICE

  00 00 01 01 00 00 a0 00      04:55:41.125  NOP [Abort queued commands]

Error 355 occurred at disk power-on lifetime: 27619 hours (1150 days + 19 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 28 de e8 e3 e6  Error: ICRC, ABRT at LBA = 0x06e3e8de = 115599582

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  ca 00 28 de e8 e3 e6 00      04:55:41.063  WRITE DMA

  ec 00 00 00 00 00 a0 00      04:55:41.063  IDENTIFY DEVICE

  ef 03 42 00 00 00 a0 00      04:55:41.063  SET FEATURES [Set transfer mode]

  ec 00 00 00 00 00 a0 00      04:55:41.063  IDENTIFY DEVICE

  00 00 01 01 00 00 a0 00      04:55:40.688  NOP [Abort queued commands]

Error 354 occurred at disk power-on lifetime: 27619 hours (1150 days + 19 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 28 de e8 e3 e6  Error: ICRC, ABRT at LBA = 0x06e3e8de = 115599582

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  ca 00 28 de e8 e3 e6 00      04:55:40.625  WRITE DMA

  ec 00 00 00 00 00 a0 00      04:55:40.625  IDENTIFY DEVICE

  ef 03 42 00 00 00 a0 00      04:55:40.625  SET FEATURES [Set transfer mode]

  ec 00 00 00 00 00 a0 00      04:55:40.563  IDENTIFY DEVICE

  00 00 01 01 00 00 a0 00      04:55:40.250  NOP [Abort queued commands]

Error 353 occurred at disk power-on lifetime: 27619 hours (1150 days + 19 hours)

  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  84 51 28 de e8 e3 e6  Error: ICRC, ABRT at LBA = 0x06e3e8de = 115599582

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  ca 00 28 de e8 e3 e6 00      04:55:34.813  WRITE DMA

  ec 00 00 00 00 00 a0 00      04:55:34.750  IDENTIFY DEVICE

  ef 03 42 00 00 00 a0 00      04:55:34.750  SET FEATURES [Set transfer mode]

  ec 00 00 00 00 00 a0 00      04:55:34.750  IDENTIFY DEVICE

  00 00 01 01 00 00 a0 00      04:55:34.375  NOP [Abort queued commands]

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%     30806         -

# 2  Short offline       Completed without error       00%     30805         -

# 3  Extended offline    Completed without error       00%     26051         -

# 4  Extended offline    Interrupted (host reset)      60%     26049         -

SMART Selective Self-Test Log Data Structure Revision Number (0) should be 1

SMART Selective self-test log data structure revision number 0

Warning: ATA Specification requires selective self-test log data structure revision number = 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```
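A side note on reading the error log entries above: each one prints the raw task-file registers and then an LBA. The sketch below shows how the two relate for 28-bit addressing, where the low nibble of the Device/Head register carries the top four LBA bits. The function names are made up for this example; the register rows are copied from the `/dev/sda` output.

```python
def parse_register_row(row):
    """Split a row like '84 51 80 cd 74 1d e5' into named register values."""
    names = ("ER", "ST", "SC", "SN", "CL", "CH", "DH")
    values = [int(field, 16) for field in row.split()[:7]]
    return dict(zip(names, values))


def lba28(regs):
    """Rebuild the 28-bit LBA: DH[3:0] | CH | CL | SN, most significant first."""
    return ((regs["DH"] & 0x0F) << 24) | (regs["CH"] << 16) | (regs["CL"] << 8) | regs["SN"]


if __name__ == "__main__":
    # Error 1707 and Error 1706 from the /dev/sda log above.
    for row in ("84 51 80 cd 74 1d e5", "84 51 58 f5 75 1d e5"):
        regs = parse_register_row(row)
        print(f"{regs['SC']} sectors at LBA = 0x{lba28(regs):08x} = {lba28(regs)}")
```

The failing LBAs are scattered across the disk and differ from error to error, which is what one would expect from transfer errors and not from a fixed bad spot on the surface.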

----------

## someone12345

BTW, the last time these errors occurred in the logs was yesterday at noon.

Seems as if everything's fine again. As always after 1-2 days.

----------

## NeddySeagoon

someone12345,

That very high number of corrected ECC errors shows you have drive to motherboard interface errors.

It can be the motherboard, the drives, the data cables, even the PSU and its wiring.

Correcting the errors takes time, so even though it seems to work for a few days at a time, it's not really.

It's not Windows related either.

Both drives show ECC errors, which suggests it's not the SATA data cables unless they are both causing issues.

They are the lowest cost parts to replace. After that, you need a plug-in SATA card to test your motherboard.

Your drives have 31,000 operating hours on them, which is 3.5 years of continuous running. A low cost power supply could easily be causing problems after that many operating hours. The poor quality parts used begin to fail slowly and the PSU no longer regulates properly. That's regardless of any excess PSU capacity you think you may have.

----------

## someone12345

But why does it occur always, and only, after I have booted Windows?

----------

## NeddySeagoon

someone12345,

It's probably related to the thermal changes in the system somehow.

I would not be surprised to find the symptoms linked to cooling air temperature.

From your SMART data, the problem is there all the time. You only get a crash when the ECC cannot recover.

A few degrees hotter/cooler, or just the changing temperature can be enough.

The failure condition you have is marginal. It's always there, causing ECC errors, but sometimes it's so bad as to cause crashes or lockups.

----------

## energyman76b

 *NeddySeagoon wrote:*   

> someone12345,
> 
> That very high number of corrected ECC errors shows you have drive to motherboard interface errors.
> 
> It can be the motherboard, the drives, the data cables, even the PSU and its wiring.
> ...

 

No, the ECC errors are fine; there is no problem with them. Samsung drives always have a high Hardware_ECC_Recovered count, so you can ignore it. What matters are the unrecovered errors.

The DMA errors are the problem. /dev/sda has a defective cable or connector. Replace the cable.
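The comparison behind this reading can be sketched in a few lines. The snippet below pulls the raw values out of `smartctl -a` attribute rows and puts the two drives side by side. The rows are copied from the output posted earlier in the thread; the regular expression assumes the ten-column attribute layout shown there, and `raw_values` and `compare` are names invented for this example.

```python
import re

# Matches one attribute row: ID, name, flag, value, worst, thresh,
# type, updated, when_failed, raw value (leading digits only).
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)",
    re.MULTILINE,
)

# Rows copied from the smartctl output posted above.
SDA_ROWS = """\
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       67415327
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1723
"""
SDB_ROWS = """\
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       58540580
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
"""


def raw_values(smartctl_text):
    """Map attribute name to its raw value for every row that matches."""
    return {name: int(raw) for _id, name, raw in ATTR_ROW.findall(smartctl_text)}


def compare(drives, attribute="UDMA_CRC_Error_Count"):
    """Return the raw value of one attribute for each drive."""
    return {dev: raw_values(text).get(attribute) for dev, text in drives.items()}


if __name__ == "__main__":
    drives = {"/dev/sda": SDA_ROWS, "/dev/sdb": SDB_ROWS}
    print(compare(drives))                            # CRC errors per drive
    print(compare(drives, "Hardware_ECC_Recovered"))  # similar on both drives
```

Hardware_ECC_Recovered is large on both drives, while UDMA_CRC_Error_Count is 1723 on /dev/sda and 1 on /dev/sdb, which is why the suspicion falls on the link to /dev/sda.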

----------

## someone12345

You guys seem to be right. I bought new SATA cables with metal latches and so far I have had no more problems.

thanks!

----------

