# hdd dead?

## DaggyStyle

so, my media hdd is dead...

here is the boot dmesg:

```
[    2.933889] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.933890] ata5.00: irq_stat 0x40000001

[    2.933891] ata5.00: failed command: READ DMA

[    2.933894] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 8 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.933894] ata5.00: status: { DRDY ERR }

[    2.933895] ata5.00: error: { ABRT }

[    2.934027] ata5.00: configured for UDMA/133 (device error ignored)

[    2.934038] sd 4:0:0:0: [sdb] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s

[    2.934039] sd 4:0:0:0: [sdb] tag#8 Sense Key : Illegal Request [current] 

[    2.934041] sd 4:0:0:0: [sdb] tag#8 Add. Sense: Unaligned write command

[    2.934043] sd 4:0:0:0: [sdb] tag#8 CDB: Read(10) 28 00 00 7b 70 80 00 00 08 00

[    2.934045] blk_update_request: I/O error, dev sdb, sector 8089728 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0

[    2.934050] ata5: EH complete

[    2.935279] nct6775: Found NCT6791D or compatible chip at 0x2e:0x290

[    2.936810] sr 0:0:0:0: [sr0] scsi3-mmc drive: 48x/12x writer dvd-ram cd/rw xa/form2 cdda tray

[    2.936812] cdrom: Uniform CD-ROM driver Revision: 3.20

[    2.940622] tun: Universal TUN/TAP device driver, 1.6

[    2.947774] VFIO - User Level meta-driver version: 0.3

[    2.950883] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.950887] ata5.00: irq_stat 0x40000001

[    2.950892] ata5.00: failed command: READ DMA

[    2.950901] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 2 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.950904] ata5.00: status: { DRDY ERR }

[    2.950907] ata5.00: error: { ABRT }

[    2.951023] ata5.00: configured for UDMA/133 (device error ignored)

[    2.951029] ata5: EH complete

[    2.958913] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.958917] ata5.00: irq_stat 0x40000001

[    2.958921] ata5.00: failed command: READ DMA

[    2.958931] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 22 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.958934] ata5.00: status: { DRDY ERR }

[    2.958937] ata5.00: error: { ABRT }

[    2.959235] ata5.00: configured for UDMA/133 (device error ignored)

[    2.959260] ata5: EH complete

[    2.963016] vfio_pci: add [8086:a170[ffffffff:ffffffff]] class 0x000000/00000000

[    2.963034] vfio-pci 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem

[    2.966911] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.966915] ata5.00: irq_stat 0x40000001

[    2.966919] ata5.00: failed command: READ DMA

[    2.966929] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 1 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.966932] ata5.00: status: { DRDY ERR }

[    2.966935] ata5.00: error: { ABRT }

[    2.967295] ata5.00: configured for UDMA/133 (device error ignored)

[    2.967320] ata5: EH complete

[    2.968524] sr 0:0:0:0: Attached scsi CD-ROM sr0

[    2.974927] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.974931] ata5.00: irq_stat 0x40000001

[    2.974935] ata5.00: failed command: READ DMA

[    2.974945] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 3 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.974948] ata5.00: status: { DRDY ERR }

[    2.974950] ata5.00: error: { ABRT }

[    2.975241] ata5.00: configured for UDMA/133 (device error ignored)

[    2.975266] ata5: EH complete

[    2.976044] vfio_pci: add [8086:5912[ffffffff:ffffffff]] class 0x000000/00000000

[    2.983910] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.983914] ata5.00: irq_stat 0x40000001

[    2.983918] ata5.00: failed command: READ DMA

[    2.983928] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 5 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.983931] ata5.00: status: { DRDY ERR }

[    2.983934] ata5.00: error: { ABRT }

[    2.984226] ata5.00: configured for UDMA/133 (device error ignored)

[    2.984251] ata5: EH complete

[    2.990917] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0

[    2.990920] ata5.00: irq_stat 0x40000001

[    2.990925] ata5.00: failed command: READ DMA

[    2.990935] ata5.00: cmd c8/00:08:80:70:7b/00:00:00:00:00/e0 tag 7 dma 4096 in

                        res 51/04:08:80:70:7b/00:00:00:00:00/e0 Emask 0x1 (device error)

[    2.990937] ata5.00: status: { DRDY ERR }

[    2.990940] ata5.00: error: { ABRT }

[    2.991251] ata5.00: configured for UDMA/133 (device error ignored)

[    2.991280] sd 4:0:0:0: [sdb] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s

[    2.991286] sd 4:0:0:0: [sdb] tag#7 Sense Key : Illegal Request [current] 

[    2.991291] sd 4:0:0:0: [sdb] tag#7 Add. Sense: Unaligned write command

[    2.991295] sd 4:0:0:0: [sdb] tag#7 CDB: Read(10) 28 00 00 7b 70 80 00 00 08 00

[    2.991301] blk_update_request: I/O error, dev sdb, sector 8089728 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0

[    2.991306] Buffer I/O error on dev sdb, logical block 4044864, async page read

[    2.991309] Buffer I/O error on dev sdb, logical block 4044865, async page read

[    2.991330] ata5: EH complete

```

smartctl -a /dev/sdb

```

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.76-gentoo-r1] (local build)

Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Device Model:     ST3000DM001

Serial Number:    ZA500H9A

LU WWN Device Id: 5 000c50 07a7155cd

Firmware Version: CC25

User Capacity:    137,438,952,960 bytes [137 GB]

Sector Size:      512 bytes logical/physical

Rotation Rate:    7200 rpm

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:   ATA8-ACS T13/1699-D revision 4

SATA Version is:  SATA 3.0, 6.0 Gb/s

Local Time is:    Fri Jul 29 19:52:18 2022 IDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

Read SMART Data failed: scsi error badly formed scsi parameters

=== START OF READ SMART DATA SECTION ===

SMART Status command failed: scsi error badly formed scsi parameters

SMART overall-health self-assessment test result: UNKNOWN!

SMART Status, Attributes and Thresholds cannot be read.

Read SMART Log Directory failed: scsi error badly formed scsi parameters

Read SMART Error Log failed: scsi error badly formed scsi parameters

Read SMART Self-test Log failed: scsi error badly formed scsi parameters

Selective Self-tests/Logging not supported

```

the hdd is 3T, what are the chances it is the cable?

----------

## mike155

It could be a connection problem. Can you try to attach the drive to a different computer with a different cable? Or at least to a different SATA port with a different cable?

----------

## NeddySeagoon

DaggyStyle,

That looks like the HDD is connected over USB and the USB/HDD bridge chip does not support the full SCSI command set.

Can you connect it to the motherboard?

----------

## DaggyStyle

 *mike155 wrote:*   

> It could be a connection problem. Can you try to attach the drive to a different computer with a different cable? Or at least to a different SATA port with a different cable?

 

that was my intention, I thought I can 

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> That looks like the HDD is connected over USB and the USB/HDD bridge chip does not support the full SCSI command set.
> 
> Can you connect it to the motherboard?

 

it is, it's connected to the mb with sata cable to one of the sata ports

----------

## NeddySeagoon

DaggyStyle,

The smart data says 

```
User Capacity:    137,438,952,960 bytes [137 GB]
```

 *DaggyStyle wrote:*   

> ... the hdd is 3T ...

 

They can't both be right. By a strange coincidence, 137G is all you can address under some very old BIOSes.

Very old kernels needed the kernel parameter hdd=stroke to turn on 48 bit LBA after the BIOS failed to provide support.

You could also have turned on the Host Protected Area and put most of the space into that. You can't use it there.

Anyway. Until you can see the entire drive, all bets are off.

Pastebin your entire dmesg.
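
The 137 GB figure can be checked with a little arithmetic. This is a minimal sketch, assuming the 512-byte logical sectors that smartctl reports and the classic 28-bit LBA limit; the helper name `capacity_bytes` is only for illustration.

```python
SECTOR_SIZE = 512  # bytes per logical sector, as reported by smartctl


def capacity_bytes(sector_count: int, sector_size: int = SECTOR_SIZE) -> int:
    """Return the capacity in bytes for a given number of addressable sectors."""
    return sector_count * sector_size


# 28-bit LBA can address at most 2**28 sectors.
lba28_limit = capacity_bytes(2**28)

# "User Capacity" line from the smartctl output above.
reported = 137_438_952_960

# How many sectors short of the 28-bit limit is the reported capacity?
shortfall_sectors = (lba28_limit - reported) // SECTOR_SIZE

print(lba28_limit, shortfall_sectors)
```

The reported capacity comes out exactly one sector short of 2**28 sectors, which is consistent with the size being clipped at the 28-bit boundary rather than 137 GB being the real size of the drive.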

----------

## DaggyStyle

here: https://dpaste.com/6VTC3JUEZ

I don't know if it is turned on, I didn't do anything, it happened while streaming a file from the device.

----------

## NeddySeagoon

DaggyStyle,

```
[    0.680246] ata5.00: failed to read native max address (err_mask=0x1)

[    0.680257] ata5.00: HPA support seems broken, skipping HPA handling
```

That failed to read native max address is game over.

I've not seen that before. That means that the kernel can't tell how big the HDD is. 

```
[    0.727050] ata5.00: status: { DRDY ERR }
```

suggests the drive never becomes ready.

Try a different SATA data cable.

First on the same motherboard SATA port,  then on a different SATA motherboard port.

Nothing matters until that failed to read native max address error goes away.

----------

## DaggyStyle

I've extracted the disk and put it into a sata to usb case. the device is visible, however there are issues reported, see:

```
# smartctl -a /dev/sdh

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.18.15-gentoo] (local build)

Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Family:     Seagate Barracuda 7200.14 (AF)

Device Model:     ST3000DM001-1ER166

Serial Number:    ZA500H9A

LU WWN Device Id: 5 000c50 07a7155cd

Firmware Version: CC25

User Capacity:    3,000,592,982,016 bytes [3.00 TB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Rotation Rate:    7200 rpm

Form Factor:      3.5 inches

Device is:        In smartctl database 7.3/5387

ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b

SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:    Sun Jul 31 20:39:32 2022 IDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever 

                                        been run.

Total time to complete Offline 

data collection:                (   80) seconds.

Offline data collection

capabilities:                    (0x73) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        No Offline surface scan supported.

                                        Self-test supported.

                                        Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        General Purpose Logging supported.

Short self-test routine 

recommended polling time:        (   1) minutes.

Extended self-test routine

recommended polling time:        ( 315) minutes.

Conveyance self-test routine

recommended polling time:        (   2) minutes.

SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       205953736

  3 Spin_Up_Time            0x0003   098   093   000    Pre-fail  Always       -       0

  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1089

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   062   060   030    Pre-fail  Always       -       1627599

  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always       -       55446

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       403

183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2

184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0

187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       2 2 3

189 High_Fly_Writes         0x003a   093   093   000    Old_age   Always       -       7

190 Airflow_Temperature_Cel 0x0022   072   048   045    Old_age   Always       -       28 (Min/Max 24/28)

191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0

192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       755

193 Load_Cycle_Count        0x0032   067   067   000    Old_age   Always       -       67397

194 Temperature_Celsius     0x0022   028   052   000    Old_age   Always       -       28 (0 17 0 0 0)

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18503h+11m+42.001s

241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       28041850244

242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       89685898172

SMART Error Log Version: 1

ATA Error Count: 2

        CR = Command Register [HEX]

        FR = Features Register [HEX]

        SC = Sector Count Register [HEX]

        SN = Sector Number Register [HEX]

        CL = Cylinder Low Register [HEX]

        CH = Cylinder High Register [HEX]

        DH = Device/Head Register [HEX]

        DC = Device Command Register [HEX]

        ER = Error register [HEX]

        ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 21562 hours (898 days + 10 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  00 00 00 00 00 00 00 ff      07:13:04.285  NOP [Abort queued commands]

  b0 d4 00 82 4f c2 00 00      07:12:43.664  SMART EXECUTE OFF-LINE IMMEDIATE

  b0 d0 01 00 4f c2 00 00      07:12:43.581  SMART READ DATA

  ec 00 01 00 00 00 00 00      07:12:43.576  IDENTIFY DEVICE

  ec 00 01 00 00 00 00 00      07:12:43.575  IDENTIFY DEVICE

Error 1 occurred at disk power-on lifetime: 21553 hours (898 days + 1 hours)

  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 00 00 00 00 00  Error: ABRT

  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  00 00 00 00 00 00 00 ff      00:05:54.302  NOP [Abort queued commands]

  b0 d4 00 81 4f c2 00 00      00:05:33.750  SMART EXECUTE OFF-LINE IMMEDIATE

  b0 d0 01 00 4f c2 00 00      00:05:33.694  SMART READ DATA

  ec 00 01 00 00 00 00 00      00:05:33.689  IDENTIFY DEVICE

  ec 00 01 00 00 00 00 00      00:05:33.688  IDENTIFY DEVICE

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%     55446         -

# 2  Extended captive    Interrupted (host reset)      90%     21562         -

# 3  Short captive       Interrupted (host reset)      70%     21553         -

# 4  Short offline       Completed without error       00%     21553         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
```

any insights?

----------

## NeddySeagoon

DaggyStyle,

```
User Capacity:    3,000,592,982,016 bytes [3.00 TB] 
```

That's a good start.

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE 

  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0 

  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always       -       55446

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0 
```

All that says is the drive is old. 55446 hours.

The errors look like problems communicating with the host at power up.  

```
Error 2 occurred at disk power-on lifetime: 21562 hours 
```

They are old and have not recurred.

```
SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%     55446        -
```

That's your recent short test. The other tests are old, around the time of the two errors in the log.

Run the long test. If it passes, it's probably OK for now. I get nervous around 60000 running hours.

----------

## eccerr0r

I've been getting those illegal write commands which are indicative of OS problems, did you switch kernels recently or perhaps didn't notice the failed commands until now?

BTW, sigh.  Just had a 200GB disk buy the farm.  It gave sector errors and SMART soon after reported "24 hours to live"... I copied everything off of it to another disk and sure enough, a few days later, it fails to detect.

SMART was smart this time around.  Probably the first time actually.  I have another disk that SMART reports "24 hours to live" and the disk is still working, sort of...

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> ```
> User Capacity:    3,000,592,982,016 bytes [3.00 TB] 
> ```
> ...

 

I cannot complete the long test because the usb hub keeps resetting. I'll have to try to do it from a windows system, unfortunately.

 *eccerr0r wrote:*   

> I've been getting those illegal write commands which are indicative of OS problems, did you switch kernels recently or perhaps didn't notice the failed commands until now?
> 
> BTW, sigh.  Just had a 200GB disk buy the farm.  It gave sector errors and SMART soon after reported "24 hours to live"... I copied everything off of it to another disk and sure enough, a few days later, it fails to detect.
> 
> SMART was smart this time around.  Probably the first time actually.  I have another disk that SMART reports "24 hours to live" and the disk is still working, sort of...

 

so, the system hasn't changed kernels in ages, in fact I'm planning to replace the os with a different flavor.

----------

## NeddySeagoon

DaggyStyle,

As long as the HDD does not power down, you don't need a data link during the test.

You start the test then come back after it's complete and read the answer.

There is another way but it involves the outside world, not just the drive internals.

dd the content of the drive to /dev/null. Use bs=1M so you don't die of old age waiting.

If that works the drive, interface and cables are good.
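
If you would rather script it than use dd, the same read-everything check can be sketched in Python. This is only an illustration, not a replacement for dd: `read_through` is a made-up helper name, and the 1 MiB block size simply mirrors the bs=1M suggestion.

```python
def read_through(path: str, block_size: int = 1024 * 1024) -> int:
    """Read `path` sequentially to the end, discarding the data.

    Returns the total number of bytes read. An unreadable region
    surfaces as an OSError from read().
    """
    total = 0
    # buffering=0 gives a raw file object, so each read() maps to one
    # request of at most block_size bytes.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(block_size)
            if not chunk:  # empty bytes object means end of device/file
                break
            total += len(chunk)
    return total
```

Calling `read_through("/dev/sdX")` as root, with `/dev/sdX` standing in for whatever node the drive gets, should return the full drive size if every block is readable. Check dmesg afterwards either way.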

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> As long as the HDD does not power down, you don't need a data link during the test.
> 
> You start the test then come back after it's complete and read the answer.
> ...

 

tried the latter suggestion, after 17534.9 seconds, the read ended with no errors in dmesg.
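
A rough sanity check on that number, assuming the whole 3 TB reported by smartctl was read in that time (the function name is just for illustration):

```python
def average_throughput_mb_s(total_bytes: int, seconds: float) -> float:
    """Average throughput in MB/s, with 1 MB taken as 10**6 bytes."""
    return total_bytes / seconds / 1e6


drive_bytes = 3_000_592_982_016  # "User Capacity" from the smartctl output
elapsed = 17534.9                # seconds the full read took

rate = average_throughput_mb_s(drive_bytes, elapsed)
print(f"{rate:.0f} MB/s")
```

That works out to roughly 171 MB/s on average, which looks like a plausible sustained rate for a 7200 rpm disk, so the read does not seem to have been stalling on retries.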

this means the hdd is good and my issue is either the port on the original board, the cable, or both.

thanks for the help All!

----------

## DaggyStyle

update,

I've put the hdd back into the case and replaced both the sata cable and the sata port.

still getting errors, see:

```

[    0.000000] BIOS-e820: [mem 0x0000000047f9a000-0x0000000047ffefff] ACPI data

[    0.006169] ACPI: SSDT 0x0000000047F52850 00036D (v01 SataRe SataTabl 00001000 INTL 20120913)

[    0.094662] Memory: 31518464K/32429532K available (12299K kernel code, 2374K rwdata, 2588K rodata, 1080K init, 1152K bss, 910812K reserved, 0K cma-reserved)

[    0.112372] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.

[    0.112372] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.

[    0.167925] libata version 3.00 loaded.

[    0.368795] ata1: SATA max UDMA/133 abar m2048@0xdf04b000 port 0xdf04b100 irq 128

[    0.368797] ata2: SATA max UDMA/133 abar m2048@0xdf04b000 port 0xdf04b180 irq 128

[    0.368799] ata3: SATA max UDMA/133 abar m2048@0xdf04b000 port 0xdf04b200 irq 128

[    0.368801] ata4: SATA max UDMA/133 abar m2048@0xdf04b000 port 0xdf04b280 irq 128

[    0.368803] ata5: SATA max UDMA/133 abar m2048@0xdf04b000 port 0xdf04b300 irq 128

[    0.368805] ata6: SATA max UDMA/133 abar m2048@0xdf04b000 port 0xdf04b380 irq 128

[    0.678960] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

[    0.678999] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

[    0.679032] ata6: SATA link down (SStatus 4 SControl 300)

[    0.679065] ata2: SATA link down (SStatus 4 SControl 300)

[    0.679097] ata5: SATA link down (SStatus 4 SControl 300)

[    0.680227] ata1.00: ATAPI: HL-DT-ST DVDRAM GH24NSD1, LW00, max UDMA/133

[    0.681736] ata1.00: configured for UDMA/133

[    0.686747] ata3.00: ATA-9: TS120GSSD220S, R0510A0, max UDMA/133

[    0.686761] ata3.00: 234441648 sectors, multi 1: LBA48 NCQ (depth 32), AA

[    0.705233] ata3.00: configured for UDMA/133

[    2.594837] ata4: COMRESET failed (errno=-32)

[    2.594849] ata4: reset failed (errno=-32), retrying in 8 secs

[   13.031545] ata4: COMRESET failed (errno=-32)

[   13.031558] ata4: reset failed (errno=-32), retrying in 8 secs

[   23.291916] ata4: COMRESET failed (errno=-32)

[   23.291928] ata4: reset failed (errno=-32), retrying in 33 secs

[   60.642426] ata4: COMRESET failed (errno=-32)

[   60.642439] ata4: reset failed, giving up

[   62.873960] ata4: COMRESET failed (errno=-32)

[   62.873972] ata4: reset failed (errno=-32), retrying in 8 secs

[   73.088958] ata4: SATA link down (SStatus 1 SControl 300)

[   74.200021] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)

[   74.201225] Write protecting the kernel read-only data: 18432k

[   74.201565] Freeing unused kernel image (text/rodata gap) memory: 2036K

[   74.201799] Freeing unused kernel image (rodata/data gap) memory: 1508K

[   75.629917] ata4: COMRESET failed (errno=-32)

[   75.629922] ata4: reset failed (errno=-32), retrying in 8 secs

[   79.084262] cfg80211: Loading compiled-in X.509 certificates for regulatory database

[   79.335596] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.

[   85.834889] ata4: COMRESET failed (errno=-32)

[   85.834892] ata4: reset failed (errno=-32), retrying in 8 secs

[   96.104940] ata4: SATA link down (SStatus 1 SControl 300)

[   98.391910] ata4: COMRESET failed (errno=-32)

[   98.391912] ata4: reset failed (errno=-32), retrying in 8 secs

[  109.127898] ata4: SATA link down (SStatus 1 SControl 300)

[  111.401917] ata4: COMRESET failed (errno=-32)

[  111.401924] ata4: reset failed (errno=-32), retrying in 8 secs

[  121.782947] ata4: COMRESET failed (errno=-32)

[  121.782955] ata4: reset failed (errno=-32), retrying in 8 secs

[  131.833965] ata4: SATA link down (SStatus 1 SControl 300)

[  134.262950] ata4: COMRESET failed (errno=-32)

[  134.262957] ata4: reset failed (errno=-32), retrying in 8 secs

[  144.202950] ata4: COMRESET failed (errno=-32)

[  144.202957] ata4: reset failed (errno=-32), retrying in 8 secs

[  154.550950] ata4: COMRESET failed (errno=-32)

[  154.550957] ata4: reset failed (errno=-32), retrying in 33 secs

[  191.791960] ata4: COMRESET failed (errno=-32)

[  191.791968] ata4: reset failed, giving up

[  194.187956] ata4: COMRESET failed (errno=-32)

[  194.187962] ata4: reset failed (errno=-32), retrying in 8 secs

[  204.160953] ata4: COMRESET failed (errno=-32)

[  204.160960] ata4: reset failed (errno=-32), retrying in 8 secs

[  214.376954] ata4: COMRESET failed (errno=-32)

[  214.376961] ata4: reset failed (errno=-32), retrying in 33 secs

[  253.285955] ata4: COMRESET failed (errno=-32)

[  253.285963] ata4: reset failed, giving up

[  255.648955] ata4: COMRESET failed (errno=-32)

[  255.648962] ata4: reset failed (errno=-32), retrying in 8 secs

[  266.085955] ata4: COMRESET failed (errno=-32)

[  266.085962] ata4: reset failed (errno=-32), retrying in 8 secs

```

can it be the power cable?

----------

## NeddySeagoon

DaggyStyle,

The power cable is unlikely.

The kernel can see the drive, or it wouldn't be sending the COMRESET command to be told

```
[   75.629917] ata4: COMRESET failed (errno=-32)

[   75.629922] ata4: reset failed (errno=-32), retrying in 8 secs 
```

It could be the electronics board on the drive, or possibly the drive failing to spin up.

In the case of the electronics board, you would need a replacement from an *identical* drive.

The drive failing to spin up can be felt. Hold the drive in your hand, then plug the power cable in.

That's safe as SATA is hot pluggable.

You will feel the spinup if it happens.

----------

## DaggyStyle

there are two additional sata devices, an ssd and a cdrom, both seem to work ok

I wonder, how can it be working with the external case and not when it is in the case?

----------

## NeddySeagoon

DaggyStyle,

 *DaggyStyle wrote:*   

> I wonder, how can it be working with the external case and not when it is in the case?

 

Ahhh ...

Once, a long time ago, I had a USB HDD that had problems. I took it out of the case and tried it in the PC.

Well it had conventional SATA connectors, so why not.

However, the USB/SATA bridge was on the drive. The SATA data connector carried the USB signals. Eww.

dmesg was good enough to see that it was USB and refused to go further. 

I wonder if your drive is like that?

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
>  *DaggyStyle wrote:*   I wonder, how can it be working with the external case and not when it is in the case? 
> 
> Ahhh ...
> ...

 

the case is an stlab sata to usb3 and the hdd is a seagate 3t one. not sure which model

----------

## NeddySeagoon

DaggyStyle,

You would need to know the chip numbers from the PCB inside the USB case.

That will tell what they are. I would expect a power supply chip and a USB to SATA bridge chip.

The drive reports itself as 

```
=== START OF INFORMATION SECTION ===

Model Family:     Seagate Barracuda 7200.14 (AF)

Device Model:     ST3000DM001-1ER166

Serial Number:    ZA500H9A

LU WWN Device Id: 5 000c50 07a7155cd

Firmware Version: CC25

User Capacity:    3,000,592,982,016 bytes [3.00 TB] 
```

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> You would need to know the chip numbers from the PCB inside the USB case.
> 
> That will tell what they are. I would expect a power supply chip and a USB to SATA bridge chip.
> ...

 

not sure I can get that but I'll try, I think the case is of local production

----------

## DaggyStyle

what can be deduced if I replace the hdd with another one from another company?

if it works ok, it is the hdd, if not?

----------

## NeddySeagoon

DaggyStyle,

If that works, you can be sure that the drives are different.

You said that it works in its USB enclosure. That makes me think the USB/SATA bridge is included on the drive.

----------

## DaggyStyle

not sure I follow, I cannot get anything regarding the case,

what I can do is try another drive:

with the same capacity, different vendor.

with a different capacity, same vendor.

with a different capacity, different vendor.

will any of this help?

----------

## NeddySeagoon

DaggyStyle,

Replacing the HDD with a different drive may indicate something about everything except the drive that was replaced.

It says nothing about that drive. It may increase confidence that the drive that was replaced was faulty.

You said that the drive works in its USB enclosure.

Is that still correct?

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> Replacing the HDD with a different drive may indicate something about everything except the drive that was replaced.
> 
> It says nothing about that drive. It may increase confidence that the drive that was replaced was faulty.
> ...

 

yup, that's correct

----------

## NeddySeagoon

DaggyStyle,

If it works at all, anywhere, the HDD has not failed.

I suspect that the data connector looks like a SATA connector, actually is a SATA connector, but is not wired as a SATA connector, so it only works in its own enclosure.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> If it works at all, anywhere, the HDD has not failed.
> 
> I suspect that the data connector looks like a SATA connector, actually is a SATA connector, but is not wired as a SATA connector, so it only works in its own enclosure.

 

this is all in the case right?

----------

## NeddySeagoon

DaggyStyle,

Yes. Normally there is a small printed circuit board in the case.

It usually has two black plastic integrated circuits on it.

They stand out as they have more than three silver legs each.

One is a power converter. It takes the 12v input from the PSU brick and generates 5v for the HDD electronics.

The other is normally a USB to SATA bridge chip.

The numbers on the tops identify them. The usual problem is the size of the writing. It's very small.

Finding only one black plastic integrated circuit would be very telling.

It would likely be the power converter. That would indicate that there is no USB to SATA bridge so the drive gets the USB signals directly.

----------

## DaggyStyle

here it is: link

it doesn't tell me anything... but that's me

----------

## NeddySeagoon

DaggyStyle,

The chip with the big VLI marking, labelled U3, is a VL711: a Low Power Super Speed USB 3.0 to SATA 6Gb/s Bridge Controller.

As it has one of those, it's a real SATA HDD in the enclosure.

The 1418 next to the power connector means that it was manufactured in 2014, week 18.

Above that is an area that looks like a power supply. I can't read the writing on U4 to see what it is, but no matter. We have established that the USB to SATA conversion does not take place on the drive, so the drive is a real SATA drive.

Knowing that, I'm at a loss to explain why it only works in its enclosure.
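
As an aside, the reading of the 1418 marking can be checked mechanically. This is only a minimal sketch, assuming the marking is a two-digit year followed by a two-digit week (YYWW), that the century is 20xx, and that weeks are numbered ISO-style. The function name `decode_date_code` is made up for illustration, and a manufacturer's own week numbering may be off by a few days from this.

```python
import datetime


def decode_date_code(code: str) -> datetime.date:
    """Return the Monday of the week named by a 4-digit YYWW code.

    Assumptions (not confirmed by the board itself): century is 20xx
    and weeks follow ISO 8601 numbering.
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected 4 digits (YYWW), got {code!r}")
    year = 2000 + int(code[:2])
    week = int(code[2:])
    # fromisocalendar(year, week, weekday): weekday 1 is Monday
    return datetime.date.fromisocalendar(year, week, 1)


print(decode_date_code("1418"))  # prints 2014-04-28
```

So under those assumptions the board dates from late April 2014.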

----------

## DaggyStyle

I've swapped the sata port and cable, what's left is the power connector and the disk itself.

I'll try to use the same disk with another power cable and see.

I think it might be a good hint as the os disk is sata and works ok.

----------

## logrusx

 *NeddySeagoon wrote:*   

> DaggyStyle,
> 
> If it works at all, anywhere, the HDD has not failed.
> 
> 

 

That's not necessarily the case. A lot of HDDs will work on and off before they fail for good.

If it works at all somewhere, that opportunity should be used as soon as possible to salvage the most important data from it.

Regards,

Georgi

----------

## DaggyStyle

 *logrusx wrote:*   

>  *NeddySeagoon wrote:*   DaggyStyle,
> 
> If it works at all, anywhere, the HDD has not failed.
> 
>  
> ...

 

based on the above, I should try another hdd from the same vendor. if it works, then the original hdd might die soon.

----------

## NeddySeagoon

logrusx,

That's true at face value. However, DaggyStyle has tried the drive in several locations, and it appears to work only in the USB enclosure.

That could just be luck of course.

----------

## DaggyStyle

 *NeddySeagoon wrote:*   

> logrusx,
> 
> That's true at face value. However, DaggyStyle has tried the drive in several locations, and it appears to work only in the USB enclosure.
> 
> That could just be luck of course.

 

I've tried on two systems: the board the disk comes from, using sata, and a different system, using the usb to sata case.

I cannot try on the other system with sata because all the ports are used.

I have other systems at home which have free sata ports, however they are all running windows 11. I don't know how to look for the same data on W11

----------

## eccerr0r

mininstall CD?  LiveCD?  SysrescueCD ? :D

----------

## logrusx

 *NeddySeagoon wrote:*   

> logrusx,
> 
> That's true at face value. However, DaggyStyle has tried the drive in several locations, and it appears to work only in the USB enclosure.
> 
> That could just be luck of course.

 

It appears to work, but if it really worked, this thread wouldn't exist, or at least the data would already be recovered. From what I see it doesn't work properly anywhere. It's unreasonable to expect that the problem is not the drive itself, as it's the most vulnerable thing with its mechanical parts.

Regards,

Georgi

----------

## DaggyStyle

 *eccerr0r wrote:*   

> mininstall CD?  LiveCD?  SysrescueCD ? 

 

yeah but that means I need to disable a computer...

that might not go well with the wife or kids...

----------

## DaggyStyle

so now the main hdd on the system displays ata errors...

I'll try booting a livecd but something tells me it is the mb...

----------

## eccerr0r

Recently I've found my hard disks generating random errors and it appears to be due to bad SATA and power cables.  Also poorly mounted hdds appear to generate errors too.

I had two 2T disks in one case and they were erroring out left and right.  I transferred and bolted properly to a different case/PSU and used different cables... same MB, same disks, errors went away...

I set up a degraded 4-disk RAID5 (3-disk) to see if I can get another set of disks I was testing to drop.  So far so good.  I was planning to introduce the 4th disk once I find another molex to sata adapter... and hope it too is good.

It will be root on lvm on cryptsetup on MDRAID5 on SATA... talk about a long dependency chain...

----------

## DaggyStyle

so the main hdd is ok.

it was the problematic hdd which caused comreset errors (errno=-32)

the system took a while to boot.

now I have the problematic hdd in the usb to sata case, everything loads ok.

on the server I have a 1tb seagate hdd which is detected without any issues.....

I have no clue what the issue is....

----------

## DaggyStyle

now the hdd I've added to the server is not working... will try to reduce power connections, maybe it is the psu or connection

----------

## steve_v

 *DaggyStyle wrote:*   

> maybe it is the psu or connection

 

That would be a logical next thing to poke at indeed... The external drive enclosure has its own power supply after all.

I've seen dodgy PSUs (and cheap splitter cables) cause all sorts of weirdness, and I wouldn't be surprised if that's the culprit - particularly as mechanical HDDs draw considerable current from the 12v line when they spin up.
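
To tell whether a cable or PSU change actually made a difference, it can help to count the ata events per port in a saved dmesg before and after. A rough sketch only: the sample lines are copied from the boot dmesg in the first post, while the function name `tally_ata_events` and the three message prefixes it looks for are my own choices, not anything official.

```python
import re
from collections import Counter

# Matches "ata5: ..." as well as "ata5.00: ..." and captures the port name
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")

# Message prefixes to count (an arbitrary selection for this sketch)
PREFIXES = ("exception", "failed command", "EH complete")

SAMPLE = """\
[    2.933889] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.933891] ata5.00: failed command: READ DMA
[    2.934050] ata5: EH complete
[    2.935279] nct6775: Found NCT6791D or compatible chip at 0x2e:0x290
[    2.950883] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[    2.950892] ata5.00: failed command: READ DMA
[    2.951029] ata5: EH complete
"""


def tally_ata_events(dmesg_text: str) -> Counter:
    """Count selected libata messages per (port, message prefix)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ATA_LINE.search(line)
        if match is None:
            continue  # not an ataN line
        port, message = match.groups()
        for prefix in PREFIXES:
            if message.startswith(prefix):
                counts[(port, prefix)] += 1
                break
    return counts


for (port, prefix), n in sorted(tally_ata_events(SAMPLE).items()):
    print(f"{port}: {prefix} x{n}")
```

Run it on the output of `dmesg` saved to a file from each boot; if the counts for the port drop to zero after moving the drive to a different power lead, that points away from the drive.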

----------

## DaggyStyle

I think it is the sata connection. I've connected another hdd, and it took less than an hour for it to disconnect.

I've opened the case to examine it and I think I saw that the cable was disconnected.

so I reconnected it and now all stable for more than a day

will give it another day to verify

----------

## DaggyStyle

update, the drive doesn't disconnect anymore but, when I mount and unmount it, I see these errors: http://dpaste.com/CHPMCDHNF

I think the drive is dying on me....

----------

