# [solved] Hard disk failing?

## s|mon

Hi.

I got two new spinning disks to store mass data and replace my old one.

As before, I have used a setup where I use LUKS encryption on each disk and then put both in a btrfs filesystem (raid 1 for data and metadata).

The initial filling etc. went smoothly, and smartctl also showed no errors for a long test on both.

Unfortunately, after my first scrub I got 5 errors reading data from one of the disks.

The next step was to run another long SMART test. Although SMART lists the errors that were raised during the scrub, the test itself found no errors and shows the disk as healthy.

I swapped power plugs and data cables between the disks, but the errors remain on one of them.

My concern is currently why smartctl does not show any errors. Any hints on how to "prove" that it is a device issue (for replacement) with no reallocated sectors, no raw read errors or anything like that in SMART?

Or is there a more specific test for the device itself? Ideally on the blocks where the bad file is (I assume that is not directly possible, as this is hidden by the encryption and I don't have the direct sectors).

Would badblocks make sense?

dmesg

```
[167775.867870] BTRFS info (device dm-3): scrub: started on devid 1
[167787.758964] BTRFS info (device dm-3): scrub: started on devid 2
[190225.096284] ata10.00: exception Emask 0x0 SAct 0x80001cbf SErr 0x0 action 0x6 frozen
[190225.096298] ata10.00: failed command: READ FPDMA QUEUED
[190225.096301] ata10.00: cmd 60/00:00:00:86:a1/05:00:5b:01:00/40 tag 0 ncq dma 655360 in
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[190225.096313] ata10.00: status: { DRDY }
[190225.096317] ata10.00: failed command: READ FPDMA QUEUED
[190225.096320] ata10.00: cmd 60/00:08:00:8b:a1/08:00:5b:01:00/40 tag 1 ncq dma 1048576 in
                         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[190225.096331] ata10.00: status: { DRDY }
...
[204725.249280] sd 9:0:0:0: [sdf] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
[204725.249287] sd 9:0:0:0: [sdf] tag#5 Sense Key : Medium Error [current]
[204725.249291] sd 9:0:0:0: [sdf] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
[204725.249297] sd 9:0:0:0: [sdf] tag#5 CDB: Read(16) 88 00 00 00 00 02 3b 61 68 00 00 00 05 00 00 00
[204725.249300] blk_update_request: I/O error, dev sdf, sector 9586173952 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 0
[204725.249358] ata10: EH complete
[204728.128580] ata10.00: exception Emask 0x0 SAct 0xbff SErr 0x0 action 0x0
[204728.128595] ata10.00: irq_stat 0x48000000
[204728.128605] ata10.00: failed command: READ FPDMA QUEUED
[204728.128611] ata10.00: cmd 60/00:48:00:6d:61/05:00:3b:02:00/40 tag 9 ncq dma 655360 in
                         res 43/40:00:00:6d:61/00:05:3b:02:00/40 Emask 0x409 (media error) <F>
[204728.128625] ata10.00: status: { DRDY SENSE ERR }
[204728.128630] ata10.00: error: { UNC }
[204728.467793] ata10.00: configured for UDMA/133
[204728.467886] sd 9:0:0:0: [sdf] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=6s
[204728.467893] sd 9:0:0:0: [sdf] tag#9 Sense Key : Medium Error [current]
[204728.467898] sd 9:0:0:0: [sdf] tag#9 Add. Sense: Unrecovered read error - auto reallocate failed
[204728.467903] sd 9:0:0:0: [sdf] tag#9 CDB: Read(16) 88 00 00 00 00 02 3b 61 6d 00 00 00 05 00 00 00
[204728.467906] blk_update_request: I/O error, dev sdf, sector 9586175232 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 0
[204728.467943] ata10: EH complete
[204738.942718] BTRFS warning (device dm-3): i/o error at logical 4908104941568 on dev /dev/mapper/bd-pool2, physical 4908104941568, root 8964, inode 128051, offset 667648, length 4096, links 1 (path: vidz/somefile.wmv)
[204738.942911] BTRFS warning (device dm-3): i/o error at logical 4908105072640 on dev /dev/mapper/bd-pool2, physical 4908105072640, root 8964, inode 128051, offset 798720, length 4096, links 1 (path: vidz/somefile.wmv)
[204738.942992] BTRFS warning (device dm-3): i/o error at logical 4908104937472 on dev /dev/mapper/bd-pool2, physical 4908104937472, root 8964, inode 128051, offset 663552, length 4096, links 1 (path: vidz/somefile.wmv)
[204738.954747] BTRFS warning (device dm-3): i/o error at logical 4908105072640 on dev /dev/mapper/bd-pool2, physical 4908105072640, root 8962, inode 128051, offset 798720, length 4096, links 1 (path: vidz/somefile.wmv)
[204738.954750] BTRFS warning (device dm-3): i/o error at logical 4908104941568 on dev /dev/mapper/bd-pool2, physical 4908104941568, root 8962, inode 128051, offset 667648, length 4096, links 1 (path: vidz/somefile.wmv)
[204738.954771] BTRFS error (device dm-3): bdev /dev/mapper/bd-pool2 errs: wr 0, rd 6, flush 0, corrupt 0, gen 0
[204738.954774] BTRFS error (device dm-3): bdev /dev/mapper/bd-pool2 errs: wr 0, rd 7, flush 0, corrupt 0, gen 0
[204738.954881] BTRFS warning (device dm-3): i/o error at logical 4908104937472 on dev /dev/mapper/bd-pool2, physical 4908104937472, root 8962, inode 128051, offset 663552, length 4096, links 1 (path: vidz/somefile.wmv)
...
```

```
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.13.7-gentoo] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba MG08ACA... Enterprise Capacity HDD
Device Model:     TOSHIBA MG08ACA16TE
Serial Number:    51C0A011FVGG
LU WWN Device Id: 5 000039 ae8cbf2b5
Firmware Version: 0102
User Capacity:    16.000.900.661.248 bytes [16,0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Thu Aug 12 08:42:07 2021 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:       (  120) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (1468) minutes.
SCT capabilities:           (0x003d)   SCT Status supported.
               SCT Error Recovery Control supported.
               SCT Feature Control supported.
               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   090   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       8113
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       19
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       167
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
 23 Helium_Condition_Lower  0x0023   100   100   075    Pre-fail  Always       -       0
 24 Helium_Condition_Upper  0x0023   100   100   075    Pre-fail  Always       -       0
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       39 (Min/Max 18/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       235012110
222 Loaded_Hours            0x0032   100   100   000    Old_age   Always       -       122
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       532
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 62 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 62 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 43 18 c8 6e 61 40  Error: UNC at LBA = 0x00616ec8 = 6385352

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 18 c8 6e 61 40 00   2d+08:55:41.070  READ FPDMA QUEUED
  61 08 f8 c0 6e 61 40 00   2d+08:55:41.070  WRITE FPDMA QUEUED
  b0 d5 01 e0 4f c2 00 00   2d+08:55:41.068  SMART READ LOG
  ec 00 00 00 00 00 a0 00   2d+08:55:41.061  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   2d+08:55:40.768  SET FEATURES [Set transfer mode]

Error 61 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 43 18 c0 6e 61 40  Error: UNC at LBA = 0x00616ec0 = 6385344

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 18 c0 6e 61 40 00   2d+08:55:38.322  READ FPDMA QUEUED
  60 08 10 b8 6e 61 40 00   2d+08:55:37.299  READ FPDMA QUEUED
  60 08 08 b0 6e 61 40 00   2d+08:55:35.811  READ FPDMA QUEUED
  61 08 38 a8 6e 61 40 00   2d+08:55:35.810  WRITE FPDMA QUEUED
  b0 d5 01 e0 4f c2 00 00   2d+08:55:35.808  SMART READ LOG

Error 60 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 43 58 a8 6e 61 40  Error: UNC at LBA = 0x00616ea8 = 6385320

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 58 a8 6e 61 40 00   2d+08:55:32.476  READ FPDMA QUEUED
  61 08 30 a0 6e 61 40 00   2d+08:55:32.475  WRITE FPDMA QUEUED
  ec 00 00 00 00 00 a0 00   2d+08:55:32.469  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   2d+08:55:32.176  SET FEATURES [Set transfer mode]
  ec 00 00 00 00 00 a0 00   2d+08:55:32.170  IDENTIFY DEVICE

Error 59 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 43 f0 a0 6e 61 40  Error: UNC at LBA = 0x00616ea0 = 6385312

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 f0 a0 6e 61 40 00   2d+08:55:29.647  READ FPDMA QUEUED
  b0 d5 01 e0 4f c2 00 00   2d+08:55:29.646  SMART READ LOG
  61 08 20 f8 6d 61 40 00   2d+08:55:28.302  WRITE FPDMA QUEUED
  60 08 70 98 6e 61 40 00   2d+08:55:28.301  READ FPDMA QUEUED
  ec 00 00 00 00 00 a0 00   2d+08:55:28.294  IDENTIFY DEVICE

Error 58 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 43 10 f8 6d 61 40  Error: UNC at LBA = 0x00616df8 = 6385144

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 d8 98 6e 61 40 00   2d+08:55:25.214  READ FPDMA QUEUED
  61 08 18 90 6e 61 40 00   2d+08:55:25.061  WRITE FPDMA QUEUED
  60 08 10 f8 6d 61 40 00   2d+08:55:25.061  READ FPDMA QUEUED
  b0 d5 01 e0 4f c2 00 00   2d+08:55:25.060  SMART READ LOG
  ec 00 00 00 00 00 a0 00   2d+08:55:25.053  IDENTIFY DEVICE

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       128         -
# 2  Extended offline    Completed without error       00%        21         -
# 3  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

Quote tags swapped to code tags to preserve formatting -- NeddySeagoon

Last edited by s|mon on Wed Aug 25, 2021 7:40 pm; edited 1 time in total

----------

## user

The usual procedure for a fresh HDD:

1) badblocks (destructive mode, because the HDD is fresh/empty)

2) smartctl long run

3) fill the HDD with random data

4) verify the HDD's S.M.A.R.T. values after the run

=> ready for production, or RMA if there are S.M.A.R.T. errors

In your case, try a non-destructive badblocks run around the LBAs in question:

1) remove the HDD from OS usage (raid/mount)

2) run non-destructive badblocks with a tight read/write focus

```
# badblocks -b 512 -c 8 -s -n -v /dev/disk/by-id/ata-<disk> <end LBA> <start LBA> 
```

The hope is that the LBA in question can be read successfully; badblocks then writes it back, and the drive stores it in the same or a different sector.

----------

## s|mon

Thanks. Any hints on how to resolve the LBAs in question to start with? I was not successful. Would the dmesg output be a starting point (physical 4908105072640, root 8962, inode 128051, offset 798720)? Can any of those values be used?

And if I can't go with a specific LBA, would it make sense to go with destructive over non-destructive (besides losing data - but I have a backup and my original old setup is still around)?

Thanks
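On turning the btrfs `physical` values into disk sectors: a rough sketch, assuming LUKS sits directly on the whole disk (no partition table) and that the LUKS payload starts 32768 sectors (16 MiB) into the device. Neither assumption is confirmed in this thread; the real payload offset is shown by `cryptsetup luksDump`, and a partition start would have to be added on top. The function name is made up for illustration.

```shell
# Map a btrfs "physical" byte offset (relative to the dm-crypt device)
# to a 512-byte sector on the underlying disk.
# $1 = physical byte offset from the BTRFS warning
# $2 = LUKS payload offset in 512-byte sectors (assumed here, check luksDump)
btrfs_phys_to_sector() {
    echo $(( $1 / 512 + $2 ))
}

btrfs_phys_to_sector 4908104941568 32768
```

With those assumptions the first warning maps to sector 9586175232, the same sector the kernel reported in the second `blk_update_request` line, so the offset guess looks plausible.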

[edit]

If I try to read one of the sectors reported in dmesg with hdparm, nothing shows up in dmesg saying that it could not be read.

```
hdparm --read-sector 9586175648  /dev/sdf
```

[edit 2]

based on smartctl output

```
Error 62 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
  ....
  40 43 18 c8 6e 61 40  Error: UNC at LBA = 0x00616ec8 = 6385352
...
Error 58 occurred at disk power-on lifetime: 163 hours (6 days + 19 hours)
 ...
  40 43 10 f8 6d 61 40  Error: UNC at LBA = 0x00616df8 = 6385144
```

```
badblocks -b 512 -c 8 -s -n -v /dev/sdf 6385352 6385144
Checking for bad blocks in non-destructive read-write mode
From block 6385144 to 6385352
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
Pass completed, 0 bad blocks found. (0/0/0 errors)
```

which confuses me a bit as it did not raise errors.

I would start a complete run - just not sure if a destructive one would be better to be more sure.

Quote tags swapped to code tags to preserve formatting -- Neddyseagoon
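A possible reason the targeted run found nothing: the SMART summary error log only has room for the low-order LBA bits, so `0x00616ec8` is probably a truncated address, and blocks 6385144 to 6385352 may not be where the read errors happened. `smartctl -l xerror` prints the extended error log, which holds 48-bit LBAs. As a sketch, assuming the failing sectors share their high bits with the sector from the kernel's `blk_update_request` line, the full LBA can be pieced together like this (function names are illustrative only):

```shell
# Combine the high bits of a kernel-reported sector with the low 24 bits
# shown in the SMART summary error log.
# $1 = sector from dmesg, $2 = LBA from "smartctl -l error"
# Assumption: both addresses lie in the same 2^24-sector window.
full_lba() {
    echo $(( ($1 & ~0xFFFFFF) | ($2 & 0xFFFFFF) ))
}

# Convert a 512-byte LBA to a 4096-byte block number (8 sectors per block).
lba_to_4k_block() {
    echo $(( $1 / 8 ))
}

full_lba 9586175232 0x00616ec8
lba_to_4k_block 9586175688
```

That gives LBA 9586175688, close to the sector already tried with `hdparm --read-sector`. badblocks reportedly only accepts 32-bit block numbers, so on a disk this size a targeted range would probably have to be given as `-b 4096` block numbers (here around 1198271961) rather than as 512-byte LBAs.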

----------

## NeddySeagoon

s|mon,

```
[204725.249280] sd 9:0:0:0: [sdf] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
[204725.249287] sd 9:0:0:0: [sdf] tag#5 Sense Key : Medium Error [current]
[204725.249291] sd 9:0:0:0: [sdf] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
[204725.249297] sd 9:0:0:0: [sdf] tag#5 CDB: Read(16) 88 00 00 00 00 02 3b 61 68 00 00 00 05 00 00 00
[204725.249300] blk_update_request: I/O error, dev sdf, sector 9586173952 op 0x0:(READ) flags 0x4000 phys_seg 160 prio class 0
[204725.249358] ata10: EH complete
```

The kernel is seeing errors.

The drive reports

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
```

which contradicts what the kernel says.

A failed reallocate event, were it real, would increment the Current_Pending_Sector count.

This points to an interface error, but again, the drive isn't seeing that either.

```
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
```

It's either the SATA controller on the motherboard, the SATA port on the motherboard or the SATA cable between the motherboard and drive.

Poor quality SATA cables do this. When they are disturbed, they work for a few months, then the error comes back.

If you have a spare SATA cable to hand, swap it.

If not, move the motherboard end to a spare SATA port.

Only test one thing at a time or the root cause will never be established.

-- edit --

Run the long test with smartctl. That's like reading the entire drive to /dev/null, but after the command is issued, no data passes over the interface.

If the long test passes, the drive is probably OK internally.

```
Extended self-test routine
recommended polling time:     (1468) minutes.
```

Hmm, that's going to take a while.
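The Read(16) CDB in the kernel excerpt quoted above can be decoded by hand to see which sectors were requested. A small sketch, assuming the standard Read(16) layout (byte 0 is the opcode, bytes 2 to 9 hold the LBA, bytes 10 to 13 the transfer length in sectors); the function name is made up:

```shell
# Decode LBA and length from the 16 hex bytes of a SCSI Read(16) CDB.
# Pass the bytes as separate arguments, exactly as dmesg prints them.
decode_read16() {
    # $3..$10 are CDB bytes 2-9 (big-endian LBA)
    lba=$(( 0x$3$4$5$6$7$8$9${10} ))
    # $11..$14 are CDB bytes 10-13 (transfer length in sectors)
    len=$(( 0x${11}${12}${13}${14} ))
    echo "lba=$lba sectors=$len"
}

decode_read16 88 00 00 00 00 02 3b 61 68 00 00 00 05 00 00 00
```

This prints `lba=9586173952 sectors=1280`, which matches the sector in the `blk_update_request` line; 1280 sectors of 512 bytes is 655360 bytes, the same transfer size that appears in the `ata10.00` lines.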

----------

## eccerr0r

16TB ! ... Not unexpected that tests will take a while and better to wait a day between test polls...?

Some hearsay: I have heard that some HDDs "silently" skip recording the first few errors on the disk before they actually start logging them, in order to reduce RMAs, but I don't know how true this is.  Also, since storage device manufacturers do not warrant data stored on the disk anyway, there's no repercussion on these initial bad sectors?

Oh well...  Backup, backup, backup.

Not that it matters much, but are you using a SATA 1.5Gb controller with this disk or did it downshift?

----------

## NeddySeagoon

eccerr0r,

Good catch. 

```
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s) 
```

 I missed that.

That already points to the interface being unhappy.

s|mon,

Tell us about the motherboard hardware.

```
lspci -nnk
```

would be good.

Is the drive attached by USB? If so,

```
lsusb
```

 will be good too.
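To compare the negotiated link speed with what the drive supports without reading the whole report, the relevant smartctl line can be picked apart. A minimal sketch; the function name is hypothetical and the pattern only covers the line format shown in this thread:

```shell
# Print "<max> <current>" link speed in Gb/s from a smartctl
# "SATA Version is:" line passed as the first argument.
sata_speeds() {
    printf '%s\n' "$1" |
        sed -n 's/.*, \([0-9.]*\) Gb\/s (current: \([0-9.]*\) Gb\/s).*/\1 \2/p'
}

sata_speeds "SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 1.5 Gb/s)"
```

For the line above this prints `6.0 1.5`. A lower current value is only suspicious if the controller port supports more; the kernel also logs the negotiated speed in its `SATA link up` messages.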

----------

## Jaglover

 *eccerr0r wrote:*   

> Some hearsay: heard that some HDDs try to "silently" not mark the first few errors on the disk before it actually starts recording them in order to reduce RMAs, but I don't know how true this is.

 

I have had two drives which developed bad sectors pending relocation, one had two and the other had four. After running badblocks in write mode the pending sectors disappeared without bad sector count changing, it remained at zero. Is it possible the bad sectors "healed"?   :Rolling Eyes: 

----------

## NeddySeagoon

Jaglover,

Yes. That happens.

A sector that cannot be read normally, and so cannot be relocated, is flagged as a Pending Sector.

A write can fix that and subsequent reads work.

More bizarrely, a successful read of the pending sector can fix it without relocating it too.

ddrescue is good at triggering these events.

----------

## s|mon

Thanks for the hints and information so far.

Yes big drives test for ages! 

Regarding the hint on cables and motherboard: I already swapped the power and SATA connectors with the second drive before the 2nd run of scrub / smartctl -t long, which raised more errors than the first run (so I'd hope/fear it is not the cables - which would be best, as they are easiest to replace). I can try completely different cables too - either swapping from a 2nd machine or using some old ones lying around.

The board has some years on it (lspci output below) - but I recently added another PCI SATA controller (for the eSATA connection of backup bays) which has internal ports - so I could try that as well for the next test runs, should the current one bring no further knowledge.

As everything was used with other (smaller) disks before, I'm not sure it is not the disk itself - but yes, the errors are not as clear as I'd like them to be.

Currently I have started a destructive badblocks run on the drive (18% done after 6h, so it will take some time - no errors so far).

Will come back with updates/questions.

Currently data should be existing 3 times (once on the second disk of the raid, once on the prior two disk-raid and once on the old offline backup). So main concern is to find the broken piece for replacement/rma to be back under smooth operation soon - thanks again.

```
00:00.0 Host bridge [0600]: Intel Corporation Core Processor DRAM Controller [8086:0040] (rev 12)
   Subsystem: Micro-Star International Co., Ltd. [MSI] Core Processor DRAM Controller [1462:7587]
00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0042] (rev 12)
   Subsystem: Micro-Star International Co., Ltd. [MSI] Core Processor Integrated Graphics Controller [1462:7587]
   Kernel driver in use: i915
00:16.0 Communication controller [0780]: Intel Corporation 5 Series/3400 Series Chipset HECI Controller [8086:3b64] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset HECI Controller [1462:7587]
00:1a.0 USB controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b3c] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [1462:7587]
   Kernel driver in use: ehci-pci
00:1b.0 Audio device [0403]: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio [8086:3b56] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset High Definition Audio [1462:7587]
   Kernel driver in use: snd_hda_intel
00:1c.0 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 [8086:3b42] (rev 06)
   Kernel driver in use: pcieport
00:1c.3 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 [8086:3b48] (rev 06)
   Kernel driver in use: pcieport
00:1c.4 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 [8086:3b4a] (rev 06)
   Kernel driver in use: pcieport
00:1c.5 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 [8086:3b4c] (rev 06)
   Kernel driver in use: pcieport
00:1c.6 PCI bridge [0604]: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 7 [8086:3b4e] (rev 06)
   Kernel driver in use: pcieport
00:1d.0 USB controller [0c03]: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [8086:3b34] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset USB2 Enhanced Host Controller [1462:7587]
   Kernel driver in use: ehci-pci
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 PCI Bridge [8086:244e] (rev a6)
00:1f.0 ISA bridge [0601]: Intel Corporation H57 Chipset LPC Interface Controller [8086:3b08] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] H57 Chipset LPC Interface Controller [1462:7587]
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [8086:3b22] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [1462:7587]
   Kernel driver in use: ahci
00:1f.3 SMBus [0c05]: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller [8086:3b30] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset SMBus Controller [1462:7587]
   Kernel driver in use: i801_smbus
02:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 01)
   Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]
   Kernel driver in use: ahci
03:00.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
   Subsystem: Micro-Star International Co., Ltd. [MSI] JMB363 SATA/IDE Controller [1462:7587]
   Kernel driver in use: ahci
03:00.1 IDE interface [0101]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
   Subsystem: Micro-Star International Co., Ltd. [MSI] JMB363 SATA/IDE Controller [1462:7587]
   Kernel driver in use: pata_jmicron
ff:00.0 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers [8086:2c61] (rev 02)
   Subsystem: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers [8086:8086]
ff:00.1 Host bridge [0600]: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:2d01] (rev 02)
   Subsystem: Intel Corporation Core Processor QuickPath Architecture System Address Decoder [8086:8086]
ff:02.0 Host bridge [0600]: Intel Corporation Core Processor QPI Link 0 [8086:2d10] (rev 02)
   Subsystem: Intel Corporation Core Processor QPI Link 0 [8086:8086]
```

quote tags -> code tags for easy reading -- NeddySeagoon

----------

## NeddySeagoon

s|mon,

Badblocks is writing then reading the entire drive. 

When it completes, post the SMART data again. I don't think the problem is the drive.

```
00:1f.2 SATA controller [0106]: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [8086:3b22] (rev 06)
   Subsystem: Micro-Star International Co., Ltd. [MSI] 5 Series/3400 Series Chipset 6 port SATA AHCI Controller [1462:7587]
   Kernel driver in use: ahci
02:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 01)
   Subsystem: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:1060]
   Kernel driver in use: ahci
03:00.0 SATA controller [0106]: JMicron Technology Corp. JMB363 SATA/IDE Controller [197b:2363] (rev 03)
   Subsystem: Micro-Star International Co., Ltd. [MSI] JMB363 SATA/IDE Controller [1462:7587]
   Kernel driver in use: ahci
```

You have a mix of SATA 2 and SATA 3 ports there, so why is the drive running at SATA 1 speeds?

That's an interface problem.

----------

## s|mon

Hi. To be honest i'm not sure why it is on Sata 1 speeds. It might be the controller:

 *Quote:*   

> 
> 
> l /sys/block/sd*
> 
> lrwxrwxrwx 1 root root 0 Aug  9 20:24 /sys/block/sda -> ../devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sda/
> ...

 

 *Quote:*   

> 
> 
> 00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06)
> 
> 03:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
> ...

 

So currently it would be the only one on the JMB SATA/IDE controller which might be limited - but if i read the datasheet it should also manage SATA II speeds.

As written before, I had it on the cable (and therefore the controller) of the 2nd drive, which is currently sdc, but the first errors were showing there too.

The next step would be to use a completely different cable and connect to either the Intel or the ASMedia controller, which should allow a higher link speed again.

(Current progress is 63%, still 0/0/0 errors)

To be honest, I did not pay too much attention to the controllers used so far (all onboard ports are currently in use).

----------

## NeddySeagoon

s|mon,

It will get slow now. Sector 0 is at the outside of the platter. As the platter spins at a constant RPM, you get more sectors per track at the outside of the platter than you do near the spindle.

It changes by over 50%. This means the head/platter data rate depends on the cylinder being accessed.

It's not changed track by track. Tracks are grouped into zones.

It's a design feature, not a bug.

"Ye cannae change the laws of physics!"

----------

## s|mon

First pattern is done - not sure if it makes sense to wait for all of them or at least the second. So far no errors (still on slow connection currently).

Nothing new in dmesg or smart output either - but not sure if that would happen anyhow.

 *Quote:*   

> Checking for bad blocks in read-write mode
> 
> From block 0 to 3906469887
> 
> Testing with pattern 0xaa: done
> ...

 

----------

## NeddySeagoon

s|mon,

Has the reallocated sector count or pending sector count changed?

If not, it all sounds good.

We don't know what caused the error and badblocks hasn't helped narrow it down. It couldn't as it uses the interface.

The SMART long test would have separated out the drive internals from the motherboard and data cable.

The original problem is still there as the drive is running at SATA1 speeds and you only have SATA2 and SATA3 ports.

We don't know why it downshifted.

----------

## s|mon

No change on smart values (besides hours and temp) - still 0 errors. I let the second pattern of badblocks finish and exchange connections tomorrow which would hopefully show normal speed.

----------

## s|mon

Badblocks finished the 2nd run, also without any errors, and nothing appeared in dmesg or the smartctl output.

I now reconnected the drive to different cable and controller. It shows now with better speeds.

 *Quote:*   

> SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
> 
> 

 

I am currently running 

```
btrfs replace
```

to re-add it to the raid, and will then restart smartctl and a scrub with some pause in between, hoping for the best (no new errors).

Or would someone recommend sth. else/in addition?

Thanks a lot for all support so far!

----------

## Jaglover

You still have a communication problem, having 3 Gbit/s instead of 1.5 does not mean it is OK now.

----------

## s|mon

It now has 3.0 Gb/s, which I assume is what my (old) controller (Intel 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06)) is limited to - same as the other drives in that system. The 6 Gb/s is what the interface on the disk itself could reach if I put it on a new (SATA 3) controller, or did I miss something?

----------

## Jaglover

It's OK then. I thought maybe it is a controller limitation, but better to be sure.

----------

## s|mon

Ok. Changing the controller seemed to help. No new errors since then, after one restore of the raid, a scrub, a full balance and a long smartctl check.

Thanks once more for the support!
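For keeping an eye on the array afterwards, the per-device counters from `btrfs device stats` can be summed so that any non-zero total stands out. A small sketch; the function name and the mount point `/mnt/pool` in the usage note are assumptions, and the sample values are illustrative only:

```shell
# Sum all error counters from "btrfs device stats" output read on stdin.
# Each input line looks like "[<device>].<counter_name>   <value>".
total_btrfs_errors() {
    awk '{ sum += $2 } END { print sum + 0 }'
}

# Sample input in the format "btrfs device stats" prints.
printf '%s\n' \
    '[/dev/mapper/bd-pool2].write_io_errs    0' \
    '[/dev/mapper/bd-pool2].read_io_errs     7' \
    '[/dev/mapper/bd-pool2].corruption_errs  0' | total_btrfs_errors
```

On a live system this would be fed with something like `btrfs device stats /mnt/pool | total_btrfs_errors`. The counters are persistent, so old errors keep showing up until they are reset with `btrfs device stats -z`.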

----------

