# Hard disk resets (harmlessly) when accessing

## anjunatux

Hi there.  My first time posting on these forums, so my bad if something doesn't jibe about my post.

I'm looking into an odd issue I'm having where my hard disk will "reset" when it performs a read operation.  No data has been lost (yet) and I can continue using the disk normally, the only change being longer read times due to the resetting.

Here's the dmesg action:

```
[ 5155.300388] ata1.00: exception Emask 0x10 SAct 0xff SErr 0x400000 action 0x6 frozen
[ 5155.300391] ata1.00: irq_stat 0x08000000, interface fatal error
[ 5155.300397] ata1.00: cmd 61/40:00:e8:75:b5/00:00:06:00:00/40 tag 0 ncq 32768 out
[ 5155.300398]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300403] ata1.00: cmd 61/40:08:38:68:b1/00:00:06:00:00/40 tag 1 ncq 32768 out
[ 5155.300404]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300409] ata1.00: cmd 61/40:10:18:7e:b1/00:00:06:00:00/40 tag 2 ncq 32768 out
[ 5155.300410]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300415] ata1.00: cmd 61/40:18:f0:90:b1/00:00:06:00:00/40 tag 3 ncq 32768 out
[ 5155.300416]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300421] ata1.00: cmd 61/80:20:00:b5:b1/00:00:06:00:00/40 tag 4 ncq 65536 out
[ 5155.300422]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300427] ata1.00: cmd 61/c0:28:28:76:b5/00:00:06:00:00/40 tag 5 ncq 98304 out
[ 5155.300428]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300432] ata1.00: cmd 61/40:30:90:91:b1/00:00:06:00:00/40 tag 6 ncq 32768 out
[ 5155.300433]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300439] ata1.00: cmd 60/08:38:00:96:95/00:00:04:00:00/40 tag 7 ncq 4096 in
[ 5155.300440]          res 40/00:20:00:b5:b1/00:00:06:00:00/40 Emask 0x10 (ATA bus error)
[ 5155.300445] ata1: hard resetting link
[ 5155.760110] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 5155.761579] ata1.00: configured for UDMA/33
[ 5155.772126] ata1: EH complete
```
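
For anyone reading these by hand: the `SErr` value on the exception line is a bit mask, and each bit names a link-level condition. Below is a minimal sketch of a decoder. The bit table is an assumption on my part, written from memory of the SError definitions in the kernel's `include/linux/ata.h`, so check it against your own kernel source. The function name is made up for illustration.

```python
# SError bit positions and meanings. ASSUMPTION: this table follows the
# SERR_* constants in the Linux kernel's include/linux/ata.h as recalled;
# verify against your kernel source before relying on it.
SERR_BITS = {
    0: "recovered data error",
    1: "recovered comm failure",
    8: "unrecovered data error",
    9: "persistent data/comm error",
    10: "protocol violation",
    11: "host internal error",
    16: "PHY RDY changed",
    17: "PHY internal error",
    18: "comm wake",
    19: "10b to 8b decode error",
    20: "disparity",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS",
    26: "device exchanged",
}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


# SErr value taken from the exception line in the log above.
print(decode_serr(0x400000))
```

With that table, `0x400000` has only bit 22 set, a handshake error. That is reported at the link layer, so it says something about the path between controller and drive rather than about the data on the disk.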

The number of repeated cmd/res pairs varies, but the rest is the same in every instance.  My lspci output:

```
00:00.0 Host bridge: Advanced Micro Devices [AMD] Family 12h Processor Root Complex
00:02.0 PCI bridge: Advanced Micro Devices [AMD] Family 12h Processor Root Port
00:04.0 PCI bridge: Advanced Micro Devices [AMD] Family 12h Processor Root Port
00:11.0 SATA controller: Advanced Micro Devices [AMD] Hudson SATA Controller [AHCI mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices [AMD] Hudson USB OHCI Controller (rev 11)
00:12.2 USB controller: Advanced Micro Devices [AMD] Hudson USB EHCI Controller (rev 11)
00:13.0 USB controller: Advanced Micro Devices [AMD] Hudson USB OHCI Controller (rev 11)
00:13.2 USB controller: Advanced Micro Devices [AMD] Hudson USB EHCI Controller (rev 11)
00:14.0 SMBus: Advanced Micro Devices [AMD] Hudson SMBus Controller (rev 13)
00:14.1 IDE interface: Advanced Micro Devices [AMD] Hudson IDE Controller
00:14.2 Audio device: Advanced Micro Devices [AMD] Hudson Azalia Controller (rev 01)
00:14.3 ISA bridge: Advanced Micro Devices [AMD] Hudson LPC Bridge (rev 11)
00:14.4 PCI bridge: Advanced Micro Devices [AMD] Hudson PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices [AMD] Hudson USB OHCI Controller (rev 11)
00:15.0 PCI bridge: Advanced Micro Devices [AMD] Device 43a0
00:16.0 USB controller: Advanced Micro Devices [AMD] Hudson USB OHCI Controller (rev 11)
00:16.2 USB controller: Advanced Micro Devices [AMD] Hudson USB EHCI Controller (rev 11)
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 0 (rev 43)
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 6
00:18.6 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 5
00:18.7 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h Processor Function 7
01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Turks [Radeon HD 6570]
01:00.1 Audio device: Advanced Micro Devices [AMD] nee ATI Turks HDMI Audio [Radeon HD 6000 Series]
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
04:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
```

And the uname just in case:

```
Linux gentoo 3.3.0-gentoo #2 SMP Tue Mar 27 15:53:00 PDT 2012 x86_64 AMD Athlon(tm) II X4 631 Quad-Core Processor AuthenticAMD GNU/Linux
```

The hard disk is a SATA Seagate SV35.5 ST3500410SV.  I only have one Linux partition, two partitions for Windows and a swap partition.  The Linux partition is btrfs.

My first guess is that this isn't a hardware issue, because I don't seem to be having problems with the Windows partition, but if I have to migrate disks I think I can (before it's too late!)

Thanks in advance for any help!

----------

## DirtyHairy

Some suggestions:

- Failing hard drive. Check the SMART log for anything suspicious using smartctl.

- Power. If you have another power supply handy, check whether yours might be flaky by swapping them.

- Cables. Check whether the SATA connection is properly plugged in and try switching the cable.

- Faulty SATA interface. If you have one handy, try another interface card (provided this is not a laptop).

- Drivers. I get similar resets on my old laptop due to glitches in the radeon driver. Try removing modules and check whether it alleviates the problem.
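
For the SMART check in the first bullet, here is a minimal sketch that pulls the raw values out of the attribute table printed by `smartctl -A`. It assumes smartmontools is installed, that the disk is `/dev/sda` (adjust to your system), and that the table has the usual ten-column layout with the raw value last. The function names and the choice of watched attributes are mine, not part of smartctl.

```python
import subprocess

# Attributes often watched on a suspect disk. ASSUMPTION: this selection
# is illustrative, not an official list; names are as smartctl prints them.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` table output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() test.
        if len(parts) >= 10 and parts[0].isdigit():
            attrs[parts[1]] = " ".join(parts[9:])
    return attrs


def read_smart_attributes(device="/dev/sda"):
    """Run smartctl (needs root) on the given device and parse its output."""
    # smartctl's exit status is a bit mask of conditions, so a non-zero
    # status is not treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)
```

Run as root, `read_smart_attributes("/dev/sda")` returns a dict you can compare against `WATCHED`. A rising `UDMA_CRC_Error_Count` is usually read as a sign of cable or connector trouble, while growing reallocated or pending sector counts point more towards the disk itself.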

Did you make any changes to the system before the issue first appeared?

----------

## kite14

I have a similar problem with my Seagate ST3500320AS, occurring only during boot:

```
[    8.866381] reiserfs: enabling write barrier flush mode
[    8.867234] ata1.00: exception Emask 0x10 SAct 0x2 SErr 0x400101 action 0x6 frozen
[    8.867238] ata1.00: irq_stat 0x08000000, interface fatal error
[    8.867247] ata1.00: cmd 61/08:08:88:2d:45/00:00:00:00:00/40 tag 1 ncq 4096 out
[    8.867248]          res 40/00:08:88:2d:45/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[    8.867254] ata1: hard resetting link
[   14.203353] ata1: link is slow to respond, please be patient (ready=0)
[   18.896687] ata1: COMRESET failed (errno=-16)
[   18.896692] ata1: hard resetting link
[   19.216692] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   19.219912] ata1.00: configured for UDMA/133
[   19.219925] ata1: EH complete
```

The boot process freezes for 20-30 seconds, then resumes with no further errors.

So far I have ruled out a broken cable or interface. The S.M.A.R.T. tests are OK (GNOME Disk Utility says "Healthy disk"), but the number of disk sectors reallocated due to read/write errors keeps growing: there were ~70 bad sectors when I started monitoring the disk, and now there are 138.

The disk is probably dying...

----------

## anjunatux

 *DirtyHairy wrote:*   

> Some suggestions:
> 
> - Failing hard drive. Check the SMART log for anything suspicious using smartctl.
> 
> - Power. If you have another power supply handy, check whether yours might be flaky by swapping them.
> ...

 

Thanks for the assistance, I've now looked into these.  SMART does not report anything negative.  I ran an extended test with smartctl and it passed without error.  I have tried switching the cable and plugging into a different port on the motherboard.

The only strange thing, which I should've reported earlier, is that this error crops up randomly in certain sessions; two days could go by without the problem, then the next day it happens.  So I will test these things when it recurs, like rmmod'ing my radeon module.
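
Since the resets come and go between sessions, it can help to pull just the reset events out of a saved `dmesg` dump and see when they cluster. Here is a minimal sketch, assuming the default bracketed seconds-since-boot timestamps and the exact `hard resetting link` wording seen in the logs in this thread. The function name is made up for illustration.

```python
import re

# Matches lines such as "[ 5155.300445] ata1: hard resetting link".
RESET_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+): hard resetting link")


def find_link_resets(dmesg_text):
    """Return a list of (seconds_since_boot, port) for each link reset line."""
    resets = []
    for line in dmesg_text.splitlines():
        match = RESET_RE.match(line.strip())
        if match:
            resets.append((float(match.group(1)), match.group(2)))
    return resets
```

Feeding it the text of `dmesg` (for example from a file saved after each session) gives one entry per reset, which makes it easier to tell whether a change such as unloading a module actually made a difference.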

The only system change I can think of that would be problematic is upgrading from kernel series 3.2 to 3.3, since the Linux partition uses the WIP that is btrfs.

----------

## roarinelk

Get new cables or try to re-route them inside the case.  The logs say there's something wrong on the way between disk and board; it could be excessive EMI.

----------

## anjunatux

I am not sure what changed, but the problem is no longer occurring.  I implemented roarinelk's solution, which may have been the ticket!  Either way, I haven't been getting these errors anymore.

Thanks to everyone who gave feedback.  This can be marked as closed/solved.

----------

## anjunatux

JUST KIDDING: still having the problem.  I even hear the disk doing something as it resets.  Seems to freeze the execution of other things too.  dmesg is a bit different this time - notice the 3-second pause:

```
[   46.341265] ata2.00: exception Emask 0x10 SAct 0x4 SErr 0x10200 action 0xe frozen
[   46.341268] ata2.00: irq_stat 0x00400000, PHY RDY changed
[   46.341274] ata2.00: cmd 60/88:10:d0:81:fa/03:00:05:00:00/40 tag 2 ncq 462848 in
[   46.341275]          res 40/00:10:d0:81:fa/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
[   46.341281] ata2: hard resetting link
[   49.565124] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   49.567034] ata2.00: configured for UDMA/133
[   49.578118] ata2: EH complete
```

Any other suggestions?

----------

## roarinelk

Well, it could be either the drive or, worst case, the controller on the board.

Test the drive on another mainboard and the current board with another (known working) disk.

----------

## NeddySeagoon

anjunatux,

Your more recent

```
[   49.565124] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   49.567034] ata2.00: configured for UDMA/133
[   49.578118] ata2: EH complete
```

looks better than your original

```
[ 5155.760110] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 5155.761579] ata1.00: configured for UDMA/33
[ 5155.772126] ata1: EH complete
```

Now the interface resets to SATA2 speeds, which is correct for your drive.  In the original, notice the UDMA/33, which is wrong even for SATA1. Both should be UDMA/133.
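
The two numbers in parentheses on the "SATA link up" line can be split into fields to see the same thing. Below is a minimal sketch, assuming the standard SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that dmesg prints both values in hex without a `0x` prefix. The function name is made up for illustration.

```python
def split_sata_register(hex_text):
    """Split an SStatus or SControl value, as printed in dmesg, into fields.

    ASSUMPTION: the value is hexadecimal and uses the usual layout of
    DET (bits 0-3), SPD (bits 4-7) and IPM (bits 8-11).
    """
    value = int(hex_text, 16)
    return {
        "det": value & 0xF,         # device detection
        "spd": (value >> 4) & 0xF,  # speed: negotiated (SStatus) or allowed limit (SControl)
        "ipm": (value >> 8) & 0xF,  # interface power management
    }


# Values taken from the two log excerpts quoted above.
for label, text in (("SStatus", "113"), ("SControl", "310"),
                    ("SStatus", "123"), ("SControl", "300")):
    print(label, text, split_sata_register(text))
```

For SStatus, SPD 1 is 1.5 Gbps and SPD 2 is 3.0 Gbps, which matches the speeds printed in the log. For SControl, SPD 1 in `310` means the host had capped the link at 1.5 Gbps, while SPD 0 in `300` means no cap. My reading, which the log itself does not confirm, is that the kernel's error handling had stepped the link down after earlier errors.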

The evidence suggests an interface issue but a HDD problem cannot be ruled out.

Replace the SATA data cable with a good quality new one.

----------

## anjunatux

OK, I will see what I can do about getting a new cable.  Thanks for the input Neddy.

Was also wondering if it'd be worth (the potential risk of) a BIOS firmware flash?  This is my board: http://www.gigabyte.com/products/product-page.aspx?pid=3988#bios

----------

## NeddySeagoon

anjunatux,

The BIOS is not involved in disk data transfers after your bootloader has exited, so it's not worth a BIOS update for this issue.

----------

## anjunatux

Looks like this was a power supply issue.  Today it got to the point where the drive was cycling repeatedly before loading the bootloader.  Swapped the PSU out and I think I'm golden.

----------

