# Can't complete smartctl test, host reset [SOLVED]

## Tony0945

And kern log has a lot of entries like this:

```
Nov 17 19:50:14 X3 kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
Nov 17 19:50:14 X3 kernel: ata5: irq_stat 0x00400000, PHY RDY changed
Nov 17 19:50:14 X3 kernel: ata5: SError: { RecovComm Persist PHYRdyChg 10B8B }
Nov 17 19:50:14 X3 kernel: ata5: hard resetting link
Nov 17 19:50:22 X3 kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 17 19:50:22 X3 kernel: ata5.00: configured for UDMA/133
Nov 17 19:50:22 X3 kernel: ata5: EH complete
```

Any idea what this means? Drive is WD Black 5TB, one year old.

EDIT:

I did the following from TTY-1:

```
/etc/init.d/samba stop
/etc/init.d/minidlna stop
/etc/init.d/xdm stop
umount /dev/sdc2
smartctl -t long /dev/sdc
```

And it completed. However, the test is supposed to work with the drive mounted, so I still want to know what was wrong.

Last edited by Tony0945 on Thu Dec 01, 2016 11:41 pm; edited 1 time in total

----------

## christoph_peter_s

After a short Google search...

http://unix.stackexchange.com/questions/217113/what-causes-the-ata-exceptions-in-my-syslog-and-how-to-solve-them

I'd try to go along that road first.

----------

## Tony0945

Thanks for the reply, but I don't know which road you mean, since that link is about different errors. The BIOS was updated to the latest version years ago (it's an old board), and smartctl showed no errors when run with the disk unmounted. I don't use grub2, so the referenced file doesn't exist.

I don't think it's in IDE mode because:

```
# hdparm -t -T /dev/sdc

/dev/sdc:
 Timing cached reads:   4026 MB in  2.00 seconds = 2013.28 MB/sec
 Timing buffered disk reads: 576 MB in  3.01 seconds = 191.49 MB/sec
```

 Looks like it's running at a good clip.

----------

## NeddySeagoon

Tony0945,

```
PHY RDY changed
```

is probably a spin speed issue.

The drive asserts ready once it's done all its self-checks and it's up to speed.

Spin up is what takes the longest.

Move the drive so that the gravity vector is in a different plane to the one you have been running it in.

Check the 12 V supply to the drive.

----------

## christoph_peter_s

 *Tony0945 wrote:*   

> Thanks for the reply but I don't know which road you mean since that link is about different errors. 

 

Well, what I mean is to first double-check that you really have no H/W issue. If I were you, I'd try a known good disk, as I wouldn't trust the disk until it is proven to be absolutely OK. After that, I would start double-checking every piece of S/W related to the disk: BIOS of the board, firmware of the disk, a different kernel, and so on. The simplest things first, the more tedious things later.

Btw, I had trouble with my file server for more than a year. In the end it turned out to be the cable between the RAID controller and one HD. Neddy's remark looks pretty plausible, though the spin-up time is one of the SMART values, which you most likely double-checked already. Maybe there is a jumper on the HD for delayed spin-up? It might be on the edge of being OK. Or maybe the power supply is starting to fail. A subtle problem can be difficult to track down. An old principle of debugging is to check each individual component in a known good environment, but typically you can't do this for a private machine, as you would need a properly functioning identical machine to swap devices with. Anyway, double-check the SMART data, try a different SATA cable (and maybe a different mainboard port).

----------

## Tony0945

Swapping ports & cables is easy. I'll try that first. BTW, this isn't a disk drive, it's a hard drive, so I can't put a different disk in it.

I can also try cutting back on the unmounting steps that worked, i.e. shut down the applications that access the disk but don't unmount. If that's OK, start leaving services running until it fails again, to locate the service that's resetting it (if any). The PS is not that old and it's running on a UPS, but, yes, they might be problems.

----------

## Tony0945

Swapped port and that didn't help. In fact, it may have caused a problem on the drive because it switched from sdc to sdb and fstab tried to load it as JFS instead of ext4. fsck fixed that.
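The sdc-to-sdb rename bites when fstab refers to drives by device node. Referring to the filesystem by UUID instead keeps the entry tied to the same partition whichever port it is plugged into. A minimal sketch, assuming an ext4 partition; `fstab_entry_by_uuid` is a hypothetical helper and the `defaults 0 2` fields are placeholder choices, not the settings used on this box:

```shell
# Print an fstab line that identifies a filesystem by UUID rather than /dev/sdXN.
# Arguments: UUID, mount point, filesystem type.
# The options "defaults" and the dump/pass fields "0 2" are assumptions.
fstab_entry_by_uuid() {
    printf 'UUID=%s\t%s\t%s\tdefaults\t0 2\n' "$1" "$2" "$3"
}
```

Something like `fstab_entry_by_uuid "$(blkid -s UUID -o value /dev/sdc2)" /mnt/data ext4` prints a line that can be reviewed and pasted into /etc/fstab by hand; `/mnt/data` is a made-up mount point.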

Replaced the SATA cable. Problem seems solved. No more entries in kern.log and hdparm shows the drive running at full speed.
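One way to confirm that the resets have stopped is to count the `hard resetting link` lines per port before and after the cable swap. A minimal sketch, assuming the log uses the line format quoted at the top of the thread; `count_link_resets` is a hypothetical helper name:

```shell
# Count "hard resetting link" events per ATA port in a kernel log read from stdin.
# Prints one line per port in the form "<port> <count>", sorted by port name.
count_link_resets() {
    awk '/kernel: ata[0-9]+: hard resetting link/ {
             port = $0
             sub(/.*kernel: /, "", port)   # drop timestamp and hostname
             sub(/:.*/, "", port)          # keep only "ataN"
             count[port]++
         }
         END { for (p in count) print p, count[p] }' | sort
}
```

Feed it the log, for example `count_link_resets < /var/log/kern.log` (the path is an assumption; the thread only says kern.log). If the count for the affected port stops growing after the swap, the link is no longer being reset.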

Ordered some black SATA cables from Cable Matters. I have a few older orange cables that I like to use for ata1 so I know which will be the boot drive, but I'd like to color code all the cables. Right now, the box that had the problem has three red cables.

----------

## NeddySeagoon

Tony0945,

If you run smartctl with the new cable, it should show that there have been no internal drive errors.

It may (or may not) show interface errors.
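On many drives the interface errors show up in SMART attribute 199, `UDMA_CRC_Error_Count`, which counts CRC errors on the SATA link rather than faults inside the drive. A minimal sketch, assuming this drive reports that attribute and that `smartctl -A` prints its usual ten-column attribute table with the raw value last; `crc_error_count` is a hypothetical helper name:

```shell
# Extract the raw value of SMART attribute 199 (UDMA_CRC_Error_Count)
# from "smartctl -A" output read on stdin. Prints nothing if the drive
# does not report the attribute.
# Assumes the raw value is the tenth whitespace-separated column.
crc_error_count() {
    awk '$1 == 199 && $2 == "UDMA_CRC_Error_Count" { print $10 }'
}
```

Run as root with the drive's current device node, for example `smartctl -A /dev/sdc | crc_error_count`. The raw value is normally a lifetime total, so it is unlikely to drop back to zero with the new cable; what matters is whether it keeps climbing.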

----------

## Tony0945

 *NeddySeagoon wrote:*   

> If you run smartctl with the new cable, it should show that there have been no internal drive errors.
> 
> It may (or may not) show interface errors.

 

I ran the short test with no errors. I'll run the long test tonight as it takes about 9 hours on this 5TB drive.

----------

