# Could hard disk damage slow the system?

## lordalbert

Hi.

A few days ago I was using Gentoo on my old desktop, and that day it was slow to start up and to run all the programs that launch when KDE opens (kmix, the battery and network monitors, the clipboard, etc.).

I used it for a few hours, but it stayed slow, so I saved my work to a flash drive and rebooted the system.

At boot it checked the disk because it had found some errors. After 40 minutes of checking, the system booted without KDM. So I rebooted again; this time KDM and KDE started, but opening the KDE menu took 15 minutes. Just click on K and wait for the menu to open... 15 minutes!!

So I shut down the system and booted Clonezilla Live. I made a complete backup and ran smartctl to check the health of the disk. It returned a "passed" message.

Could it be a hard disk problem, even though smartctl returns "passed"?

Could it be another problem? What?

I'm holding off on buying another HD. I was thinking of erasing the disk with `dd if=/dev/zero`, then re-creating the filesystem (ext4) and restoring the copy of the system I made with Clonezilla.

Any idea?

Thank you!

----------

## NeddySeagoon

lordalbert,

Yes, HDD damage will cause slowdowns, but not of the magnitude you report.

There are two mechanisms. The first is that the disk needs to retry reads. When this happens, the drive will move the data to a spare physical sector, but the retries take extra revolutions of the drive platter.

In extreme cases, the drive may recalibrate the head, which involves moving the head several tracks away from the data.

The system may also reset the SATA interface. That will generate entries in dmesg.

The second is that relocated sectors cause a slowdown, as they are rarely on the same track as the unreadable sector, so extra latency is incurred reading them.
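To put the "extra revolutions" into numbers, here is a back-of-the-envelope sketch. It assumes a 7200 rpm drive (the thread never states the spindle speed) and that each retry costs at least one more full revolution; both are assumptions for illustration, and real firmware retry behaviour varies by vendor.

```python
# Back-of-the-envelope cost of read retries on a spinning disk.
# ASSUMPTIONS: a 7200 rpm spindle, and each retry waits for at least
# one more full platter revolution.

def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def retry_cost_ms(rpm: float, retries: int) -> float:
    """Lower bound on the extra time spent on `retries` read retries."""
    return retries * revolution_ms(rpm)


if __name__ == "__main__":
    rpm = 7200  # assumed spindle speed
    print(f"one revolution: {revolution_ms(rpm):.2f} ms")
    for retries in (1, 10, 100):
        print(f"{retries:>3} retries: at least {retry_cost_ms(rpm, retries):.1f} ms")
```

One revolution is about 8.33 ms, so even 100 retries add well under a second, which fits the point that retries alone do not explain a 15-minute wait.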

If you suspect drive problems, install smartmontools and read the drive's internal error log.

----------

## vaxbrat

Hard drive problems will definitely slow down system performance. When the drive encounters errors trying to read a sector, it will retry the read repeatedly until it either gets a good read or declares an error. If you see the hard drive light flashing like a heartbeat, this is probably what's happening.

Note that the errors can also be caused by a bad or loose cable connection between the controller and the drive.

----------

## lordalbert

OK, so I'll install smartmontools and read the error messages, so I can tell whether there's a problem.

----------

## energyman76b

IDE: every error means at least 30 seconds of lost time.

SATA: every error means a couple of seconds of lost time.

This is also one reason why you should use an IDE-to-USB adapter to read out damaged IDE disks: the error timeouts are much shorter.

So if you have a bunch of damaged sectors and the system keeps running into them, you can easily lose HOURS waiting for timeouts and error handling. Not so much with SATA, where it is in the range of minutes, but if you still have an IDE disk around... it is hell.
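The per-error figures above can be turned into a rough estimate. This sketch uses the post's "at least 30 seconds" per error for IDE and takes "a couple of seconds" as 2 s for SATA; the 2 s value and the count of 500 failed reads are assumptions for illustration only.

```python
# Rough time lost to error timeouts. Per-error costs follow the post:
# at least 30 s on IDE; "a couple of seconds" on SATA, ASSUMED here to be 2 s.
TIMEOUT_S = {"IDE": 30.0, "SATA": 2.0}


def lost_minutes(interface: str, errors: int) -> float:
    """Minutes spent waiting on timeouts for `errors` failed reads."""
    return errors * TIMEOUT_S[interface] / 60.0


if __name__ == "__main__":
    errors = 500  # hypothetical number of failed reads in one session
    for iface in ("IDE", "SATA"):
        print(f"{iface}: {lost_minutes(iface, errors):.0f} min")
```

With 500 errors that comes to 250 minutes (over four hours) on IDE versus about 17 minutes on SATA, the same "HOURS" versus "minutes" contrast described above.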

----------

## lordalbert

This is an error message (shown in GSmartControl, a GUI for smartmontools):

 *Quote:*   

> Complete error log:
> 
> SMART Error Log Version: 1
> ...

 

I can't make sense of the message, and I don't understand what type of error it is. Perhaps the lifetime value is too high?

Here is another screenshot. I think it's time to change my hard disk... :Sad:

http://s30.postimg.org/xyr98gpe9/Screenshot.png

----------

## NeddySeagoon

lordalbert,

In your screenshot, a value has failed when it is less than or equal to its threshold.

By that measure, your drive is OK.
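The rule can be written down directly. This minimal sketch applies it to the normalized VALUE, WORST and THRESH numbers from the `smartctl -a` rows quoted in this thread. It assumes, as smartmontools does, that a threshold of 0 means the attribute never fails; the labels mirror smartctl's WHEN_FAILED column (`FAILING_NOW`, `In_the_past`, `-`).

```python
from typing import NamedTuple


class Attr(NamedTuple):
    name: str
    value: int   # current normalized value (VALUE column)
    worst: int   # lowest normalized value recorded (WORST column)
    thresh: int  # vendor failure threshold (THRESH column)


def failing_now(a: Attr) -> bool:
    # A threshold of 0 is treated as "never fails" (assumed smartmontools convention).
    return a.thresh > 0 and a.value <= a.thresh


def failed_in_past(a: Attr) -> bool:
    return a.thresh > 0 and a.worst <= a.thresh


# Normalized values from the `smartctl -a /dev/sda` rows quoted in this thread.
ATTRS = [
    Attr("Reallocated_Sector_Ct", 100, 100, 36),
    Attr("Spin_Retry_Count", 100, 100, 97),
    Attr("UDMA_CRC_Error_Count", 200, 181, 0),
]

if __name__ == "__main__":
    for a in ATTRS:
        if failing_now(a):
            state = "FAILING_NOW"
        elif failed_in_past(a):
            state = "In_the_past"
        else:
            state = "-"
        print(f"{a.name:<24} {state}")
```

All three print `-`: their raw counts are non-zero, but no normalized value has reached its threshold, which is why the overall verdict is "passed".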

The reallocated sector count being non-zero is not a cause for concern.

The pending sector count is zero, which is a good sign.  The drive does not know about any unreadable sectors.

Change your SATA data cable and try it for a few days.

----------

## lordalbert

If those values are in the normal range... why does it show a "pre-failure" type message?   :Shocked:

----------

## energyman76b

They don't.

That column says 'if this goes out of hand, it is a sign that your hard drive will fail soon'.

Also, please post the output of `smartctl -a /dev/sdX`, not some GUI screenshot.
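"Pre-fail" describes the attribute, not the state of the drive. In smartctl's output the TYPE column comes from bit 0 of the FLAG value and the UPDATED column from bit 1. The sketch below decodes FLAG values that appear in this thread; it illustrates the bit layout and is not smartmontools code.

```python
# smartctl derives the TYPE and UPDATED columns from bits in the FLAG column.
PREFAILURE_BIT = 0x01  # set: TYPE is "Pre-fail", clear: "Old_age"
ONLINE_BIT = 0x02      # set: UPDATED is "Always", clear: "Offline"


def attr_type(flag: int) -> str:
    return "Pre-fail" if flag & PREFAILURE_BIT else "Old_age"


def attr_updated(flag: int) -> str:
    return "Always" if flag & ONLINE_BIT else "Offline"


# FLAG values taken from the smartctl output posted in this thread.
FLAGS = {
    "Reallocated_Sector_Ct": 0x0033,
    "Spin_Retry_Count": 0x0013,
    "UDMA_CRC_Error_Count": 0x003E,
    "Throughput_Performance": 0x0005,
}

if __name__ == "__main__":
    for name, flag in FLAGS.items():
        print(f"{name:<24} 0x{flag:04x} {attr_type(flag):<8} {attr_updated(flag)}")
```

The decoded columns match what smartctl prints for those rows, so seeing "Pre-fail" only tells you which kind of attribute it is.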

----------

## lordalbert

 *smartctl -a /dev/sda wrote:*   

> smartctl 6.2 2013-07-26
> 
> === START OF INFORMATION SECTION ===
> ...

 

----------

## energyman76b

    5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1

OK, one sector was reallocated. Something you need to keep an eye on, but nothing to lose sleep over.

    10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 3

this is more worrisome. A lot more worrisome.

    199 UDMA_CRC_Error_Count 0x003e 200 181 000 Old_age Always - 2683

oh fuck. Well. Not really. This is either cabling or a bad, noisy power supply.

But if you still get these after replacing the cable and routing it away from the power switching circuitry on your mobo, you've got a problem.
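Reading these rows by eye works, but a small parser makes the split between normalized and raw values explicit. A minimal sketch, assuming each row has the ten-column layout shown in this thread and that RAW_VALUE starts with a decimal number (true for the rows here, not for every attribute on every drive):

```python
# Minimal parser for one attribute row of `smartctl -a` output:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
def parse_row(line: str) -> dict:
    f = line.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "flag": int(f[2], 16),
        "value": int(f[3]),
        "worst": int(f[4]),
        "thresh": int(f[5]),
        "type": f[6],
        "updated": f[7],
        "when_failed": f[8],
        # RAW_VALUE may carry extra text, e.g. "28 (Min/Max 18/42)"
        "raw": " ".join(f[9:]),
        "raw_int": int(f[9]),  # leading number of RAW_VALUE
    }


# Rows quoted in the reply above.
ROWS = [
    "5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1",
    "10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 3",
    "199 UDMA_CRC_Error_Count 0x003e 200 181 000 Old_age Always - 2683",
]

if __name__ == "__main__":
    for row in ROWS:
        r = parse_row(row)
        print(f"{r['name']:<24} normalized {r['value']:>3} (thresh {r['thresh']:>3})  raw {r['raw_int']}")
```

All three normalized values sit well above their thresholds; the information the reply above points to is in the raw counts (1, 3 and 2683).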

This is what a healthy drive looks like:

    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       103
      3 Spin_Up_Time            0x0007   138   138   024    Pre-fail  Always       -       398 (Average 396)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1141
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
      9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       5298
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1140
    192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1244
    193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1244
    194 Temperature_Celsius     0x0002   214   214   000    Old_age   Always       -       28 (Min/Max 18/42)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error       00%      5256         -
    # 2  Short offline       Completed without error       00%      5202         -
    # 3  Short offline       Completed without error       00%      5147         -
    # 4  Short offline       Completed without error       00%      5102         -
    # 5  Short offline       Completed without error       00%      5050         -
    # 6  Extended offline    Completed without error       00%      5015         -
    # 7  Short offline       Completed without error       00%      5004         -
    # 8  Short offline       Completed without error       00%      4927         -
    # 9  Extended offline    Completed without error       00%      4862         -
    #10  Short offline       Completed without error       00%      4844         -
    #11  Short offline       Completed without error       00%      4749         -
    #12  Short offline       Completed without error       00%      4683         -
    #13  Short offline       Completed without error       00%      4635         -
    #14  Short offline       Completed without error       00%      4575         -
    #15  Extended offline    Completed without error       00%      4562         -
    #16  Short offline       Completed without error       00%      4492         -
    #17  Short offline       Completed without error       00%      4418         -
    #18  Short offline       Completed without error       00%      4362         -
    #19  Short offline       Completed without error       00%      4298         -
    #20  Short offline       Completed without error       00%      4228         -
    #21  Short offline       Completed without error       00%      4158         -

You really should run smartd in the background and let it run self-tests on a regular basis.

----------

## lordalbert

After launching smartd (with this smartd.conf line: `/dev/sda -H -l error -l selftest -f -m root -M exec /usr/share/smartmontools/smartd-runner`), I only get this log:

 *$ cat /var/log/syslog | grep smartd wrote:*   

> Sep 24 10:51:56 swing smartd[2858]: smartd 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-36-generic] (local build)
> 
> Sep 24 10:51:56 swing smartd[2858]: Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
> ...

 

and

 *cat /var/lib/smartmontools/smartd.ST3500320AS-9QM1STG7.ata.state wrote:*   

> ata-error-count = 1
> 
> ata-smart-attribute.0.id = 1
> ...

 

After some hours, it gives no more messages in /var/log/syslog.

PS: After the slowness problems, I wiped the disk with `dd if=/dev/zero of=/dev/sda` and then reinstalled the operating system (I had a problem restoring my Gentoo installation, so for now I'm using Ubuntu to get a system installed quickly), and NOW it seems to work without any problems.

Is it better to buy a new hard disk, or can I wait and hope that this disk keeps working for a while longer?

----------

## energyman76b

smartd does things like scheduling self-tests and reporting to you if those tests turn up a problem.

Google is your friend.

One reallocated sector is usually not a problem.

But spin retries are worrisome.

----------

## lordalbert

I tried changing the SATA cable and the power cable, but the spin retry and UDMA_CRC_Error_Count values don't change. They are values that accumulated in the past, right? So they can't go down... but I should watch whether they increase?

A few months ago I changed my power supply because the old one was making some noise. Now I have a Corsair CX430; it seems to be a good power supply. I don't think it was the problem.

So, is it better to change the hard disk? (Right now it seems to work correctly...)

----------

## energyman76b


They will never go down, correct. And yes, you should watch for changes. If the spin retry and UDMA_CRC_Error_Count values stay stable, you should be fine.
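Watching for increases amounts to comparing two snapshots of the raw counters. A minimal sketch, assuming you copy the raw values out of `smartctl -a` yourself; the starting numbers are the ones posted in this thread, and the later reading is hypothetical.

```python
from typing import Dict, Iterable

# Raw counters worth watching; they only ever grow, so any increase is news.
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
         "Spin_Retry_Count", "UDMA_CRC_Error_Count")


def increases(before: Dict[str, int], after: Dict[str, int],
              watch: Iterable[str] = WATCH) -> Dict[str, int]:
    """Return {attribute: delta} for watched raw counters that went up."""
    return {
        name: after[name] - before[name]
        for name in watch
        if name in before and name in after and after[name] > before[name]
    }


if __name__ == "__main__":
    # Raw values from the smartctl output in this thread.
    before = {"Reallocated_Sector_Ct": 1, "Spin_Retry_Count": 3,
              "UDMA_CRC_Error_Count": 2683}
    # Hypothetical reading taken some days after replacing the cable.
    after = {"Reallocated_Sector_Ct": 1, "Spin_Retry_Count": 3,
             "UDMA_CRC_Error_Count": 2690}
    print(increases(before, after) or "stable")
```

Here the CRC counter grew by 7 while the others stayed put, which would point back at cabling or power rather than the platters. An empty result means the counters are stable.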

Btw, 'seems' and some random brand name mean nothing when it comes to PSUs.

----------

