# weird harddrive problem [SOLVED]

## Amity88

There is a periodic clicking sound coming from my laptop hdd, only when no program is explicitly accessing it. The problem temporarily stops for a few weeks when I do a read-write test using e2fsck. The test itself reported no errors.

I ran lm-profiler to see what programs were accessing the disk while it clicked and found out that the only program was the journaling thing (jbd2) every 5 seconds.... and strangely while I ran lm-profiler the clicking stopped, only to come back after I quit the program....

The clicking doesn't happen when I boot into pure CLI....

I did a Google search on it; most people say it's because of bad sectors... which doesn't seem to be the case here. Any pointers would be helpful.

kernel : 2.6.36-r5

DE: xfce4.8.0

----------

## mbar

Run the following, preferably from the CLI or, even better, from a live CD:

```
smartctl -t long /dev/sdX
```

then, wait for some hours and look at the output of:

```
smartctl -a /dev/sdX
```
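
Once the long test has finished, its outcome is recorded in the drive's self-test log, which `smartctl -l selftest /dev/sdX` prints on its own. The following is only a minimal sketch for tallying that log. The function name `selftest_summary` is made up for illustration, and it assumes the log layout that smartctl 5.40 prints later in this thread (entries starting with `#`, status text on the same line).

```shell
#!/bin/sh
# Hypothetical helper: tally self-test outcomes from smartctl's self-test log.
# Reads the log text on stdin, so it can be tried without touching a disk.
selftest_summary() {
    awk '
        /^# *[0-9]+ / {
            if (index($0, "Completed without error")) ok++
            else if (index($0, "Aborted by host"))    aborted++
            else                                      other++
        }
        END { printf "completed=%d aborted=%d other=%d\n", ok, aborted, other }
    '
}

# Real use (needs root and a real device; /dev/sdX is a placeholder):
#   smartctl -l selftest /dev/sdX | selftest_summary
```

A high "aborted" count usually just means the machine was used, suspended or shut down before the test could finish, which is one reason to run it from a live CD.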

----------

## ppurka

Look at the output of 

```
smartctl -a /dev/sda | grep Load_Cycle
```

If the last number in the output is increasing by 1 after every click, it means that your hard disk has some aggressive power management settings and it is putting the hard disk to sleep every few seconds. And then, some process such as jbd2 is accessing the hard disk which is spinning it up yet again.
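
To watch that counter without re-reading the whole report by eye, something like the sketch below may help. `raw_value` is a hypothetical helper name; it assumes the ten-column attribute table that `smartctl -A` prints, where the raw counter is the tenth column and the fourth column is only the normalized health value.

```shell
#!/bin/sh
# Hypothetical helper: print the RAW_VALUE column for one SMART attribute.
# Reads `smartctl -A` output on stdin; the attribute name is the argument.
raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Polling loop, commented out because it needs root and a real disk.
# /dev/sda and the 5-second interval are placeholders.
#   while sleep 5; do
#       printf '%s  ' "$(date +%T)"
#       smartctl -A /dev/sda | raw_value Load_Cycle_Count
#   done
```

If the printed number goes up by one each time the drive clicks, the click is a head load/unload.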

The maximum number of spin down/up cycles that laptop hard disks can tolerate is between 300,000 and 600,000. If your hard disk is spinning up and down that frequently (as in every few seconds), you will hit that limit very fast (within a year or two). This is a big problem. The solution is to try to set less aggressive power management settings using hdparm -B and hdparm -S.
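
As a rough illustration of how fast that limit can be reached, here is a small sketch. `days_to_limit` is a made-up helper, the 10-second interval is an assumed example rate, and the result is days of continuous power-on time, so a laptop used only a few hours a day would last proportionally longer.

```shell
#!/bin/sh
# Hypothetical helper: days of continuous power-on time until a drive reaches
# a rated load-cycle limit, given one load cycle every N seconds.
days_to_limit() {
    limit=$1 interval_s=$2
    echo $(( limit * interval_s / 86400 ))
}

days_to_limit 300000 10    # one cycle every 10 s against a 300,000 limit
days_to_limit 600000 10    # same rate against a 600,000 limit

# Relaxing the drive's power management (needs root; /dev/sdX is a placeholder):
#   hdparm -B 254 /dev/sdX   # highest APM level that keeps APM enabled
#   hdparm -S 0 /dev/sdX     # disable the standby (spindown) timeout
```

The two calls print 34 and 69 (whole days, rounded down). Whether a given drive honours the `-B` setting, and whether it survives a suspend or reboot, varies by model, so it is worth re-checking Load_Cycle_Count afterwards.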

----------

## Amity88

@ppurka,

I'm pretty sure it's not a power management issue, because I had that problem before when the laptop was running on battery, and I had to modify a pm_utils file to disable hdd power management altogether. It also sounded more like a motor rapidly turning on and off back then. I took your advice and checked it anyway: Load_Cycle stays fixed at 98.

@mbar, 

I ran the test a couple of times. It states "no errors found"... here's the output of smartctl -t long /dev/sda

```

smartctl 5.40 2010-10-16 r3189 [x86_64-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".

Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 55 minutes for test to complete.

Test will complete after Thu Mar 24 23:46:50 2011

```

and smartctl -a /dev/sda 

```

smartctl 5.40 2010-10-16 r3189 [x86_64-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===

Model Family:     Fujitsu MHV series

Device Model:     FUJITSU MHV2080BH PL

Serial Number:    NW9ZT723STTL

Firmware Version: 892C

User Capacity:    80,026,361,856 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   7

ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a

Local Time is:    Fri Mar 25 00:10:34 2011 IST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x82)   Offline data collection activity

               was completed without error.

               Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0)   The previous self-test routine completed

               without error or no self-test has ever 

               been run.

Total time to complete Offline 

data collection:        ( 471) seconds.

Offline data collection

capabilities:           (0x5b) SMART execute Offline immediate.

               Auto Offline data collection on/off support.

               Suspend Offline collection upon new

               command.

               Offline surface scan supported.

               Self-test supported.

               No Conveyance Self-test supported.

               Selective Self-test supported.

SMART capabilities:            (0x0003)   Saves SMART data before entering

               power-saving mode.

               Supports SMART auto save timer.

Error logging capability:        (0x01)   Error logging supported.

               General Purpose Logging supported.

Short self-test routine 

recommended polling time:     (   2) minutes.

Extended self-test routine

recommended polling time:     (  55) minutes.

SCT capabilities:           (0x003d)   SCT Status supported.

               SCT Error Recovery Control supported.

               SCT Feature Control supported.

               SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   100   100   046    Pre-fail  Always       -       133512

  2 Throughput_Performance  0x0005   100   100   030    Pre-fail  Offline      -       18612543

  3 Spin_Up_Time            0x0003   100   100   025    Pre-fail  Always       -       1

  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       6563

  5 Reallocated_Sector_Ct   0x0033   100   100   024    Pre-fail  Always       -       0 (2000, 0)

  7 Seek_Error_Rate         0x000f   100   100   047    Pre-fail  Always       -       3298

  8 Seek_Time_Performance   0x0005   100   100   019    Pre-fail  Offline      -       0

  9 Power_On_Seconds        0x0032   084   084   000    Old_age   Always       -       2h+17m+01s

 10 Spin_Retry_Count        0x0013   100   100   020    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2855

192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       93

193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       47080

194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       41 (Min/Max 21/55)

195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       248

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0 (0, 6942)

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x000f   100   100   060    Pre-fail  Always       -       22436

203 Run_Out_Cancel          0x0002   100   100   000    Old_age   Always       -       2632735653648

240 Head_Flying_Hours       0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%      8220         -

# 2  Extended offline    Aborted by host               80%      8219         -

# 3  Extended offline    Aborted by host               80%      8219         -

# 4  Extended offline    Aborted by host               80%      8219         -

# 5  Extended offline    Completed without error       00%      8214         -

# 6  Extended offline    Completed without error       00%      7758         -

# 7  Short offline       Completed without error       00%      7757         -

# 8  Extended offline    Aborted by host               80%      7540         -

# 9  Short offline       Completed without error       00%      7540         -

#10  Extended offline    Aborted by host               80%         6         -

#11  Short offline       Completed without error       00%         6         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```

----------

## ppurka

Your load_cycle_count is not 98, it is 47080

```
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       47080
```

----------

## Amity88

sorry, my mistake.... I checked it again. Now Load_Cycles is 47081 and still doesn't change after the clicks..... I did a read-write test last night, again it showed no errors..... but unlike before the clicks haven't gone away   :Evil or Very Mad: 

----------

## Amity88

I checked dmesg... it shows some weird activity. I'm pretty sure this usually doesn't happen (the last part). Here's the pastebin:

http://pastebin.com/xw1DjVtB

----------

## ppurka

The tail of your dmesg is weird. I don't know what is up with your hard disk. Your mounting information from kernel is different from mine. Yours:

```
[    2.273852] EXT3-fs (sda7): error: couldn't mount because of unsupported optional features (240)

[    2.308062] hub 2-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x101

[    2.322855] EXT4-fs (sda7): mounted filesystem with ordered data mode. Opts: (null)
```

Mine:

```
[    2.341678] EXT3-fs (sda6): error: couldn't mount because of unsupported optional features (240)

[    2.342281] EXT2-fs (sda6): error: couldn't mount because of unsupported optional features (240)

[    2.368127] EXT4-fs (sda6): mounted filesystem with ordered data mode. Opts: (null)

[    2.369871] VFS: Mounted root (ext4 filesystem) readonly on device 8:6.

```

As you can see at the end, it tells me that it mounted an ext4 filesystem, whereas in your case it says it mounted an ext2 filesystem, which is really weird.

----------

## Amity88

hmm... I fixed that, here's the new dmesg : http://pastebin.com/M6yqqKP2

But, the harddrive still clicks   :Sad: 

----------

## NeddySeagoon

Amity88,

Clicking noises from harddrives come from two sources - both related to head movements.

1. The parking and unparking of the heads with each spinup. Spinups are caused by power cycling the drive or by power management turning off the spin motor.

Laptop drives are designed to tolerate this much better than desktop drives ... it's already been covered above.

2. The drive recalibrating the head after a failed seek. This is a very bad thing. It means the drive does not know where the head is. It's normally a sign of a dying drive.

As well as failed seeks, it can be caused by failed reads or writes. After a failed write, the drive will remap the faulty sector to spare space provided for the purpose.

New drives look perfect - no bad sectors. They are not really; the drive maps out bad sectors when it is new and throughout its life. This is invisible to the operating system.

The OS will see failed reads, as the data cannot be recovered. The drive will try rereads in an attempt to remap the faulty sector.

Writing to such an unreadable sector will either work (depending on the error) or force a sector remap.

Keep an eye on 

```
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0 (0, 6942) 

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0 
```

Those are measures of failed sectors: 196 counts remap events that have already happened, and 197 counts sectors still waiting to be remapped.
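
Keeping an eye on those counters can be scripted. The sketch below is only an illustration: `nonzero_remaps` is a hypothetical name, and including attributes 5 and 198 alongside the two quoted above is my own assumption about what is worth watching.

```shell
#!/bin/sh
# Hypothetical helper: report remap-related SMART attributes whose raw value
# is not zero. Reads `smartctl -A` output on stdin.
nonzero_remaps() {
    awk '
        $2 == "Reallocated_Sector_Ct"   ||
        $2 == "Reallocated_Event_Count" ||
        $2 == "Current_Pending_Sector"  ||
        $2 == "Offline_Uncorrectable" {
            # Column 10 is the raw counter; "+ 0" forces a numeric comparison.
            if ($10 + 0 != 0) { print $2 "=" $10; bad = 1 }
        }
        END { if (!bad) print "all zero" }
    '
}

# Real use (needs root; /dev/sda is a placeholder):
#   smartctl -A /dev/sda | nonzero_remaps
```

For the attribute table posted earlier in this thread it would print "all zero".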

As the SMART log says 

```
SMART Error Log Version: 1

No Errors Logged 
```

we know that no errors have happened since the drive was new ... maybe since the firmware was last flashed.

These lines  

```
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2855

193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       47080 
```

show that you have an average of 16.5 head loads for every power cycle.

On a desktop, that's a lot. On a laptop, it depends on the power-on time.

There is nothing in your logs, but empirical evidence suggests otherwise.
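
The ratio above is simply one counter divided by the other. A small sketch of the arithmetic follows; `ratio` is a made-up helper, and the second call assumes that the 8220 hours shown against the newest self-test log entry is a fair stand-in for total power-on time.

```shell
#!/bin/sh
# Hypothetical helper: divide two counters and print the result to one decimal.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 47080 2855    # Load_Cycle_Count / Power_Cycle_Count
ratio 47080 8220    # Load_Cycle_Count / hours from the newest self-test entry
```

The first call prints 16.5 (head loads per power cycle) and the second prints 5.7 (head loads per powered-on hour).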

----------

## Amity88

NeddySeagoon,

Thanks for the quick reply. I've checked these values repeatedly and there wasn't any change:

```
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0 (0, 6942) 

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0 
```

 *NeddySeagoon wrote:*   

> Amity88,
> 
> These lines
> 
> ...

 

Some time back, the hard drive power management was set to aggressive on battery. It would keep turning on and off..... scary, so I disabled it completely.... Could this be the cause of the high Load_Cycle_Count?

   Also, I checked the SMART values before and after the click. Check this out : http://pastebin.com/ArmqHVPH ... The first two were taken in quick succession and the third one immediately after the click. I've only copied the parameters that changed.
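
For anyone wanting to repeat that before/after comparison, here is a minimal sketch. `changed_attrs` is a hypothetical helper and the file names are placeholders; it assumes both files hold `smartctl -A` output with the raw value in the tenth column.

```shell
#!/bin/sh
# Hypothetical helper: list SMART attributes whose raw value differs between
# two saved `smartctl -A` snapshots (arguments: before-file, after-file).
changed_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 columns.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if (FNR == NR) {
                before[$2] = $10          # first file: remember raw values
            } else if (($2 in before) && before[$2] != $10) {
                print $2 ": " before[$2] " -> " $10
            }
        }
    ' "$1" "$2"
}

# Real use (needs root; /dev/sda is a placeholder):
#   smartctl -A /dev/sda > before.txt
#   ...wait for a click...
#   smartctl -A /dev/sda > after.txt
#   changed_attrs before.txt after.txt
```

Some counters (temperature, read error rate) drift on their own, so a third snapshot taken without a click in between helps to tell noise from click-related changes.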

   I found this link specifically on the problem I'm having (clicks only on idle).

 *Quote:*   

> 
> 
> Another possible cause is the drive firmware running a low level surface media check periodically during idle time.
> 
> It is not known whether the problem is a sign of impending drive failure. The root cause of the problem is not yet known. It is quite likely to be a normal mode of drive operation. The problem is very prevalent. 
> ...

 

What would you suggest that I do? I've already taken backups.... would a reformat to ext3 solve it? I don't think the clicks happen while running Windows (probably because there's always some faint activity).

----------

## padoor

disable spindown in power management 

in laptops the drive failure occurs soon after clickety click sound comes when a file is read

i have lost 2 drives like that in last 4 years

there is drive check program in seagate support 

it can tell life left for the drive

best is not to depend on the failing drive.

----------

## NeddySeagoon

Amity88,

It's not a file system issue. It's a drive hardware issue.

It looks like the click is part of the recovery from a seek error. Seek errors are normally a sign of impending failure.

Get the drive test software from your hard drive manufacturer's website and run the tests. Be careful of write tests - they may destroy your data.

Many of these tools check your warranty status over the internet and offer to print a RMA if the drive is determined to be faulty and still within the warranty period.

----------

## Amity88

NeddySeagoon,

   I've already used the comprehensive hdd test from bios(I think it works similar to bios). It's been about 4 years now since I first turned on the laptop, so no question of warranty, I guess I'll just have to prepare for a hdd failure one of these days........ hmm, strangely the clicks have stopped for now...... Thanks for your help, I really appreciate it   :Smile: 

@padoor, 

   It isn't a power management issue. I had it before and disabled it completely..... my disk is 4 years old, some 40,000 load cycles.... the manufacturer says it'll go about 300,000   :Very Happy:  ..... the clicks have gone now, seems to be a periodic thing..... clicks for a few days, stops for a couple of months... clicks for a few days, stops for 2 months... and the cycle repeats   :Mad: 

----------

## NeddySeagoon

Amity88,

As it appears to be seek error related, it could be related to a single track or small group of tracks on the platter surface.

This means it may well be associated with whatever file(s) are located on the drive at that position. If you don't attempt to access the file(s), it won't happen.

It may not happen every time anyway.

----------

## padoor

the new hard drives have 5 years warranty 

4years laptop is not that old

still it can be used for another 2 3 years if used sparingly on processor load

----------

## Amity88

NeddySeagoon,

   hmm, if it was a prob with one/(one group) tracks, wouldn't it be reported in the test? and wouldn't it be added to the list... I ran the manufacturer's diagnostic utility sometime back.... again no errors, SMART status is 'good'.... And when it clicks, it only clicks when the drive is idle.... never during any file access..... I'll have to prepare for the eventual breakdown I guess.

Padoor,

   The laptop's warranty was 1 year so I assumed that it applied for everything. I think I'll check up on that, could be useful  :Smile:  .... I do feel that it's a bit too early for the hdd to give up on me. I have a p4 desktop that's a little more than 8 years old, and the harddrive is kicking strong  :Smile: 

----------

## NeddySeagoon

Amity88,

How do you know the drive is idle?

Do you have swap ... is it in use?

swap will not appear in any list of open files.

The warranty on the whole laptop may be a year from the laptop vendor.  Individual parts may well be longer.

SMART collects all sorts of data about your drive - which you have posted - then uses a heuristic to come up with a SMART Pass/Fail flag.

When you look at the information collected by SMART and apply your own knowledge you get a different answer.

I can't predict when the drive will fail - do keep your backups up to date.  You may not get much notice.

----------

## Amity88

NeddySeagoon,

Actually... I'm not 100% sure about the idling bit. I'm going by the general signs: lights, sounds, and lm-profiler to see if any programs are attempting to access the drive. As for the swap, 'free' reports that it's not being used; besides, only some 10% of physical memory is used.... yes, lack of backups messed things up once..... so I made it a point to keep up-to-date backups.
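
A more direct check of "is the drive really idle" is to compare the kernel's per-device I/O counters across an interval. The sketch below is only that, a sketch: `io_counts` is a made-up helper, and `sda` and the 10-second gap are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print completed reads and writes for one block device
# from a /proc/diskstats-style file (second argument, default /proc/diskstats).
# In that file, column 3 is the device name, column 4 the reads completed and
# column 8 the writes completed.
io_counts() {
    awk -v dev="$1" '$3 == dev { print $4, $8 }' "${2:-/proc/diskstats}"
}

# Sample twice and compare; identical output means no I/O completed in between.
#   a=$(io_counts sda); sleep 10; b=$(io_counts sda)
#   [ "$a" = "$b" ] && echo "no I/O in the last 10 s" || echo "disk was accessed"
#
# Swap currently in use is listed per device in the "Used" column of:
#   cat /proc/swaps
```

If the counters stay the same across a click, nothing the kernel knows about touched the disk, which points at the drive doing something on its own.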

----------

## padoor

you have marked it as solved?

based on what you say it is solved

----------

## ppurka

 *padoor wrote:*   

> you have marked it as solved?
> 
> based on what you say it is solved

 I suppose he gave up   :Razz: 

----------

## Amity88

Padoor,

   Solved as in it decided to stop troubling me with the clicks on its own and Neddy pointed out that based on my logs it's most likely caused by seek errors which indicate an impending drive failure   :Sad: 

Ppurka,

err... that's another way of putting it   :Laughing:   ... but considering that it's one of those things that you can't tinker with to solve, I guess I should have marked it 'unsolvable' instead   :Wink: 

----------

## padoor

 :Smile:   :Very Happy: 

all the best    :Smile: 

PS :

i was interested in solving the clickty click problem

so i thought you found a solution   :Sad: 

----------

## Amity88

oh, you have a similar problem?.... 'clickety click'?.... I don't have the rumbling sound.... it's just a click .... 5..10s pause.... click ...... The rumbling seems to increase as drives become older, dunno why. I'm planning on doing some experiments with my desktop hard drive; it would be a bad idea to mess around with this one. I'll post if something interesting comes up..... If it's just a problem with the noise, have you tried the acoustic management?... also this and this seem interesting

I checked the warranty, you were right, the harddrive had a 3 year warranty (ended last year).

P.S. do you think the 'solved' is a bit misleading?   :Confused: 

----------

## padoor

no problems  :Smile: 

anyway, those who have such problems are sure to read this thread in future.

i had clickety click sound only when it was reading the first track and system used to reboot if such a noise came up

then it would work for sometime and the same thing would happen again

i changed the hdd which was already 5 yrs old then; the new drive works fine for last 4 yrs.

i have one more year warranty on it.  :Smile: 

desktop samsung 80GB failed fully in 3 years. it is all in the game so we can't complain much

----------

