# Is my harddrive about to die?

## johntramp

Hi, I found this in dmesg. Does it mean my hard drive is about to die? *Quote:*   

> 030ff>] sysenter_past_esp+0x54/0x75
> 
> ata1: command 0x25 timeout, stat 0xd0 host_stat 0x61
> 
> ata1: status=0xd0 { Busy }
> ...

 thanks

----------

## itsmegawtf

maybe, for more comments - run fsck.{yourFS}.

----------

## feld

Have you tried a different kernel, to make sure you don't have something buggy going on?

I'm no expert, but that doesn't look great.

-Feld

----------

## i92guboj

 *itsmegawtf wrote:*   

> maybe, for more comments - run fsck.{yourFS}.

 

Option 1: buggy kernel drivers.

Option 2: your drive is saying good-bye.

In neither of these situations would I run fsck. On a failing drive it can only mess up your filesystem further. I would first try another kernel, if possible a precompiled one from a livecd or something like that. That is better because it rules out user-introduced errors in the kernel.

Then I would emerge smartmontools and run smartctl -H /dev/hda (or whatever device it is). That will tell you the health status of your drive. As a last resort, make a backup if you can (if normal programs fail there is always the command "dd") and run fsck while the drive is unmounted, from a livecd for example.

----------

## flybynite

You need to run smartctl to check the status of the disk.  This will check the builtin failure detection of the disk.

* sys-apps/smartmontools

     Available versions:  5.33

     Installed:           5.33

     Homepage:            http://smartmontools.sourceforge.net/

     Description:         control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology 

emerge -va smartmontools

Then

smartctl -a /dev/hda

Which will show something like this for a good disk:

```

gate1 ~ # smartctl -a /dev/hdc

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model:     ST3200822A

Serial Number:    3LJ06ZE9

Firmware Version: 3.01

User Capacity:    200,049,647,616 bytes

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   6

ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2

Local Time is:    Sun Oct  9 22:28:26 2005 CDT

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x82) Offline data collection activity

                                        was completed without error.

                                        Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever

                                        been run.

Total time to complete Offline

data collection:                 ( 430) seconds.

Offline data collection

capabilities:                    (0x5b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        No General Purpose Logging support.

Short self-test routine

recommended polling time:        (   1) minutes.

Extended self-test routine

recommended polling time:        ( 111) minutes.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x000f   051   047   006    Pre-fail  Always       -       72127917

  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0

  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       90

  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       568101779

  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9521

 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       112

194 Temperature_Celsius     0x0022   044   052   000    Old_age   Always       -       44

195 Hardware_ECC_Recovered  0x001a   051   046   000    Old_age   Always       -       72127917

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0

202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%      6427         -

# 2  Short offline       Completed without error       00%      6185         -

SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

```
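For anyone comparing their own report with the one above, here is a rough sketch of pulling out a few attributes that are commonly watched on a suspect disk. The choice of IDs 5, 197 and 198 and the helper name `smart_summary` are assumptions of this example, not something smartctl or this thread prescribes. The sample rows are copied from the report above.

```shell
#!/bin/sh
# Sketch: print NAME=RAW_VALUE for a few attributes from `smartctl -a` output.
# smart_summary is a hypothetical helper; picking IDs 5, 197 and 198 is an
# assumption of this example.
smart_summary() {
    # Attribute rows have at least 10 columns, start with the numeric ID,
    # and end with RAW_VALUE. The NF check keeps the selective self-test
    # rows (which also start with small numbers) out of the result.
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) { print $2 "=" $NF }'
}

# Sample rows copied from the report above.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0'

printf '%s\n' "$sample" | smart_summary
```

On a real system the same function could be fed with `smartctl -a /dev/hda | smart_summary`. Non-zero raw values there would be worth a closer look, though how to read them varies by vendor.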

----------

## johntramp

 *Quote:*   

> odysseus john # smartctl -a /dev/sda
> 
> smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
> 
> Home page is http://smartmontools.sourceforge.net/
> ...

  :Sad: 

I tried knoppix and got no errors,  but I am not quite sure what causes the errors yet and did not spend long playing around with files on the drive.

I think I will just invest in a new harddrive.

----------

## Sleipnir

Try to find a drive fitness test (DFT) on the homepage of your HDD manufacturer. That's the only reliable way to test whether the drive is OK.

----------

## flybynite

 *johntramp wrote:*   

> 
> 
> I think I will just invest in a new harddrive.

 

Probably the best idea.  However, if you want to make sure, here is a link to the drive test from Western Digital.  I just wouldn't run the disk too much until you get your data off it! 

You can download a bootable cd or floppy from this page to run tests.

http://support.wdc.com/download/index.asp?cxml=n&pid=999&swid=30

I did notice that libata has SMART support in the current dev version.  It just isn't merged into mainline yet.  It shouldn't be too long now....

----------

## johntramp

Hmm... the same day I bought a new hard drive to back up the data from the old one, the old one refuses to mount. I have tried to dd the filesystem to another hard drive, but that crashed out with I/O errors after ~50 MB.

Is there anything I can do to get the data off it?  Someone told me to put the drive in a freezer overnight :S  Sounds like an old wives' tale to me though.

I haven't run that test yet, flybynite. I want to wait and see if I can get anything back first.

----------

## i92guboj

A freezer? Well, at least it's original. Cold is as bad for the heads as heat is. Besides, I'm almost sure you don't have a container airtight enough to keep the humidity from destroying what little life may remain in the drive.

My advice: concentrate on small pieces of data, and make a mental map of what you want to recover. When drives are dying in this manner, they tend to crash with I/O errors within a few minutes, so the best approach is to rescue a little at a time. When it crashes, unplug it and try again tomorrow with another small piece of data, preferably at a cooler time of day (at night?). That is the only chance you have to recover the data, unless you are willing to pay a professional data recovery service.
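One way to keep that "map" between sessions is to write it down as a list and let a small script remember what has already been copied. This is only a sketch under assumptions: `rescue_list`, the list file and the log file names are all made up for illustration, and the demo runs on ordinary files in a temporary directory rather than on a failing disk.

```shell
#!/bin/sh
# Sketch: copy a prioritised list of files one at a time, remembering what
# already succeeded so a later session can carry on where this one stopped.
# rescue_list and the log names are hypothetical.
rescue_list() {
    list=$1
    dest=$2
    mkdir -p "$dest"
    touch "$dest/done.log" "$dest/failed.log"
    while IFS= read -r path; do
        # Skip anything rescued in an earlier session.
        grep -qxF -- "$path" "$dest/done.log" && continue
        if cp -p -- "$path" "$dest/"; then
            printf '%s\n' "$path" >> "$dest/done.log"
        else
            printf '%s\n' "$path" >> "$dest/failed.log"
        fi
    done < "$list"
}

# Demo on ordinary files (no real disk involved).
work=$(mktemp -d)
echo "important" > "$work/a.txt"
printf '%s\n' "$work/a.txt" "$work/missing.txt" > "$work/list"
rescue_list "$work/list" "$work/out" 2>/dev/null
cat "$work/out/done.log"
```

Note that this flattens everything into one directory, so two files with the same name would overwrite each other. For a real rescue you would probably want to keep the directory structure.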

----------

## riscycdj

If the problem is the controller chips then yes. Freezer in a bag overnight and then quickly booting and extracting the data works. Better if you get freezer spray or invert a can of "air in a can" to keep the chips cool.

This works about 10% of the time, and only if the drive is only just starting to fail. A better method is to get another drive of EXACTLY the same model and swap the controller boards over. This works well, but only if you have two identical drives.

If the problem is with the drive or the head you are stuffed and will need to pay big bucks to get the data back.

Good luck!

----------

## johntramp

how do I know if it is a problem with the controller or the disk?

----------

## riscycdj

Good question. Not really easy to tell. But if the disk sounds like a jet taking off then there is probably something mechanically wrong. If it seems to make no clicking noise or repeated clicks then the voice coil might be stuffed.

I would still try the freezer trick and then go from there. It can't get any worse....can it  :Smile: 

----------

## johntramp

I do hear the drive power on and off sometimes :S

Anyway, this is what I get from dd running off a live CD:

root@knoppix:/mnt/hdc1 #  dd if=/dev/sda1 of=/mnt/hdc1/ddsda.img conv=sync,noerror

dd: reading `/dev/sda1': Input/output error

96960+0 records in

96960+0 records out

49643520 bytes transferred in 70.069536 seconds (708489 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+1 records in

96961+0 records out

49644032 bytes transferred in 70.069950 seconds (708492 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+2 records in

96962+0 records out

49644544 bytes transferred in 70.088087 seconds (708316 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+3 records in

96963+0 records out

49645056 bytes transferred in 70.088502 seconds (708320 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+4 records in

96964+0 records out

49645568 bytes transferred in 70.088865 seconds (708323 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+5 records in

96965+0 records out

49646080 bytes transferred in 70.109145 seconds (708126 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+6 records in

96966+0 records out

49646592 bytes transferred in 70.109548 seconds (708129 bytes/sec)

dd: reading `/dev/sda1': Input/output error

96960+7 records in

96967+0 records out

49647104 bytes transferred in 70.109908 seconds (708132 bytes/sec)

The file I get from this is

root@ubuntu:/mnt/hdc1 # ls -l

total 48944

-rw-r--r--  1 root root 50036736 2005-10-13 08:37 ddsda.img

It does not get any bigger than 48mb.  Is there a way I can start the dd after this point on the harddrive maybe?

Thanks

----------

## widan

 *johntramp wrote:*   

> It does not get any bigger than 48mb.  Is there a way I can start the dd after this point on the harddrive maybe?

 

There is a program called dd_rescue that ignores I/O errors. You can try that.

If you want to try to start after the bad area, look at the "skip" and "seek" options of dd (skip a number of blocks in the input and output files, respectively). Use both set to the same value, so the recovered blocks are in the correct position in the output file.

The procedure to do a complete image that way is something like this: start copying from the beginning. When you encounter an error, take the number of "records in" and add 1 (to skip over the bad block). If you have 10000 records in, add "skip=10001 seek=10001" to your dd command. Then you continue doing it the same way for subsequent errors (but add the number of "records in" you got to the current skip value, and add one more). This can take a long time if there are a lot of errors.
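The arithmetic in that procedure can be sketched as below. The helper name `next_skip` is made up, and the demo works on a small regular file where block 3 is only pretended to be bad, since a real read error can't be produced on demand. `conv=notrunc` is an addition of this sketch, used so that the resumed run never truncates what was already copied.

```shell
#!/bin/sh
# next_skip is a hypothetical helper: the number of blocks to skip on the
# next run, given the current skip value and the "records in" count that
# dd reported before the error.
next_skip() {
    echo $(( $1 + $2 + 1 ))
}

# Build an 8-block "disk": each 512-byte block contains its own number.
work=$(mktemp -d)
i=0
while [ "$i" -lt 8 ]; do
    printf '%0511d\n' "$i"
    i=$((i + 1))
done > "$work/disk.img"

# First pass: pretend dd stopped with 3 records in (blocks 0-2 copied).
dd if="$work/disk.img" of="$work/copy.img" bs=512 count=3 2>/dev/null

# Resume one block past the bad one, at the same offset in input and output.
skip=$(next_skip 0 3)
dd if="$work/disk.img" of="$work/copy.img" bs=512 \
   skip="$skip" seek="$skip" conv=notrunc 2>/dev/null

echo "resumed at block $skip"
```

With the numbers from the dd output earlier in the thread (96960 records in, starting from 0), `next_skip 0 96960` gives 96961. The skipped block is left as a hole in the output file, so everything after it stays at its correct offset.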

----------

## johntramp

widan  thanks for that  

I will give ddrescue a go when I get home,  otherwise I suppose I will have to try the "skip" and "seek" options

----------

## johntramp

* sys-fs/dd-rescue 

     Available versions:  1.10 ~1.11

     Installed:           none

     Homepage:            http://www.garloff.de/kurt/linux/ddrescue/

     Description:         similar to dd but can copy from source with errors

* sys-fs/ddrescue 

     Available versions:  1.0

     Installed:           none

     Homepage:            http://www.gnu.org/software/ddrescue/ddrescue.html

     Description:         Copies data from one file or block device (hard disk, cdrom, etc) to another, trying hard to rescue data in case of read errors

What is the difference between these two?

----------

## flybynite

Sometimes you have to work with the tools at hand.  Here is the scoop for using dd in these situations taken from the sleuthkit informer:

 *Quote:*   

> 
> 
> Error Correction
> 
> If 'dd' comes across an error while reading a block from the input file, an error will be generated and the copying process will stop. You can cause 'dd' to keep on going when it encounters an error if you provide the 'conv=noerror' flag. Unfortunately, with just that flag, 'dd' will skip writing that block and the remaining data will be in the wrong location. 
> ...

 

Note that this author is talking about creating an exact duplicate for forensic purposes.  For just getting your data back, you don't need to worry about any extra padding.  Since some filesystems use known offsets for superblocks etc., always use both the noerror and sync options in case of any errors.  The IDE bus has to be reset after errors, so the copy will take a really long time  :Smile: 
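To see what the sync option does without needing a broken disk, here is a small sketch on a regular file. A short final block stands in for a block that could not be read in full, which is an assumption of this demo; the file names are made up.

```shell
#!/bin/sh
# Sketch: the effect of conv=sync on a short input block.
work=$(mktemp -d)

# 700 bytes of input: one full 512-byte block plus a 188-byte partial one.
dd if=/dev/zero of="$work/in.bin" bs=700 count=1 2>/dev/null

# Without sync the partial block is written as-is.
dd if="$work/in.bin" of="$work/plain.bin" bs=512 2>/dev/null

# With sync every input block is padded with NULs up to the block size,
# so later data keeps its offset in the output.
dd if="$work/in.bin" of="$work/padded.bin" bs=512 conv=noerror,sync 2>/dev/null

wc -c < "$work/plain.bin"    # 700 bytes
wc -c < "$work/padded.bin"   # 1024 bytes
```

The same padding is what keeps superblocks and the like at their expected offsets when a read fails partway through an image.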

----------

## johntramp

I don't think I will be able to get anything back from this hard drive  :Sad:   Both dd-rescue and dd create a file just shy of 50 MB and then crash. I have tried telling them to start at blocks well past this point, but that does not work at all either.

From what I have read the file created by dd should keep growing as zeros are added on where the errors lie, but this does not happen.  The harddrive also makes a high pitched sound when it is running, which I am sure is not good.

----------

