# I think my hard drive is dying...

## WastingBody

This morning, after I logged into my desktop, I started hearing a little "whirring" sound going on and off every now and then. After I turned my computer off, I could no longer boot from that hard drive; I get read errors and the like. I am now booted into the System Rescue CD and am using rsync to copy all of my important files to my netbook. The drive mounts fine and the rsync operation is working perfectly. I haven't run a filesystem check or anything, in case something goes wrong and I lose data. Is there anything that I should know? Does anyone have any advice?

The worst part of this whole failure is that I bought a 1 TB drive for my other computer to host backups just the day before this all happened. I have never done backups before, simply because I never had anything to put them on. Just very rotten luck on my part.

----------

## Ken69267

Well, I'd back up everything that you can and then ask what S.M.A.R.T. reports. sys-apps/smartmontools is the SMART package in Gentoo, and `smartctl -a /dev/yourdevice` will tell you a good bit of info about the drive.

(I believe the System Rescue CD has smartmontools on it.)

----------

## WastingBody

I'm not really sure what to make of the output.

```
root@sysresccd /root % smartctl -a /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3250310AS
Serial Number:    6RY38ZFQ
Firmware Version: 3.AAC
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jan  1 19:00:53 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        ( 430) seconds.
Offline data collection
capabilities:           (0x5b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     (  64) minutes.
SCT capabilities:           (0x0001)   SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   108   097   006    Pre-fail  Always       -       19967651
  3 Spin_Up_Time            0x0003   098   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       937
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   062   030    Pre-fail  Always       -       224139541
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10823
 10 Spin_Retry_Count        0x0013   100   099   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       928
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   060   050   045    Old_age   Always       -       40 (Lifetime Min/Max 38/40)
194 Temperature_Celsius     0x0022   040   050   000    Old_age   Always       -       40 (0 23 0 0)
195 Hardware_ECC_Recovered  0x001a   079   060   000    Old_age   Always       -       1658
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10820         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

----------

## Vorlon

I think that disk is hosed. It probably has a bad boot sector, which is why you can access it but not boot from it.

Get as much off of it as you can.

If you're very brave (foolish?), you can try to repartition it after you have gotten all your data, but I'd get rid of the disk.

----------

## Ken69267

It doesn't look like a mechanical failure, to be honest. It reports PASSED and the attributes are all sane. (If an attribute's VALUE is less than or equal to its THRESH, it's failing.)
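For anyone who wants to apply that rule without eyeballing every row, here is a minimal Python sketch. It assumes the attribute table has been saved as text in the column layout shown in this thread; the function names `parse_attributes`, `failing`, and `pending_sectors` are made up for illustration and are not part of smartmontools.

```python
import re

# A few rows copied from the smartctl output posted in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   108   097   006    Pre-fail  Always       -       19967651
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   062   030    Pre-fail  Always       -       224139541
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""

# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw value
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Turn attribute table rows into dicts; lines that don't match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "raw": m.group(9).strip(),
            })
    return rows


def failing(rows):
    """Names of attributes whose normalized VALUE is at or below THRESH."""
    return [r["name"] for r in rows if r["value"] <= r["thresh"]]


def pending_sectors(rows):
    """Raw count of attribute 197 (Current_Pending_Sector), or 0 if absent."""
    for r in rows:
        if r["id"] == 197:
            return int(r["raw"].split()[0])
    return 0


if __name__ == "__main__":
    rows = parse_attributes(SAMPLE)
    print("failing attributes:", failing(rows))
    print("pending sectors:", pending_sectors(rows))
```

The threshold rule only looks at the normalized VALUE column, which is why a drive can pass it while the raw count of attribute 197 is still nonzero.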

After you've backed up, you might try scanning the disk with badblocks and running fsck.

EDIT: I'm almost positive it's a bad block, as

```
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1 
```

See that 1? That means one block needs to be reallocated.

`smartctl -t short /dev/yourdevice` would tell you approximately where it is (look at the LBA_of_first_error column of the self-test log afterwards). I had this happen to me recently, but luckily it was in my swap partition and I could just dd the block away.
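To work out which partition a reported LBA falls in, a rough Python sketch follows. The partition layout is entirely hypothetical (only the last sector number is derived from the 250,059,350,016-byte capacity in the smartctl output), and 512-byte sectors are an assumption; substitute the start and end sectors your partitioning tool reports.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# Hypothetical layout: device -> (first sector, last sector), inclusive.
# These numbers are NOT from the thread; read the real ones off your
# partition table before trusting the result.
PARTITIONS = {
    "/dev/sda1": (63, 208844),
    "/dev/sda2": (208845, 4401809),
    "/dev/sda3": (4401810, 488397167),
}


def locate(lba, partitions=PARTITIONS):
    """Return (device, sector offset inside that partition) for a whole-disk LBA."""
    for dev, (start, end) in partitions.items():
        if start <= lba <= end:
            return dev, lba - start
    return None, None


def byte_offset(lba, sector_size=SECTOR_SIZE):
    """Byte position of the sector, counted from the start of the whole disk."""
    return lba * sector_size


if __name__ == "__main__":
    lba = 300000  # hypothetical LBA_of_first_error value
    dev, offset = locate(lba)
    print(f"LBA {lba} is in {dev}, {offset} sectors into the partition")
    print(f"byte offset on the whole disk: {byte_offset(lba)}")
```

If the sector lands in swap or in free space, overwriting it costs nothing; if it lands inside a filesystem, whatever file owns that block loses data when the sector is rewritten.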

----------

## WastingBody

I think I've backed everything up that I need including my make.conf and my kernel's .config. I am brave and foolish, so I think I'll just try a reinstall and see what happens.

----------

## NeddySeagoon

WastingBody,

Get ddrescue (without the hyphen) and make an image of the drive to a file on your nice shiny new 1 TB drive.

ddrescue tries really hard to recover your data and will only halt on success.
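A common way to run GNU ddrescue is in two passes that share one mapfile: a quick pass that grabs everything that reads easily, then a slower pass that retries the bad areas. The Python sketch below only builds and prints the two command lines; it does not run anything. The function name and the device and file paths are placeholders, not anything from this thread.

```python
import shlex


def ddrescue_passes(source, image, mapfile, retries=3):
    """Build argv lists for a two-pass GNU ddrescue run sharing one mapfile."""
    # Pass 1: -n skips the slow scraping phase, so the easy data is copied first.
    first = ["ddrescue", "-n", source, image, mapfile]
    # Pass 2: -d reads the source with direct disc access, -rN retries bad areas N times.
    second = ["ddrescue", "-d", f"-r{retries}", source, image, mapfile]
    return [first, second]


if __name__ == "__main__":
    # Placeholder paths: the failing disk and a mount point on the backup drive.
    for argv in ddrescue_passes("/dev/sda", "/mnt/backup/sda.img", "/mnt/backup/sda.map"):
        print(shlex.join(argv))
```

The mapfile is what lets the second pass (or a run resumed after an interruption) skip the sectors that were already copied.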

The SMART data looks OK and your drive only has just over 10,000 running hours, so it's not even middle-aged.
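To put the running hours in perspective, here is the arithmetic as a small Python sketch. The two constants are the raw values of attributes 9 (Power_On_Hours) and 12 (Power_Cycle_Count) from the posted smartctl output; the helper name is made up.

```python
POWER_ON_HOURS = 10823   # raw value of attribute 9 in the posted output
POWER_CYCLES = 928       # raw value of attribute 12 in the posted output


def running_time(hours):
    """Convert power-on hours to (days, years) of continuous running."""
    days = hours / 24
    years = days / 365
    return days, years


if __name__ == "__main__":
    days, years = running_time(POWER_ON_HOURS)
    print(f"{days:.0f} days, or about {years:.2f} years, of spinning")
    print(f"about {POWER_ON_HOURS / POWER_CYCLES:.1f} hours per power cycle")
```

That works out to roughly 451 days of continuous running, a bit over a year and two months.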

When you have your image, run the vendor's test software on the disk. That's a download from their website.

----------

## WastingBody

I'm running the SeaTools diagnostics tool from Seagate. I'm running the long test; it has found 10 errors so far. I hope it fixes it and I won't be left with a broken hard drive. On a good note my drive is still covered under its warranty, so if it does completely die soon I can have it replaced.

I think I'll keep ddrescue in mind if something else happens. I'll skip on it this time because all that's left that I haven't backed up is my root partition. I can easily rebuild my Gentoo install.

----------

## Ken69267

I doubt the drive is totaled; it's got better vitality attributes than my drives, at least. :P

I'm going on 150 reallocated bad blocks on mine; yours has zero (soon to be one, once you fix the current one).

----------

## WastingBody

SeaTools reported that it had fixed the bad blocks. I tried booting from my drive. It ran fsck on my home partition, but that failed with a message about an unexpected inconsistency and dumped me to the console with a read-only filesystem. Should I try to fsck from the System Rescue CD? If so, which options should I use, if any?

On another note, large file support was causing me a little pain on the new server. I was unable to mount my ext4 partitions for a little while because of the default options of mkfs.ext4.

I'm editing this post from my now alive desktop. Thanks everyone!

----------

## NeddySeagoon

WastingBody,

Those 10 errors will be bad blocks that SeaTools forced to be reallocated by writing to the blocks.

The data that was there will be lost. Smartmontools should now show a nonzero reallocated sector count.

It's a little worrying that the drive does not reallocate sectors while it can still read them, as it's supposed to.

It's worth poking Seagate about that, as this problem will happen again.

This is the second incident like this concerning a Seagate drive on the forums over the past week or so.

Seagate probably know about it and there may be a firmware upgrade for your drive to fix it.

----------

## WastingBody

There were no firmware updates for my drive. :/ 

If this problem progresses, would it be wise to go ahead and replace the drive?

----------

## NeddySeagoon

WastingBody,

Email Seagate about the issue. Sectors do die as the drive is used, and the drive is supposed to remap them to spares while it can still read the data. The issue will recur.

I would press Seagate for a warranty replacement now, before you lose any more data.

----------

## WastingBody

Alright, thanks again. ^_^

----------

