# Is my harddisk broken?

## mfyahya

My two-year-old Dell Inspiron 6000 laptop suddenly started freezing and then stopped booting. I booted from the livecd, and most of the file system is inaccessible  :Sad: 

```
livecd ~ # mount /dev/sda /mnt/gentoo
livecd ~ # ls /mnt/gentoo/home
ls: /mnt/gentoo/home: Input/output error
```

```
livecd ~ # e2fsck -f /dev/sda
e2fsck 1.38 (30-Jun-2005)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open /dev/sda
Could this be a zero-length partition?
```

```
livecd ~ # smartctl -a /dev/sda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA      SAMSUNG MP0804H  Version: UE10
Serial number: S042J20L254571
Device type: disk
Local Time is: Fri Jun  1 05:29:21 2007 UTC
Device does not support SMART
Error Counter logging not supported
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
```

dmesg is full of messages such as:

```
end_request: I/O error, dev sda, sector 125568
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
ata1: translated ATA stat/err 0x51/04 to SCSI SK/ASC/ASCQ 0xb/00/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x04 { DriveStatusError }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key=0xb
    ASC=0x0 ASCQ=0x0
Info fld=0x1ea80
end_request: I/O error, dev sda, sector 125568
```

Is this due to a broken hard disk, or is something wrong with my motherboard? How can I determine which one it is?
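For readers trying to make sense of the numbers in the dmesg output above, here is a minimal Python sketch that decodes the `status=0x51` and `error=0x04` bytes into the names the kernel prints. The bit-to-name tables are an assumption based on the classic ATA register layout and the wording of the old Linux IDE messages; the names for bits 0x20 and 0x08 of the error register in particular are illustrative. It also shows that `Info fld=0x1ea80` is simply the failing sector, 125568, written in hexadecimal. It does not answer the disk-versus-motherboard question by itself.

```python
# Bit names follow the wording of the old Linux IDE/libata log messages.
# The exact mapping is an assumption based on the ATA register layout.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

ERROR_BITS = {
    0x80: "BadCRC",
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True) if value & mask]


if __name__ == "__main__":
    print("status:", decode_bits(0x51, STATUS_BITS))
    print("error: ", decode_bits(0x04, ERROR_BITS))
    # The SCSI "Info fld" is the sector number in hex.
    print("sector:", int("1ea80", 16))
```

In ATA terms, bit 0x04 of the error register is the "command aborted" flag, so the log says the drive is refusing or failing the read rather than reporting a specific bad-sector condition.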

----------

## cynric

There are a few other threads on this board that discuss this issue, and I recall a kernel setting mentioned in "DriveReady SeekComplete Error, Drive Status Error, more". A lot of people seem to have had their hard drives die shortly after seeing this message, so I'd suggest installing and running smartmontools first. The reason is that if the drive is failing, rebooting may be the last thing you're able to try. At least that's been my experience: keep the system up as long as you can while you back it up. If it passes smartctl, then try that kernel option. Hope that helps, and good luck.

----------

## mudrii

I think it is time to back up and call Dell.

You could try stress testing the HDD to be 100% sure.

----------

## mfyahya

It just died completely  :Sad: 

The disk was an upgrade I got from Newegg or someplace a year ago. It's a Samsung. I wonder if they'll replace it...

----------

## roadrunner_gs

Hello

Try:

`# smartctl -a -d ata /dev/sda`

That should bring up the S.M.A.R.T. parameters for SATA disks.

But it seems like your hard disk is somewhat damaged.

Edit: Oh, too late.

----------

## cynric

Since death seems to come quickly after seeing these messages, I wonder if it might not be appropriate to sticky one of the relevant threads or make a FAQ. From the quick search I did here, that error shows up in a few hits, and in those the drive went kaput before the poster got an answer. So far I've seen three things to do once a user sees this:

- Immediately back up and pray to $DEITY
- Run smartmontools
- Enable IDEDISK_MULTI_MODE
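For the third item, a small Python sketch of how one might check whether the option is set in a kernel `.config`. It assumes that "IDEDISK_MULTI_MODE" refers to the `CONFIG_IDEDISK_MULTI_MODE` symbol of the old IDE disk driver; the sample config text is made up for illustration, and on a real system the text would come from a file such as `/usr/src/linux/.config`.

```python
def option_enabled(config_text, option):
    """Return True if the option is set to y (built in) or m (module) in .config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == option:
            return value in ("y", "m")
    return False


# Hypothetical excerpt of a kernel .config, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
"""

if __name__ == "__main__":
    for name in ("CONFIG_BLK_DEV_IDEDISK", "CONFIG_IDEDISK_MULTI_MODE"):
        print(name, "enabled" if option_enabled(SAMPLE_CONFIG, name) else "not enabled")
```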

----------

## desultory

 *cynric wrote:*   

> Since death seems to come quickly after seeing these messages, I wonder if it might not be appropriate to sticky one of the relevant threads or make a FAQ.

 Would you be willing to write it?

----------

## daschapa

Uh oh... I'm getting the same dmesg output...

C'ya, I have to burn CDs to back up my whole HD.

----------

## cynric

 *desultory wrote:*   

> Would you be willing to write it?

 

I've never seen this message so I have no working knowledge of it, but I'll check the other posts and see what I can come up with. I'm guessing I just start a new thread which is then made sticky? I'd prefer to have someone check it over first if possible.

----------

## desultory

 *cynric wrote:*   

> I've never seen this message so I have no working knowledge of it, but I'll check the other posts and see what I can come up with.

 Between the research you have already done and feedback during the submission process, not having actually handled this problem yourself should not itself be a problem.

 *cynric wrote:*   

> I'm guessing I just start a new thread which is then made sticky? I'd prefer to have someone check it over first if possible.

 That is, in effect, the established process.

----------

## cynric

Thanks for the reply, desultory. The hopefully useful post is at "[FAQ] DriveStatusError (error=0x04)", for those who are interested.

----------

## Theophile

I'm getting these errors. I tried to back up the data using 'mv', but I get I/O errors on every file that it attempts to copy. I went ahead and ran 'smartctl -a /dev/hdb' and it gave me this:

```
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD2500JB-00GVC0
Serial Number:    WD-WCAL78128718
Firmware Version: 08.02D08
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jun  8 03:59:59 2007 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (6960) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  88) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   095   095   051    Pre-fail  Always       -       9577
  3 Spin_Up_Time            0x0007   208   130   021    Pre-fail  Always       -       2100
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       845
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000b   193   193   051    Pre-fail  Always       -       845
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10253
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
194 Temperature_Celsius     0x0022   108   088   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   195   195   000    Old_age   Always       -       5
197 Current_Pending_Sector  0x0012   173   173   000    Old_age   Always       -       1099
198 Offline_Uncorrectable   0x0012   198   198   000    Old_age   Always       -       93
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x0009   199   199   051    Pre-fail  Offline      -       38

SMART Error Log Version: 1
ATA Error Count: 346 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 346 occurred at disk power-on lifetime: 10253 hours (427 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4f 10 48 f0  Error: UNC at LBA = 0x0048104f = 4722767
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  29 00 08 4f 10 48 18 00      09:29:46.000  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:46.000  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:46.000  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:46.000  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:46.000  READ MULTIPLE EXT

Error 345 occurred at disk power-on lifetime: 10253 hours (427 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4f 10 48 f0  Error: UNC at LBA = 0x0048104f = 4722767
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  29 00 08 4f 10 48 18 00      09:29:44.100  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:44.100  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:44.100  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:44.100  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:44.100  READ MULTIPLE EXT

Error 344 occurred at disk power-on lifetime: 10253 hours (427 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4f 10 48 f0  Error: UNC at LBA = 0x0048104f = 4722767
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  29 00 08 4f 10 48 18 00      09:29:42.150  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:42.150  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:42.150  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:42.150  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:42.150  READ MULTIPLE EXT

Error 343 occurred at disk power-on lifetime: 10253 hours (427 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4f 10 48 f0  Error: UNC at LBA = 0x0048104f = 4722767
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  29 00 08 4f 10 48 18 00      09:29:40.200  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:40.200  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:40.200  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:40.200  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:40.200  READ MULTIPLE EXT

Error 342 occurred at disk power-on lifetime: 10253 hours (427 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 4f 10 48 f0  Error: UNC at LBA = 0x0048104f = 4722767
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  29 00 08 4f 10 48 18 00      09:29:38.250  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:38.250  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:38.250  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:38.250  READ MULTIPLE EXT
  29 00 08 4f 10 48 18 00      09:29:38.250  READ MULTIPLE EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

I have no idea what this says. I'm assuming it's bad.
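As a rough reading aid for output like the above, here is a Python sketch that pulls the attribute table out of `smartctl -a` text and reports the attributes most commonly associated with media damage. The watch list (reallocated, pending, and offline-uncorrectable sectors) is an illustrative assumption, not an official failure criterion, and the regular expression only targets the column layout shown in this post. The second helper rebuilds the failing LBA from the logged registers using the 28-bit layout, which is also an assumption; it happens to reproduce the `LBA = 0x0048104f = 4722767` value smartctl printed.

```python
import re

# Attributes whose non-zero raw value usually points at media problems.
# This watch list is an illustrative assumption, not an official threshold set.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute name -> dict(value, worst, thresh, raw) from smartctl -a text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _id, name, value, worst, thresh, raw = match.groups()
            attrs[name] = {
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


def suspicious(attrs):
    """Return the watched attributes that have a non-zero raw value."""
    return {n: attrs[n]["raw"] for n in WATCHED if n in attrs and attrs[n]["raw"] > 0}


def lba28_from_registers(sn, cl, ch, dh):
    """Combine logged registers into a 28-bit LBA (low nibble of DH is the top 4 bits)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Rows copied from the output posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   199   199   140    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       10253
197 Current_Pending_Sector  0x0012   173   173   000    Old_age   Always       -       1099
198 Offline_Uncorrectable   0x0012   198   198   000    Old_age   Always       -       93
"""

if __name__ == "__main__":
    for name, raw in suspicious(parse_attributes(SAMPLE)).items():
        print(f"{name}: raw value {raw}")
    print("failing LBA:", lba28_from_registers(0x4F, 0x10, 0x48, 0xF0))
```

Note that the overall self-assessment says PASSED even though the drive reports 1099 pending sectors and 346 logged errors, so the pass/fail line alone is not a reliable indicator here.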

Is there any hope for recovering this data?
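One thing worth noting about the 'mv' attempt: moving across filesystems copies each file and then deletes the original, which means extra writes to a disk that is already failing. A gentler approach is to copy only, and to keep going when a file cannot be read. Below is a minimal Python sketch of that idea, assuming the damaged filesystem is mounted and at least partly readable; the function names and the demo paths are hypothetical. Block-level imaging tools are often preferred for a disk in this state, so treat this as an illustration of the skip-and-continue idea rather than a recovery procedure.

```python
import os
import shutil
import tempfile


def copy_tree_skipping_errors(src, dst):
    """Copy src into dst file by file; return the paths that could not be copied."""
    failed = []
    # onerror records directories that could not be listed instead of aborting.
    for root, _dirs, files in os.walk(src, onerror=lambda err: failed.append(err.filename)):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            try:
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError:
                # Typically an I/O error on a bad sector: note it and move on.
                failed.append(source)
    return failed


def demo():
    """Build a tiny tree in a temp dir, copy it, and return (failed, copied names)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(os.path.join(src, "docs"))
        with open(os.path.join(src, "docs", "notes.txt"), "w") as handle:
            handle.write("important\n")
        dst = os.path.join(tmp, "backup")
        failed = copy_tree_skipping_errors(src, dst)
        copied = sorted(os.listdir(os.path.join(dst, "docs")))
        return failed, copied


if __name__ == "__main__":
    failed, copied = demo()
    print("copied:", copied)
    print("failed:", failed)
```

The returned list of failed paths gives a record of what still needs attention, and the source disk is only ever read, never written.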

----------

