# Need assistance with interpretation of smartctl -a

## jesterchen

Hi folks,

A hard disk of mine started beeping this morning. After a reboot the beeping had stopped, so I ran smartctl -t long, and now I am at a loss interpreting the result.

Can anyone give me a hint on these results?

Thanks.

```
root@khnum /mnt # smartctl -a /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.1.4-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 4200.2
Device Model:     ST9808210A
Serial Number:    3LF2A710
Firmware Version: 3.05
User Capacity:    80,026,361,856 bytes [80.0 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Sat Jan 21 12:52:55 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  426) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  84) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   053   046   034    Pre-fail  Always       -       1223705
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   095   095   020    Old_age   Always       -       5637
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       219529981
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9115
 10 Spin_Retry_Count        0x0013   100   100   034    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   095   095   020    Old_age   Always       -       5677
192 Power-Off_Retract_Count 0x0032   098   098   000    Old_age   Always       -       5678
193 Load_Cycle_Count        0x0032   004   004   000    Old_age   Always       -       192959
194 Temperature_Celsius     0x0022   038   053   000    Old_age   Always       -       38 (0 11 0 0 0)
195 Hardware_ECC_Recovered  0x001a   053   046   000    Old_age   Always       -       1223705
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       6
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 39 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 39 occurred at disk power-on lifetime: 6422 hours (267 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 60 a6 d4 e2  Error: UNC at LBA = 0x02d4a660 = 47490656

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 57 a6 d4 e2 00      00:31:00.611  READ DMA
  27 00 00 00 00 00 e0 00      00:31:00.602  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02      00:31:00.602  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:31:00.597  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:30:54.269  READ NATIVE MAX ADDRESS EXT

Error 38 occurred at disk power-on lifetime: 6422 hours (267 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 60 a6 d4 e2  Error: UNC at LBA = 0x02d4a660 = 47490656

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 57 a6 d4 e2 00      00:30:47.856  READ DMA
  27 00 00 00 00 00 e0 00      00:30:41.506  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02      00:30:41.506  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:30:41.489  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:30:54.269  READ NATIVE MAX ADDRESS EXT

Error 37 occurred at disk power-on lifetime: 6422 hours (267 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 60 a6 d4 e2  Error: UNC at LBA = 0x02d4a660 = 47490656

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 57 a6 d4 e2 00      00:30:47.856  READ DMA
  27 00 00 00 00 00 e0 00      00:30:41.506  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02      00:30:41.506  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:30:41.489  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:30:41.476  READ NATIVE MAX ADDRESS EXT

Error 36 occurred at disk power-on lifetime: 6422 hours (267 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 60 a6 d4 e2  Error: UNC at LBA = 0x02d4a660 = 47490656

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 57 a6 d4 e2 00      00:30:28.756  READ DMA
  27 00 00 00 00 00 e0 00      00:30:41.506  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02      00:30:41.506  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:30:41.489  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:30:41.476  READ NATIVE MAX ADDRESS EXT

Error 35 occurred at disk power-on lifetime: 6422 hours (267 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 60 a6 d4 e2  Error: UNC at LBA = 0x02d4a660 = 47490656

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 57 a6 d4 e2 00      00:30:28.756  READ DMA
  27 00 00 00 00 00 e0 00      00:30:28.733  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 02      00:30:28.713  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:30:28.696  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      00:30:28.665  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9114         -
# 2  Extended offline    Completed: read failure       70%      6203         47490656
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

----------

## tomtom69

Hi,

It looks like the disk will not stay alive much longer.

Look at ID #1 (Raw_Read_Error_Rate), ID #7 (Seek_Error_Rate) and ID #195 (Hardware_ECC_Recovered). The RAW values here are quite high, and VALUE is significantly below 100, which suggests a pre-fail condition of your drive.
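For anyone who wants to see how much headroom those three attributes still have, here is a minimal Python sketch. It relies on the rule from the smartctl documentation that an attribute counts as failed once its normalized VALUE is less than or equal to THRESH. The whitespace-based parsing and the helper names `parse_row` and `margin` are my own assumptions for illustration; they work for the rows copied from the table above but are not a general smartctl parser.

```python
# Rows copied from the attribute table in the first post.
ROWS = """\
  1 Raw_Read_Error_Rate     0x000f   053   046   034    Pre-fail  Always       -       1223705
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       219529981
195 Hardware_ECC_Recovered  0x001a   053   046   000    Old_age   Always       -       1223705
"""


def parse_row(line):
    """Split one attribute row into the fields used below (hypothetical helper)."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # normalized current value
        "worst": int(fields[4]),   # lowest normalized value seen so far
        "thresh": int(fields[5]),  # vendor failure threshold
        "type": fields[6],
    }


def margin(attr):
    """Headroom between the normalized VALUE and THRESH."""
    return attr["value"] - attr["thresh"]


attrs = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for attr in attrs:
    # smartctl treats VALUE <= THRESH as a failed attribute.
    state = "FAILING" if attr["value"] <= attr["thresh"] else "ok"
    print(f'{attr["id"]:>3} {attr["name"]:<24} margin={margin(attr):>3} {state}')
```

With the numbers posted above, the margins come out as 19, 53 and 53, so none of the three has crossed its threshold yet, although attribute 1 is the closest.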

A rising number of read errors can be caused by particles or a damaged surface of the disk.

tom

----------

## NeddySeagoon

jesterchen,

Visit the drive vendor's site to interpret the RAW data. Some drives count up, some count down, so big RAW numbers are a feature of some drives and do not indicate a problem.

That IDENTIFY DEVICE keeps appearing is a bad sign. It indicates that the drive is being hard reset by the kernel and the kernel is discovering the device all over again.

Before condemning the drive, change the data cable (unless it's a laptop).

You appear to have only a single bad location, at LBA = 0x02d4a660 = 47490656.
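As a cross-check of that LBA, here is a small Python sketch that rebuilds it from the raw register bytes in the error log. It uses the standard ATA 28-bit layout: SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27. The function names are mine and only illustrative.

```python
def lba28(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_hex_bytes(text):
    """Turn a string like '40 51 00' into a list of integers."""
    return [int(token, 16) for token in text.split()]


# Registers after the failed command: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = parse_hex_bytes("40 51 00 60 a6 d4 e2")
error_lba = lba28(sn, cl, ch, dh)

# The READ DMA that triggered it: CR FR SC SN CL CH DH DC
cr, fr, csc, csn, ccl, cch, cdh, dc = parse_hex_bytes("c8 00 00 57 a6 d4 e2 00")
start_lba = lba28(csn, ccl, cch, cdh)

print(f"error LBA  = {error_lba:#010x} = {error_lba}")
print(f"start LBA  = {start_lba:#010x} = {start_lba}")
print(f"bad sector is {error_lba - start_lba} sectors into the request")
```

The error LBA matches the 47490656 that smartctl prints, and the READ DMA started at 47490647, so the unreadable sector is the tenth one of that request (offset 9). All five logged errors carry the same registers, which fits the picture of one bad sector being retried.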

If you write to that block on the drive, the drive will reallocate the block and all will be well for a while, except that the data at that block is gone for good.

What impact that will have depends on what is stored there. It could be a block of file data, a block of a directory or even a block of filesystem metadata.

dmesg may have some useful information.

----------

## jesterchen

Many thanks, you two. I begin to understand these values.

Seagate does not seem to offer detailed SMART interpretations, but http://www.pcreview.co.uk/forums/seagates-seek-error-rate-raw-read-error-rate-and-hardware-ecc-recovered-smart-attributes-t4040327.html gives a hint as to why these values are so high (the first bits hold the error count, the last bits the count of e.g. seeks). So these values are quite normal.
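To make that split concrete, here is a rough Python sketch. The exact layout is an assumption on my part: the commonly reported reading is a 48-bit raw value with the error count in the upper 16 bits and the operation count (seeks, sector reads) in the lower 32 bits. Seagate does not document this, so treat `split_seagate_raw` as a hypothetical helper, not an official decoder.

```python
def split_seagate_raw(raw, count_bits=32):
    """Split a raw value into (error_count, operation_count).

    ASSUMPTION: the low `count_bits` bits count operations and everything
    above them counts errors. This is the layout reported in the linked
    thread, not something confirmed by the vendor.
    """
    mask = (1 << count_bits) - 1
    return raw >> count_bits, raw & mask


# RAW_VALUE column from the smartctl output above.
raw_values = {
    "Raw_Read_Error_Rate": 1223705,
    "Seek_Error_Rate": 219529981,
    "Hardware_ECC_Recovered": 1223705,
}

for name, raw in raw_values.items():
    errors, operations = split_seagate_raw(raw)
    print(f"{name:<24} errors={errors} operations={operations}")
```

All three raw values are smaller than 2^32, so under this assumed layout the error part is zero for each of them and the large numbers are just operation counters.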

The IDENTIFY DEVICE entries last came up 112 days of running time ago, so I think that episode is over. It could be that I once had the (laptop) disk attached to a faulty external USB interface for checks...
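The 112 days follow from the two hour counters in the smartctl output; a quick Python check of the arithmetic, using only numbers from the log above:

```python
power_on_hours = 9115    # attribute 9 (Power_On_Hours) RAW_VALUE
last_error_hours = 6422  # "Error 39 occurred at disk power-on lifetime: 6422 hours"

elapsed_hours = power_on_hours - last_error_hours
days, hours = divmod(elapsed_hours, 24)
print(f"{elapsed_hours} hours since the last logged error = {days} days + {hours} hours")
```

That is 2693 powered-on hours, or 112 days and 5 hours of running time without a new entry in the error log.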

In any case I will start regular backups again; if the drive dies, a new laptop is in order.

Thanks again.

----------

## NeddySeagoon

jesterchen,

If you want to try a last-ditch recovery of the data in the faulty block (there may be more than one block), try ddrescue.

ddrescue tries very hard to read all the data from a damaged/dying drive. You only need one more read to recover the data.

You will need 80G of empty space on another drive/machine for the image plus some space for the log.

The ddrescue log is human readable so you can keep an eye on progress. Also, ddrescue uses it to know what to do next.

ddrescue tries very hard to read your data.

The chances are that if ddrescue reads your faulty blocks, the drive will remap them and you will see the

```
   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0 
```

change.

Drives use Partial Response, Maximum Likelihood (PRML) to read data. Crudely put, the drive writes the bits so close together that they overlap, then guesses what the original data was when it reads. This mechanism allows drives to anticipate read failures, so the data can be moved to a spare sector. Most of the time it works well and the drive hides bad sectors from the operating system. Every now and again a pending failure is not spotted. This may be all that's wrong. On the other hand, this could be the first block of many. We don't know yet.

----------

## jesterchen

Thanks again.

Actually I have no clue whatsoever what data was in this sector, and I have no need to recover it. But I will do a ddrescue /dev/sdX /dev/null to try to get the bad sector marked. As far as I know, the read operation should be sufficient for that... And if not: a new laptop is just missing a neat excuse :)

----------

## NeddySeagoon

jesterchen,

Bad sectors are not marked by the drive; they are read one last time, then abandoned, and the data is written elsewhere on the drive.

This only works if the read works. ddrescue will need the log file, even if you send the data to /dev/null.

If you wish to mark the bad blocks in the filesystem, you can use badblocks. Read its man page before you start as it can be destructive to your data.
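If you only want to know whether that one sector can still be read, without scanning the whole disk, a minimal Python sketch along these lines would do it. The function `read_sector` is a hypothetical helper of mine, the device path and LBA are the ones from this thread, and it has to run as root on a Unix system. Note that it goes through the page cache and does not use O_DIRECT, so the kernel may read neighbouring sectors too or answer a repeated read from cache.

```python
import os

SECTOR_SIZE = 512  # logical sector size reported by smartctl above


def read_sector(path, lba, sector_size=SECTOR_SIZE):
    """Read one sector at the given LBA from a device or image file.

    Raises OSError (typically an I/O error) if the read fails.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an absolute byte offset without moving the file position.
        return os.pread(fd, sector_size, lba * sector_size)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Values from this thread; adjust for your own disk.
    try:
        data = read_sector("/dev/sdb", 47490656)
        print(f"read {len(data)} bytes, the sector is readable")
    except OSError as exc:
        print(f"read failed: {exc}")
```

A successful read here says nothing about whether the drive remapped anything; for that, compare Reallocated_Sector_Ct and Current_Pending_Sector before and after.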

----------

## jesterchen

Uhm... I am sorry, I was too tired to get my thinking straight:

I thought I wanted the bad sectors reallocated by the drive, not marked. But after all this might stress the drive too much, so I will just continue (my backups work again) until the next error arises, and then I will buy a new laptop :)

Maximum likelihood... this reminds me of a physical/mathematical exercise with quantum bits, where I had to find the probability that the correct information is still read from the qubit after a long time :o

Thanks for all your support.

----------

## salahx

If you want the drive to reallocate the bad sector, you might want to look at the hdparm --read-sector and --write-sector commands.
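A cautious way to try this is to print the commands first and run them by hand. The Python sketch below only builds and prints the two hdparm invocations for the LBA from this thread and executes nothing. The helper names are hypothetical. Be aware that --write-sector overwrites the sector with zeros and that hdparm refuses to do it without the extra --yes-i-know-what-i-am-doing flag.

```python
def hdparm_read_cmd(device, lba):
    """Command that asks the drive to read a single sector and dump it."""
    return ["hdparm", "--read-sector", str(lba), device]


def hdparm_write_cmd(device, lba):
    """Command that overwrites a single sector with zeros (DESTRUCTIVE)."""
    return [
        "hdparm",
        "--write-sector", str(lba),
        "--yes-i-know-what-i-am-doing",
        device,
    ]


# Device and LBA taken from the smartctl output in this thread.
DEVICE = "/dev/sdb"
BAD_LBA = 47490656

# Dry run: print the commands instead of executing them.
for command in (hdparm_read_cmd(DEVICE, BAD_LBA), hdparm_write_cmd(DEVICE, BAD_LBA)):
    print(" ".join(command))
```

If the read command fails with an I/O error, the write is what gives the drive the chance to reallocate the sector; whatever was stored there is lost either way.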

----------

