# Bad Blocks? (Solved)

## nlsa8z6zoz7lyih3ap

/dev/mapper/sdc is a Western Digital Caviar Green 2TB hard drive (a few years old) that is encrypted with dm-crypt and formatted with ext4. It has no partitions, so all of /dev/mapper/sdc is a single ext4 filesystem.

A recent routine boot-time fsck reported problems and asked for a manual fsck.

I ran `e2fsck -vcp /dev/mapper/sdc` with these results:

 *Quote:*   

>   e2fsck -vcp /dev/mapper/sdc
> 
> Error reading block 350224385 (Attempt to read block from filesystem resulted in short read) while reading inode and block bitmaps.  
> 
> /dev/mapper/sdc: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> ...

 

I then copied everything off of this drive to a spare disk,

reformatted (mke2fs -t ext4 /dev/mapper/sdc) and then started rerunning   *Quote:*   

> e2fsck -vcp /dev/mapper/sdc

 

There has been no output so far. However, I tested it with dumpe2fs (while e2fsck was running) and got the following:

 *Quote:*   

>  sudo dumpe2fs /dev/mapper/sdc|grep -3 bad
> 
> dumpe2fs 1.42 (29-Nov-2011)
> 
>   Free blocks: 24674304-24707071
> ...

 

Does this mean that the drive is failing and should be replaced, or could something else be involved?

I don't mind replacing it, but would hate to put in a new disc and find the same problem right away.

PS: /dev/mapper/sdb (which is also a 2TB Western Digital Caviar Green) is also showing bad blocks with the same tests,

but all partitions on /dev/mapper/sda (which is a Seagate 500GB) show no bad blocks.

  32768 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes

Last edited by nlsa8z6zoz7lyih3ap on Mon Jul 09, 2012 3:04 pm; edited 1 time in total

----------

## aCOSwt

Why don't you just ask badblocks (from sys-fs/e2fsprogs) to tell you?

----------

## nlsa8z6zoz7lyih3ap

Thanks.

The last line of output from  *Quote:*   

> sudo badblocks -s -v /dev/mapper/sdc

  is

 *Quote:*   

> 100556555one, 1:00:24 elapsed. (86/0/0 errors)
> 
> 

 

So it appears that there are bad blocks.

QUESTION:

```
e2fsck -vcp /dev/mapper/sdc
```

is (as I understand it) supposed to instruct the ext4 filesystem not to use those blocks.

Is it considered safe to carry on using the disc after this, or would it be standard practice to just replace the disc?

Do you know exactly what the output "(86/0/0 errors)" means?
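For reference on that last question: as far as I can tell from the e2fsprogs source, badblocks prints those three counters as read errors / write errors / corruption errors, so `(86/0/0 errors)` from a read-only scan would mean 86 read errors and nothing else. A minimal Python sketch that pulls the counters out of such a status line (the function name `parse_badblocks_summary` is my own, not part of any tool):

```python
import re


def parse_badblocks_summary(line):
    """Return the three error counters from a badblocks status line as a dict."""
    match = re.search(r"\((\d+)/(\d+)/(\d+) errors\)", line)
    if match is None:
        raise ValueError("no '(N/N/N errors)' summary found in line")
    read_errs, write_errs, corruption_errs = (int(n) for n in match.groups())
    # Assumed order, taken from the e2fsprogs source: read/write/corruption.
    return {"read": read_errs, "write": write_errs, "corruption": corruption_errs}


summary = parse_badblocks_summary("100556555one, 1:00:24 elapsed. (86/0/0 errors)")
print(summary)
```

The write and corruption counters can only move in the write-mode tests (`-w` or `-n`), which would explain why they are both zero here.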

----------

## NeddySeagoon

nlsa8z6zoz7lyih3ap,

get smartmontools and ask the drive.

Check your warranty status too.  I bought 5 of these drives for a media server. So far I have had two warranty replacements; they both failed after about 9 months.

The drive should remap bad blocks when they are predicted to be failure prone, so you never actually see any bad blocks at the OS level.

A write to the affected blocks should force a remap too.

While the above is all very interesting, check your warranty before you try any 'fixes' and post your smartctl output.

There is no point in messing with an iffy drive that qualifies for a free replacement.

----------

## nlsa8z6zoz7lyih3ap

Thanks for steering me to smartctl.

After subjecting the drive to several tests, I read the log as follows:

 *Quote:*   

> smartctl -l selftest /dev/sdc
> 
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.21-gentoo-b] (local build)
> 
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> ...

 

Thanks in advance for your interpretation of this.

PS: Twice in the last month e2fsck has found errors on this drive during boot, after a clean shutdown.

Once my VMware virtual machine (which lives on this drive) mysteriously refused to boot. (Fortunately I make frequent backups, and so lost nothing.)

----------

## Ant P.

The drive's almost certainly dying. `smartctl -a /dev/sdc` would be useful to see too.

----------

## NeddySeagoon

nlsa8z6zoz7lyih3ap,

The smartmon log is more useful ... from memory it's the -x option

----------

## nlsa8z6zoz7lyih3ap

 *Quote:*   

> The smartmon log is more useful ... from memory its the -x option

 

 *Quote:*   

> smartctl -x  /dev/sdc
> 
> smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.21-gentoo-b] (local build)
> 
> Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
> ...

 

----------

## nlsa8z6zoz7lyih3ap

 *Quote:*   

> The drive's almost certainly dying. `smartctl -a /dev/sdc` would be useful to see too

 

```
smartctl -a /dev/sdc
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.2.21-gentoo-b] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA0780869
LU WWN Device Id: 5 0014ee 2af9ea618
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri Jul  6 16:45:05 2012 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (38400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   103   103   051    Pre-fail  Always       -       49158
  3 Spin_Up_Time            0x0027   173   164   021    Pre-fail  Always       -       6350
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       557
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       585
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6129
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       545
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       136
193 Load_Cycle_Count        0x0032   182   182   000    Old_age   Always       -       56324
194 Temperature_Celsius     0x0022   114   111   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272
198 Offline_Uncorrectable   0x0030   200   199   000    Old_age   Offline      -       228
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   134   134   000    Old_age   Offline      -       17802

SMART Error Log Version: 1
ATA Error Count: 3718 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3718 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:17.589  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:17.570  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:17.570  SET FEATURES [Set transfer mode]

Error 3717 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:14.760  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:14.741  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:14.741  SET FEATURES [Set transfer mode]

Error 3716 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:11.931  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:11.912  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:11.912  SET FEATURES [Set transfer mode]

Error 3715 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:09.102  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:09.083  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:09.083  SET FEATURES [Set transfer mode]

Error 3714 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 c6 fc eb  Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 c6 fc eb 0a      21:27:06.261  READ DMA
  ec 00 00 00 00 00 a0 0a      21:27:06.242  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 0a      21:27:06.242  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      6128         31187072
# 2  Extended offline    Completed: read failure       90%      6128         31187076
# 3  Extended offline    Completed: read failure       90%      6128         31187072
# 4  Short offline       Completed: read failure       90%      6128         31187076
# 5  Extended offline    Completed: read failure       90%      6128         31187072
# 6  Conveyance offline  Completed: read failure       90%      6127         31187076
# 7  Short offline       Completed: read failure       90%      6127         31187072

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

Log reformatted from quote to code for easy reading by NeddySeagoon

----------

## NeddySeagoon

nlsa8z6zoz7lyih3ap,

The important stuff first ...

```
Warranty Inquiry for : Canada

Serial Number    Model Number        Warranty Status  Warranty Exp Date
WCAZA0780869     WD20EARS-00MVWB0    IN WARRANTY      10/01/2013
```

In the UK at least, WD will ship you a new drive before you send your old one in.  They need a credit card number in case the old drive is not returned.

Return postage is at your cost and it's worth insuring the scrap drive too, since you will be billed if it's not received.

Since you have a few months yet to return the drive, the following is for interest only.

The 

```
VALUE WORST THRESH
```

columns provide the interesting data. These are normalised numbers and can be read the same for all drive vendors.

VALUE shows the current value of a parameter. WORST is the closest to failing that the parameter has been in the drive's life. THRESH is the value considered to be a failure.

That is, if VALUE or WORST <= THRESH, the parameter has failed. RAW_VALUE is vendor or even drive specific, since it's a 48 bit field that may contain several bit fields, e.g. three 16 bit values.
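The VALUE/WORST/THRESH rule above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `SmartAttr`, `parse_attr_line` and `has_failed` are made up, and the parser assumes the ten-column attribute layout that smartctl 5.42 prints in this thread.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int


def parse_attr_line(line: str) -> SmartAttr:
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    fields = line.split()
    return SmartAttr(int(fields[0]), fields[1],
                     int(fields[3]), int(fields[4]), int(fields[5]))


def has_failed(attr: SmartAttr) -> bool:
    # The rule from the post: failed if VALUE or WORST <= THRESH.
    return attr.value <= attr.thresh or attr.worst <= attr.thresh


# Three lines copied from the smartctl -a output posted earlier in the thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       585
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272
"""

attrs = [parse_attr_line(line) for line in SAMPLE.splitlines()]
for attr in attrs:
    status = "failed" if has_failed(attr) else "ok"
    margin = min(attr.value, attr.worst) - attr.thresh
    print(attr.name, status, "margin:", margin)
```

By this rule none of the three attributes counts as failed, which would be consistent with smartctl still reporting PASSED for this drive. A THRESH of 000 can probably never be reached by a normalised value, so those attributes are informational only.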

```
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272
```

This shows the drive has already reallocated some sectors and has more it would like to reallocate.

Get WD to send you a warranty replacement before you send your drive in.  Use ddrescue to image the old drive onto the new one.

As it's a whole drive image, you will need to tell it to write the logfile somewhere else.
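To pull just the sector-health raw counts out of a full `smartctl -a` dump, something like the following Python sketch might do. It assumes that RAW_VALUE for attributes 5, 196, 197 and 198 is a plain decimal count, as it appears to be for this drive; other vendors may pack the field differently. `raw_sector_counts` is a hypothetical helper, not part of smartmontools.

```python
# Attribute IDs that relate to reallocated, pending or uncorrectable sectors.
SECTOR_ATTR_IDS = {5, 196, 197, 198}


def raw_sector_counts(smart_output: str) -> dict:
    """Map attribute name -> raw count for the sector-related attributes."""
    counts = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns and start with a numeric ID.
        if len(fields) == 10 and fields[0].isdigit():
            if int(fields[0]) in SECTOR_ATTR_IDS:
                counts[fields[1]] = int(fields[9])  # assumes a plain decimal raw value
    return counts


# Rows copied from the smartctl -a output posted earlier in the thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       585
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       264
197 Current_Pending_Sector  0x0032   200   001   000    Old_age   Always       -       272
198 Offline_Uncorrectable   0x0030   200   199   000    Old_age   Offline      -       228
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

for name, count in raw_sector_counts(SAMPLE).items():
    print(name, count)
```

On a healthy drive one would normally expect all four of these raw counts to be zero or close to it.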

When/if you get all your data back, or you give up trying, send the dead drive back.

Once ddrescue is down to doing retries, it's worth moving the dead drive around while ddrescue runs.  Try it on all four edges, upside down and any other attitudes you can easily prop it up in.  You just need one more read.

The error count in 

```
Error 3718 occurred at disk power-on lifetime: 6124 hours (255 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.
```

is incremented for every failed command, so each time

```
Error: UNC 8 sectors at LBA = 0x0bfcc6c0 = 201115328
```

is read you get a new error record.
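To see that all five logged records really point at the same place, the LBA can be rebuilt from the register dump. A small Python sketch, assuming the standard 28-bit LBA layout used by READ DMA (low nibble of DH, then CH, CL, SN); the function name `lba28_from_registers` is mine:

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA."""
    # Only the low four bits of Device/Head carry LBA bits 27..24.
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# "ER ST SC SN CL CH DH" row from the error records in the log above.
regs = "40 51 08 c0 c6 fc eb"
er, st, sc, sn, cl, ch, dh = (int(byte, 16) for byte in regs.split())

lba = lba28_from_registers(sn, cl, ch, dh)
print(hex(lba), lba, "sector count:", sc)
```

This gives 0x0bfcc6c0 = 201115328 with a sector count of 8, matching the decoded line smartctl prints for each of the five records.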

----------

## nlsa8z6zoz7lyih3ap

Thanks so much for explaining all this to me. It was very helpful indeed.

 *Quote:*   

> Computer users fall into two groups:-
> 
> those that do backups
> 
> those that have never had a hard drive fail.

 

I do frequent backups of everything and so do not have to try to get data off of the drive.

Thanks again.

----------

