# drive state [closed]

## idella4

I have a drive that is showing many signs of dying. Neddy has given me a few tips, and I expect him to be here later on.

For now, someone please tell me how to use smartctl.  It's currently running a long test on the troublesome drive, but -h gives no indication of how to display the findings.  A short test should be enough to outline its state of health.

```

genny ~ # smartctl --test=long /dev/hde

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".

Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 20 minutes for test to complete.

Test will complete after Thu Nov 25 00:52:10 2010

Use smartctl -X to abort test.

```

Now this is worrying

```

genny ~ # smartctl -H /dev/hde

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

```

I am sure smartctl is off track here.

```

genny ~ # parted /dev/hde

GNU Parted 2.3

Using /dev/hde

Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) p                                                                

Model: QUANTUM FIREBALLlct10 20 (ide)

Disk /dev/hde: 20.4GB

Sector size (logical/physical): 512B/512B

Partition Table: msdos

Number  Start   End     Size    Type     File system  Flags

 1      419MB   10.1GB  9668MB  primary

 2      10.1GB  20.4GB  10.3GB  primary

genny ~ # mount /dev/hde1 /mnt/btrfs1

mount: /dev/hde1: can't read superblock

genny ~ # mount /dev/hde2 /mnt/btrfs1

mount: /dev/hde2: can't read superblock

```

```

genny ~ # smartctl --test=short /dev/hde

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

Sending command: "Execute SMART Short self-test routine immediately in off-line mode".

Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.

Testing has begun.

Please wait 1 minutes for test to complete.

Test will complete after Thu Nov 25 00:41:31 2010

```

How do I view the test findings for crying out loud???

----------

## NeddySeagoon

idella4

smartctl -h or smartctl --help or even man smartctl   may shed some light on the topic.

I think 

```
smartctl /dev/hde 
```

just prints the report.

You must have a very old kernel and udev or even a static /dev to have hd* nodes in /dev
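For reference, a minimal sketch of how self-test results are usually read back. As far as the smartctl options go, `-l selftest` prints only the self-test log, `-c` includes the execution status of a test that is still running, and `-a` prints everything. The `selftest_summary` helper below is hypothetical; it only counts log entries by status and assumes the log layout smartctl 5.40 prints.

```shell
#!/bin/sh
# Commands (they need root and a real drive, so they are left commented out):
#   smartctl -l selftest /dev/hde   # self-test log only
#   smartctl -c /dev/hde            # capabilities, incl. "Self-test execution status"
#   smartctl -a /dev/hde            # all SMART information

# Hypothetical helper: summarise a self-test log read from stdin.
selftest_summary() {
    awk '
        /^# *[0-9]+ / {
            total++
            if ($0 ~ /Completed without error/) { ok++ }
            else if ($0 ~ /Aborted by host/)    { aborted++ }
            else                                { other++ }
        }
        END { printf "total=%d ok=%d aborted=%d other=%d\n", total, ok, aborted, other }
    '
}

# Usage sketch:
#   smartctl -l selftest /dev/hde | selftest_summary
```

Anything counted under `other` would be worth reading in full, since that is where read failures and an `LBA_of_first_error` would show up.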

----------

## idella4

Neddy, 

welcome, you made it,

yes I happen to be in a zen 2.6.34, just haven't bothered to change it over since it all works ok.  Would do if I considered it vital.

I have

```

idella@genny ~/Documents $ sudo smartctl -a -d ata /dev/hde

Password: 

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===

Device Model:     QUANTUM FIREBALLlct10 20

Serial Number:    874010245409

Firmware Version: A03.0900

User Capacity:    20,416,757,760 bytes

Device is:        Not in smartctl database [for details use: -P showall]

ATA Version is:   4

ATA Standard is:  ATA/ATAPI-4 T13 1153D revision 15

Local Time is:    Thu Nov 25 01:45:08 2010 WST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status:  (0x00) Offline data collection activity

                                        was never started.

                                        Auto Offline Data Collection: Disabled.

Self-test execution status:      (   0) The previous self-test routine completed

                                        without error or no self-test has ever 

                                        been run.

Total time to complete Offline 

data collection:                 (   1) seconds.

Offline data collection

capabilities:                    (0x1b) SMART execute Offline immediate.

                                        Auto Offline data collection on/off support.

                                        Suspend Offline collection upon new

                                        command.

                                        Offline surface scan supported.

                                        Self-test supported.

                                        No Conveyance Self-test supported.

                                        No Selective Self-test supported.

SMART capabilities:            (0x0003) Saves SMART data before entering

                                        power-saving mode.

                                        Supports SMART auto save timer.

Error logging capability:        (0x01) Error logging supported.

                                        No General Purpose Logging support.

Short self-test routine 

recommended polling time:        (   1) minutes.

Extended self-test routine

recommended polling time:        (  20) minutes.

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0

  3 Spin_Up_Time            0x0027   086   070   020    Pre-fail  Always       -       1790

  4 Start_Stop_Count        0x0032   097   097   008    Old_age   Always       -       2383

  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000b   100   100   023    Pre-fail  Always       -       0

  9 Power_On_Hours          0x0012   086   086   001    Old_age   Always       -       9731

 11 Calibration_Retry_Count 0x0013   100   090   020    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   097   097   008    Old_age   Always       -       2368

 13 Read_Soft_Error_Rate    0x000b   100   100   023    Pre-fail  Always       -       0

199 UDMA_CRC_Error_Count    0x001a   001   001   000    Old_age   Always       -       5310

196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0

197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 0

No Errors Logged

SMART Self-test log structure revision number 0

Warning: ATA Specification requires self-test log structure revision number = 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%      9730         -

# 2  Short offline       Completed without error       00%      9730         -

# 3  Short offline       Completed without error       00%      9730         -

# 4  Abort offline test  Aborted by host               00%      9730         -

# 5  Short offline       Completed without error       00%      9730         -

# 6  Abort offline test  Aborted by host               30%      9730         -

# 7  Short offline       Completed without error       00%      9730         -

# 8  Short offline       Completed without error       00%      9730         -

Device does not support Selective Self Tests/Logging

```

What is going on ?  You know the details of how it's failing.  smartctl reports it pretty healthy.

You know it can't mount /dev/hde1 or 2 and the image from dd couldn't mount.  I am confused, thought smartctl would lay it all out.

Oh, the kernel.. I tried emerging vmware workstation and couldn't.  Using zen 2.6.34 because it's the closest kernel to emerging components effectively.  Have a post in portage & programming, actually submitted a bug because it's such a mess.

----------

## NeddySeagoon

idella4,

All the evidence suggests the drive is fine but the filesystems it contains are damaged.

That's perfectly possible if, for some reason, the driver wrote rubbish to the filesystem metadata, or even all over the drive.

----------

## idella4

hmmmm, 

yes, right.  Well, I'd say it takes some more investigating.  smartctl suggests it's fine, other outcomes suggest it's not.  The drive single-handedly stopped the booting process in its tracks; I had to unplug it to get the machine to boot.  Only after applying your tip does it now boot ok.  It froze up using dd.  It occasionally clicks and whirs.  I think the next step is to take images of the partitions and reformat.  The tip I really need is how to use hexedit to follow and store a selected file.

The main evidence for the negative is 

```

idella@genny ~/bin $ sudo  mount -o loop,ro,offset=419489280 /dev/sdb /mnt/tmp

Password: 

mount: /dev/loop0: can't read superblock

```

Remember?
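As an aside on where that `offset=` value comes from: the option is in bytes, so it is the partition's start sector times the sector size. A small sketch, assuming 512-byte logical sectors (as parted reports for this drive). The start sector 819315 is not stated anywhere in the thread; it is simply what 419489280 works out to at 512 bytes per sector, and `parted /dev/sdb unit s print` should show the real figure.

```shell
#!/bin/sh
# Convert a partition start sector to a byte offset for "mount -o offset=".
# The sector size defaults to 512 bytes; pass a second argument to override.
sector_to_offset() {
    start_sector=$1
    sector_size=${2:-512}
    echo $(( start_sector * sector_size ))
}

sector_to_offset 819315    # prints 419489280
```

If the printed value and the partition table disagree, the loop mount is looking for the superblock in the wrong place, which gives the same "can't read superblock" message as real damage.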

I also have in support of the negative

```

 genny linux-2.6.36-hardened-r2 # smartctl -a /dev/sdb

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)

Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===

Device Model:     QUANTUM FIREBALLlct10 20

Serial Number:    874010245409

Firmware Version: A03.0900

User Capacity:    20,416,757,760 bytes

Device is:        Not in smartctl database [for details use: -P showall]

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

SMART Attributes Data Structure revision number: 10

Vendor Specific SMART Attributes with Thresholds:

..............................................

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0

  3 Spin_Up_Time            0x0027   086   070   020    Pre-fail  Always       -       1790

  4 Start_Stop_Count        0x0032   097   097   008    Old_age   Always       -       2384

  5 Reallocated_Sector_Ct   0x0033   100   100   020    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x000b   100   100   023    Pre-fail  Always       -       0

  9 Power_On_Hours          0x0012   086   086   001    Old_age   Always       -       9738

 11 Calibration_Retry_Count 0x0013   100   090   020    Pre-fail  Always       -       0

 12 Power_Cycle_Count       0x0032   097   097   008    Old_age   Always       -       2369

 13 Read_Soft_Error_Rate    0x000b   100   100   023    Pre-fail  Always       -       0

199 UDMA_CRC_Error_Count    0x001a   001   001   000    Old_age   Always       -       5310

196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0

197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0

............................................

SMART Error Log Version: 0

No Errors Logged

SMART Self-test log structure revision number 0

Warning: ATA Specification requires self-test log structure revision number = 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%      9737         -

# 2  Short offline       Completed without error       00%      9736         -

Device does not support Selective Self Tests/Logging

```

I don't really know how to interpret this data output.

----------

## manaka

According to the SMART data, the disk surface is OK. You have no sector failing ECC checks (Offline_Uncorrectable). Also, the number of sectors reallocated in the whole disc life (Reallocated_Sector_Ct) is zero.

You have a relatively high number of ATA frames failing CRC checks (UDMA_CRC_Error_Count), though. This would point to a cable or electrical problem, I guess.
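That reading can be sketched as a filter over the attribute table. The `smart_flags` helper is hypothetical, and the choice of attributes 5, 197, 198 and 199 as the ones to watch is an assumption based on the ones discussed here; it expects rows in the column layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -A" style attribute rows on stdin and
# report (a) attributes whose normalised VALUE has dropped to THRESH or below
# and (b) a few commonly watched attributes whose RAW_VALUE is non-zero.
smart_flags() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1; name = $2
            value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            if (thresh > 0 && value <= thresh) {
                printf "FAILING %s value=%d thresh=%d\n", name, value, thresh
            }
            if ((id == 5 || id == 197 || id == 198 || id == 199) && raw > 0) {
                printf "NONZERO %s raw=%d\n", name, raw
            }
        }
    '
}

# Usage sketch:
#   smartctl -A /dev/sdb | smart_flags
```

Fed the table from this drive, the only line it reports is the non-zero raw count for UDMA_CRC_Error_Count, which matches the cable suspicion.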

----------

## idella4

manaka.

thanks. So, I have done as I planned: I have taken a dd-type image of each partition and currently have 2 copies of each.  To test the state of the drive, I have again formatted one of the partitions.

The partitions were restored along with the partition table. That is in place.  They were mounted and listed the content correctly.  Then they would not mount, just like that.  This simply does not compute.  Anyway, the partitions were useless in that state.  One is reformatted and has had a large file copied into it.  For now I may as well leave the second; one is enough to test the drive's usability or health.
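A copy test like that is more telling if the copy is compared against the source afterwards. A minimal sketch, with `copy_and_verify` as a hypothetical helper name. One caveat: a read-back straight after the copy can be served from the page cache, so unmounting and remounting the partition before comparing makes the check far more meaningful.

```shell
#!/bin/sh
# Hypothetical helper: copy a file, then confirm the copy reads back identical.
# cmp -s prints nothing and exits 0 only if the files match byte for byte.
copy_and_verify() {
    src=$1
    dst=$2
    cp "$src" "$dst" || return 1
    sync
    if cmp -s "$src" "$dst"; then
        echo "OK: $dst matches $src"
    else
        echo "MISMATCH: $dst differs from $src"
        return 1
    fi
}

# Usage sketch (paths are placeholders):
#   copy_and_verify /path/to/large.iso /mnt/btrfs1/large.iso
```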

The tip I really need is how to use hexedit to follow and store a selected file.

Here

```

01F4E970   08 00 00 00  09 2C 00 00  42 5A 68 39  31 41 59 26  53 59 71 3C  .....,..BZh91AY&SYq<

01F4E984   C4 0F 00 02  7F 7F 86 EC  B0 02 00 58  7F FF 9F 7F  EF DF 64 FF  ...........X......d.

01F4E998   FF DF EA 00  00 82 00 00  08 40 03 1C  6C C5 68 01  43 44 9E 94  .........@..l.h.CD..

```

is the start of a file.

Here 

[quote]

COMMANDS (quickly)

   Moving

       <, > :  go to start/end of the file

----------

## NeddySeagoon

idella4,

Keep in mind that all things are files to hexedit.

If you feed it a whole drive, the whole drive is the file. If you work on a partition, the partition is a file.

If you want hexedit itself to follow a file on a filesystem you have to point it to that file. In turn, that means the file system has to be mounted.

That's where you are coming unstuck.  You can't mount the filesystem because it's broken.

One of you, (you or hexedit) has to read some of the filesystem metadata to find all the data that belongs to the file.

Hexedit can't as it uses the kernel file system driver, which in turn, needs a mounted filesystem.

That leaves you to unpick the metadata to discover which blocks belong to the file. And we know that the metadata is damaged or the filesystem would mount.

With vfat, it's relatively painless; with ext2 it's a bit harder (possibly due to lack of practice); with ext3 and ext4 it's harder still.

Tools like sleuthkit help, but only if they understand the filesystem in question. Sleuthkit gives odd results on broken filesystems; it can only read what's there, not what should be there.
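Where the metadata cannot be read at all, the crude fallback is to locate a file by its signature and copy out a byte range. A sketch of that, using the bzip2 header (`BZh91AY&SY`) from the hex dump earlier in the thread. Both helper names are hypothetical, and this only recovers a file whose data happens to be contiguous, which no filesystem guarantees.

```shell
#!/bin/sh
# List byte offsets of bzip2 headers ("BZh9" followed by "1AY&SY") in an image.
# -a treats binary data as text, -b prints the byte offset, -o prints only
# the matching part.
find_bzip2_headers() {
    LC_ALL=C grep -abo 'BZh91AY&SY' "$1" | cut -d: -f1
}

# Copy LENGTH bytes starting at byte OFFSET of IMAGE into OUT.
carve() {
    image=$1; offset=$2; length=$3; out=$4
    # tail -c +N starts at the Nth byte counting from 1, hence the +1.
    tail -c +"$(( offset + 1 ))" "$image" | head -c "$length" > "$out"
}

# Usage sketch (image path and LENGTH are placeholders):
#   find_bzip2_headers /path/to/partition.img
#   carve /path/to/partition.img "$(( 0x01F4E978 ))" LENGTH recovered.bz2
```

In that dump the header sits 8 bytes into the row at 01F4E970, hence 0x01F4E978. The length has to be established separately, and `bzip2 -t` on the result shows whether the carved data is a valid stream.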

How do you know the drive hardware stopped the boot process and not something that was read from the drive?

Both can do that. Broken filesystem metadata can produce sounds from a drive too, as it seeks for tracks and sectors it doesn't have.

Disconnecting the drive isolates both the drive and the data it contains, so your circumstantial evidence for a hardware issue is not clear.

As manaka says, it may be the data cable.

----------

## idella4

Neddy,

```

How do you know the drive hardware stopped the boot process and not something that was read from the drive? 

```

ok, got that.  Hmmm ok. I follow.    As stated, there's nothing of any value.

All in all hexedit isn't a usable tool for the task.  The only query left hanging is why did it mount the partition a week or so ago, then no longer mount.  Perhaps the cable.  well, perhaps.  The data content on both were in order and mounted, the flow of events is in my post.

I think it's time to go to bed and forget about it.

Oh well, time to close this chapter I think.  Catch you later.

----------

## manaka

I was thinking about the message "can't read superblock". If the FS are ext2/3/4, you could try specifying a backup superblock when mounting. The tool testdisk (see next paragraph) could tell you which ones they are.

 *Quote:*   

> 
> 
> The tip I really need is how to use hexedit to follow and store a selected file.
> 
> 

 

I see, idella4. Perhaps testdisk could be of help here. I used it once to recover a deleted partition table, but I think it can recover files too.
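For the ext2/3/4 case only, a sketch of the backup-superblock arithmetic. `mount`'s `sb=` option counts in 1 KiB units regardless of the filesystem block size, so the block number reported by a tool such as `dumpe2fs` or `mke2fs -n` has to be scaled first. The helper name and the example block number 32768 are illustrative assumptions, not values taken from this drive.

```shell
#!/bin/sh
# Turn a backup superblock location (in filesystem blocks) into the value
# expected by "mount -o sb=", which is always expressed in 1 KiB units.
sb_mount_option() {
    fs_block=$1      # backup superblock location, in filesystem blocks
    block_size=$2    # filesystem block size in bytes (1024, 2048 or 4096)
    echo "sb=$(( fs_block * block_size / 1024 ))"
}

sb_mount_option 32768 4096    # prints sb=131072

# Usage sketch (device and mount point are placeholders):
#   mount -o ro,"$(sb_mount_option 32768 4096)" /dev/sdX1 /mnt/tmp
```

`e2fsck -b` takes the block number unscaled, in filesystem blocks. None of this applies to other filesystem types.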

----------

## NeddySeagoon

manaka,

The fs is btrfs ... when that breaks you get to keep the pieces - all of them.

----------

## idella4

Not sure if this chapter is closed or not.  Neddy I see you're close by.

After having reformatted /dev/sdb1, once again it can't remount.   I tend to think there is a hardware component to this.

smartctl says the drive is healthy, but that tip about the cable just might be it.  That's the second time in succession the formatted btrfs partition has become unmountable without my having worked on or touched it.  btrfs is flawed, but really!!!!!!

I'm going to shutdown, swap the cables on that drive with the other ide and see what occurs.  I really don't think it's btrfs this time.

I have tinkered with creating btrfs partitions on 2 other drives and this doesn't occur.

well, so far 2 reboots and /dev/sdb1, now once again reformatted in btrfs, is ok.

Would still like a suggestion on how to extract the content of

```

(none) kexec # ls -l /mnt/data/part*

-rw-r--r-- 1 root root  9667607040 Nov 25 21:25 /mnt/data/partition1-btrfs.img

-rw-r--r-- 1 root root 10329661440 Nov 25 20:50 /mnt/data/partition2-btrfs.img

```

without being a btrfs dev.
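A first check that needs no btrfs knowledge is whether each image still carries a btrfs superblock at all. A minimal sketch, assuming the standard on-disk layout: primary superblock at 64 KiB, with the 8-byte magic `_BHRfS_M` 64 bytes into it. `has_btrfs_magic` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the btrfs magic is present at the primary
# superblock location of an image file or partition.
# Assumed layout: superblock at byte 65536, magic at offset 64 within it.
has_btrfs_magic() {
    magic=$(dd if="$1" bs=1 skip=$(( 65536 + 64 )) count=8 2>/dev/null)
    [ "$magic" = "_BHRfS_M" ]
}

# Usage sketch:
#   if has_btrfs_magic /mnt/data/partition1-btrfs.img; then
#       echo "superblock magic present"
#   else
#       echo "no btrfs magic at 64 KiB"
#   fi
```

A present magic only says the superblock was not overwritten; it says nothing about the trees behind it. If it is there, a read-only loop mount of the image (`mount -o loop,ro`) is the next thing to try. Newer btrfs-progs releases also ship `btrfs restore` for pulling files out of unmountable filesystems, though that may not exist in the v0.19 tools used here.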

right, closing in on it.

```

genny kexec # mount /dev/sdb1 /mnt/btrfs1

mount: /dev/sdb1: can't read superblock

genny kexec # mkfs -t btrfs /dev/sdb1

WARNING! - Btrfs v0.19-35-g1b444cd-dirty IS EXPERIMENTAL

WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdb1

        nodesize 4096 leafsize 4096 sectorsize 4096 size 9.00GB

Btrfs v0.19-35-g1b444cd-dirty

genny kexec # mount /dev/sdb1 /mnt/btrfs1

genny kexec # ls /mnt/data/openSUSE-11.2-DVD-x86_64-iso

openSUSE-11.2-DVD-x86_64.iso  openSUSE-11.2-DVD-x86_64.iso.md5

genny kexec # cp -a /mnt/data/openSUSE-11.2-DVD-x86_64-iso /mnt/btrfs1

```

right,  this manages to once again corrupt the btrfs on /dev/sdb1.  I did another run and for the first time corrupted a btrfs image on another drive.  It appears to be yet another btrfs bug.  This time, the use of btrfs device add or delete corrupts a partition.

The above corruption is partly forgivable (I've missed the last few lines), since I told it to add the device to itself, unintentionally.  However, adding again this time corrupted /dev/sdd8.

This is what does it.

```

genny bin # btrfs device delete /dev/sdb1 /mnt/btrfs1

mount: /dev/sdb1: can't read superblock

```

So what appears to occur is:

adding a device to a btrfs filesystem takes over the data or file system on that device, it's as if it's re-formatted it as part of the add device script.

On viewing the content of the device once added, the content is that of the device to which it was added.

This is a form of manually extending the partition size, though from a separate physical drive or volume.

On deleting the device from the initial device, the second device / partition is virtually in a raw state.

That is, the docs never bothered to point out this inevitable loss of data by the addition and deletion of devices.

The second device I added had one folder of content on it.  On adding it, it was just gone.
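Given that behaviour, a guard in front of `btrfs device add` would at least catch the add-it-to-itself case. A minimal sketch with hypothetical helper names: it only checks whether the device appears as a mounted source in `/proc/mounts`, and it prints the btrfs command rather than running it. It cannot tell whether an unmounted device holds data worth keeping, and a device mounted under another name (a by-uuid path, say) would slip past it.

```shell
#!/bin/sh
# Hypothetical guard: succeed if DEV is listed as a mounted source.
# The mounts table defaults to /proc/mounts; another file can be passed
# as a second argument for testing.
device_is_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Print the add command only if the device is not mounted anywhere.
safe_device_add() {
    dev=$1; mnt=$2; mounts=${3:-/proc/mounts}
    if device_is_mounted "$dev" "$mounts"; then
        echo "refusing: $dev is already mounted"
        return 1
    fi
    echo "would run: btrfs device add $dev $mnt"
}

# Usage sketch:
#   safe_device_add /dev/sdd8 /mnt/btrfs1
```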

How brutal, how clumsy.  How unexpected.  

Well mourning the loss of the pieces is part of the experimenting I suppose.

devsk was right.

It may just be that all this occurs and applies to the ext file systems as well, but until looking at btrfs I never knew of the concepts of sub-volumes & snapshots & multiple devices.

To be fair, multiple devices lends itself to a raid setup & that supposes all devices formatted with the common raid type fs.

I suppose this is just more of the learning curve, but how many more ways can they ruin established data?

----------

