# smartctl and output

## juniper

I just had a filesystem problem on reboot.  I was asked to run fsck manually and I did.  There were a number of "multiply claimed inodes" and I was asked if I wanted to clone things etc.  Since I am not very smart about these things, I just answered yes to everything.  I probably shouldn't have done this; I hope I didn't do any permanent damage.

Anyway, the errors went away and it *seems* things are fine.  I think the problem was caused by a number of recent unclean poweroffs.  I also wanted to run a few checks on the disk itself, so I thought I would give smartctl a whirl.  This drive is almost 4 years old, so I am concerned about it starting to fail.

Here are my questions.

1)  After an unclean shutdown, I want fsck to check all filesystems.  How do I do that?  In fstab I have

```
/dev/sda5               /boot           ext3            noauto,noatime          1 2
/dev/sda7               /home           ext3            noatime,user_xattr      0 1
/dev/sda6               none            swap            sw                      0 0
/dev/sda8               /               ext3            noatime,user_xattr      0 2
```

I thought the last two numbers were supposed to force checks.

I read that I can use tune2fs to lower the automatic check count (check after every 5 reboots or something).  

2)  I ran smartctl, which I *think* I can use on mounted filesystems (my hope is that it would give a warning if this were a bad idea :) ) but I don't understand the output.  Here it is.

```
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Scorpio family
Device Model:     WDC WD800VE-75HDT0
Serial Number:    WD-WXE305310524
Firmware Version: 09.07D09
User Capacity:    80,026,361,856 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Apr  8 13:04:30 2009 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        (4800) seconds.
Offline data collection
capabilities:           (0x7b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               No General Purpose Logging support.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  63) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   194   188   021    Pre-fail  Always       -       1275
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3617
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       444
 10 Spin_Retry_Count        0x0012   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3599
192 Power-Off_Retract_Count 0x0032   001   001   000    Old_age   Always       -       228843
193 Load_Cycle_Count        0x0032   124   124   000    Old_age   Always       -       228867
194 Temperature_Celsius     0x0022   108   097   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

----------

## Hu

A quick check of the init scripts suggests that, at least for baselayout 1, you can force a full fsck by having a file named forcefsck in / at boot, or by passing forcefsck on the kernel command line.  I have not checked whether this functionality still exists in baselayout 2, though I cannot think of any reason to remove it.
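On the fstab part of the question: the last two fields do not force a check by themselves.  The fifth field is a flag for dump, and the sixth is the pass number that decides the order in which fsck looks at filesystems at boot (0 means the entry is skipped).  Whether a full check then actually runs still depends on the filesystem state, such as the mount count that tune2fs adjusts.  Below is a minimal Python sketch that reads those two fields from the fstab posted above; the helper name `fsck_order` is made up for illustration.

```python
# Sketch only: list the dump flag and fsck pass number for each fstab entry.
# FSTAB_TEXT is copied from the first post; fsck_order is a hypothetical helper.
FSTAB_TEXT = """\
/dev/sda5               /boot           ext3            noauto,noatime          1 2
/dev/sda7               /home           ext3            noatime,user_xattr      0 1
/dev/sda6               none            swap            sw                      0 0
/dev/sda8               /               ext3            noatime,user_xattr      0 2
"""


def fsck_order(fstab_text):
    """Return (mount_point, dump_flag, pass_number) for each fstab entry."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # The fifth and sixth fields are optional and default to 0.
        dump_flag = int(fields[4]) if len(fields) > 4 else 0
        pass_number = int(fields[5]) if len(fields) > 5 else 0
        entries.append((fields[1], dump_flag, pass_number))
    return entries


for mount_point, dump_flag, pass_number in fsck_order(FSTAB_TEXT):
    if pass_number == 0:
        note = "skipped by fsck at boot"
    elif pass_number == 1:
        note = "checked first (conventionally the root filesystem)"
    else:
        note = "checked after the pass-1 filesystems"
    print(f"{mount_point:8} dump={dump_flag} pass={pass_number}: {note}")
```

With the posted fstab this shows /home in pass 1 and / in pass 2, which is the reverse of the usual convention of giving the root filesystem pass 1 and everything else pass 2.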

Which variant of ext are you using?  The journaled ones should be resistant to sudden failure during light usage.

Using smartctl to show drive data should be safe, even when the drive has mounted filesystems.

----------

## yabbadabbadont

I think that the forcefsck functionality is part of the fsck program itself and doesn't have anything to do with baselayout.  I know this same method worked on old Unix systems back in the 80's.

----------

## juniper

Hmmm.  I am using ext3.  I also thought it should be resistant to unexpected power offs.  Actually, these power offs occurred when suspend to ram failed to resume.  I had a number of these in a row, so maybe that is the cause.

Just as a precaution, I ran rkhunter and it came up with nothing.  So, I don't really know the cause of this.  I imagine it was the suspend to ram crashes.

Can you interpret the smartctl output?

----------

## Abraxas

It is most likely the suspend to ram crashes that corrupted the system.  My processor wasn't seated well once and my system kept crashing and I would lose random things.  It didn't help that I was using laptop-mode which delays writes.  Fsck it and hope for the best.  Journaling just guarantees that you either have the data on the disk or you don't...it doesn't guarantee that the data will be there if you hard lock or crash.

----------

## eccerr0r

*Quote:*

> ```
>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       444
> ...
> ```


Looks like this drive has been bitten by the head unload issue due to over-aggressive power saving.  However the drive otherwise appears to not really be reporting any issues, just aging information.
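One rough way to back up the "no real issues" reading is to compare each attribute's normalized VALUE against its THRESH (an attribute counts as failed when VALUE is at or below THRESH) and to look at the raw counts of the sector-related attributes.  The Python sketch below does that on the table posted earlier in the thread; the function name and the choice of "sector-related" IDs (5, 196, 197, 198) are illustrative assumptions, not something smartctl itself defines.

```python
# Sketch only: flag SMART attributes at or below threshold and any nonzero
# sector counters. Rows are copied from the smartctl -a output in this thread.
ATTRIBUTE_ROWS = """\
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   194   188   021    Pre-fail  Always       -       1275
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3617
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       444
 10 Spin_Retry_Count        0x0012   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3599
192 Power-Off_Retract_Count 0x0032   001   001   000    Old_age   Always       -       228843
193 Load_Cycle_Count        0x0032   124   124   000    Old_age   Always       -       228867
194 Temperature_Celsius     0x0022   108   097   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
"""

# Illustrative choice of attributes whose raw value counts bad or suspect sectors.
SECTOR_IDS = {5, 196, 197, 198}


def parse_attributes(text):
    """Parse attribute table rows into dictionaries (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # not an attribute row
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": fields[9],  # kept as text; raw formats vary between drives
        })
    return rows


attributes = parse_attributes(ATTRIBUTE_ROWS)
failing = [a["name"] for a in attributes if a["value"] <= a["thresh"]]
sector_problems = [a["name"] for a in attributes
                   if a["id"] in SECTOR_IDS and a["raw"] != "0"]

print("at or below threshold:", failing or "none")
print("nonzero sector counters:", sector_problems or "none")
```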

Did you just run the self test?

Tried a long self test?

Check the hdparm -B option to disable power management, but any wear from head load/unload has already been done.
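To put a number on the head load/unload wear, here is a small Python calculation using the raw values from the posted table.  It assumes the raw Power_On_Hours value really is in hours, which is not confirmed for this model, so treat the per-hour figure as a rough indication only.

```python
# Sketch only: head load/unload rate from the raw values posted in this thread.
load_cycle_count = 228867   # attribute 193, raw value
power_on_hours = 444        # attribute 9, raw value (assumed to be in hours)
power_cycle_count = 3599    # attribute 12, raw value

cycles_per_hour = load_cycle_count / power_on_hours
cycles_per_power_cycle = load_cycle_count / power_cycle_count

print(f"load cycles per power-on hour: {cycles_per_hour:.1f}")
print(f"load cycles per power cycle:   {cycles_per_power_cycle:.1f}")
```

Running the same calculation on two smartctl readings taken some time apart would show whether the count is still climbing after changing the hdparm -B setting.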

----------

## juniper

I don't use hdparm or sdparm, so I don't think I do any aggressive power saving on the drive.

As for whether I ran a short or long test, I just did

smartctl -a /dev/sda

I guess my drive is probably still ok.  I might get a newer kernel and hope s-t-r is more stable.

----------

## fbvortex

If you use "smartctl -H -l error <devname>" you'll get a more concise listing which will tell you what you really want to know when checking the disk health.  I think it's a subset of what gets dumped when you use -a, but probably less confusing overall.

----------

## juniper

Here is what I get

```
smartctl -H -l error /dev/sda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Error Log Version: 1
No Errors Logged
```
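The two lines that matter in that output are the overall health result and the error log summary.  If you wanted to check them from a script, something like the Python sketch below would do; it works on the text posted above rather than calling smartctl itself, and the function name `summarize` is made up for illustration.

```python
import re

# Sketch only: pull the health verdict and error-log state out of the text
# printed by "smartctl -H -l error". Sample text is copied from the post above.
SMARTCTL_OUTPUT = """\
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Error Log Version: 1
No Errors Logged
"""


def summarize(output):
    """Return (health_result, error_log_is_empty); hypothetical helper."""
    match = re.search(r"self-assessment test result:\s*(\S+)", output)
    health = match.group(1) if match else None  # None if the line is missing
    error_log_is_empty = "No Errors Logged" in output
    return health, error_log_is_empty


health, error_log_is_empty = summarize(SMARTCTL_OUTPUT)
print("health:", health)
print("error log empty:", error_log_is_empty)
```

A PASSED result with an empty error log only means the drive has not tripped its own thresholds; it says nothing about the filesystem on top of it.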

----------

