# SMART disk monitoring - Values reversed?

## lyallp

Here is some background reading on SMART: http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_S.M.A.R.T._attributes

I have enabled SMART monitoring on my hard disks.

Having all these SMART diagnostics is cool, but some of the values being reported seem backward to me.

For example, in the excerpt below, my reallocated event count is going down, not up.

I'm beginning to wonder whether the SMART daemon is using signed characters instead of unsigned characters when extracting the counters, and whether my hard drive has had so many reallocations (read: bad sectors remapped) that the counter has wrapped into the negatives.
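The signed-versus-unsigned theory can be checked with a few lines of Python. This is only a sketch of how a single byte decodes each way, not smartd's actual code: if a counter byte were read as a signed char, any value above 127 would print as a negative number (a logged 164 would appear as -92), and a counter that kept rising would jump once to -128 and then keep *rising*. Neither pattern matches a log that steps down 164, 163, 162, 161.

```python
def as_unsigned(byte: int) -> int:
    """Read one byte (0..255) as an unsigned char."""
    return byte & 0xFF


def as_signed(byte: int) -> int:
    """Read one byte (0..255) as a two's-complement signed char."""
    return int.from_bytes(bytes([byte & 0xFF]), "little", signed=True)


# A one-byte counter that keeps incrementing across the 127 -> 128 boundary.
counter = [125, 126, 127, 128, 129, 130]

print([as_unsigned(b) for b in counter])  # [125, 126, 127, 128, 129, 130]
print([as_signed(b) for b in counter])    # [125, 126, 127, -128, -127, -126]
print(as_signed(164))                     # -92: how a logged 164 would look if read as signed
```

Even with the wrap, the signed view still increases after the single jump, so a signed-char bug cannot turn a growing count into a shrinking one.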

Both drives are relatively ancient Western Digital IDE drives (120 GB and 200 GB).

```
root@lyalls-pc:log
# hdparm -v /dev/hda
/dev/hda:
 multcount     = 16 (on)
 IO_support    =  0 (default 16-bit)
 unmaskirq     =  0 (off)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 65535/16/63, sectors = 234441648, start = 0

root@lyalls-pc:log
# hdparm -v /dev/hdd
/dev/hdd:
 multcount     = 16 (on)
 IO_support    =  0 (default 16-bit)
 unmaskirq     =  0 (off)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 19457/255/63, sectors = 312581808, start = 0
```
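As a sanity check on the `geometry` lines, the sector counts can be turned into capacities. This sketch assumes 512-byte logical sectors, which is standard for IDE drives of this era. With that assumption the results match the `User Capacity` values smartctl prints for each drive further down, and they show that /dev/hdd is the 160 GB WD1600JB.

```python
SECTOR_SIZE = 512  # bytes per logical sector (assumed; standard for these IDE drives)


def capacity_bytes(sectors: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a sector count (as printed by `hdparm -v`) into bytes."""
    return sectors * sector_size


for dev, sectors in [("/dev/hda", 234441648), ("/dev/hdd", 312581808)]:
    size = capacity_bytes(sectors)
    # Drive vendors quote decimal gigabytes (10**9 bytes).
    print(f"{dev}: {size:,} bytes = {size / 10**9:.1f} GB")
```

This prints 120,034,123,776 bytes (120.0 GB) for /dev/hda and 160,041,885,696 bytes (160.0 GB) for /dev/hdd.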

Any thoughts?

```
Jul 26 18:36:54 lyalls-pc hda: WDC WD1200BB-00CAA1, ATA DISK drive
Jul 26 18:36:54 lyalls-pc hdb: WDC WD2000JB-00GVA0, ATA DISK drive
<snip and grep for only SMART log entries>
Jul 28 14:04:39 lyalls-pc smartd[6535]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
Jul 28 18:34:36 lyalls-pc smartd[6535]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jul 28 23:34:37 lyalls-pc smartd[6535]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 164 to 163
Jul 30 18:35:43 lyalls-pc smartd[6407]: Device: /dev/hda, SMART Prefailure Attribute: 5 Reallocated_Sector_Ct changed from 176 to 174
Jul 30 18:35:43 lyalls-pc smartd[6407]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 163 to 162
Jul 30 19:35:44 lyalls-pc smartd[6407]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jul 30 20:05:44 lyalls-pc smartd[6407]: Device: /dev/hda, SMART Prefailure Attribute: 5 Reallocated_Sector_Ct changed from 174 to 173
Jul 30 20:05:44 lyalls-pc smartd[6407]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 162 to 161
Jul 31 18:47:18 lyalls-pc smartd[6573]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 200 to 199
Jul 31 19:17:20 lyalls-pc smartd[6575]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 199 to 200
Jul 31 19:17:21 lyalls-pc smartd[6575]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
Jul 31 19:47:21 lyalls-pc smartd[6575]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jul 31 21:17:21 lyalls-pc smartd[6575]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
Jul 31 22:47:21 lyalls-pc smartd[6575]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
Jul 31 23:17:21 lyalls-pc smartd[6575]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200
Aug  1 02:47:21 lyalls-pc smartd[6575]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 200 to 100
```
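To see each attribute's history at a glance, the smartd lines can be grouped per device and attribute. This is a hypothetical helper (`collect_changes` and `CHANGE_RE` are illustrative names, not part of smartmontools), and it assumes the exact line format shown in the log excerpt. Note that the numbers smartd logs here are the normalized `VALUE` column, not raw counts: the last logged transitions (Reallocated_Sector_Ct 174 to 173, Reallocated_Event_Count 162 to 161) match the `VALUE` column in the smartctl output for /dev/hda.

```python
import re
from collections import defaultdict

# Matches smartd "attribute changed" lines in the format shown in the log excerpt.
CHANGE_RE = re.compile(
    r"Device: (?P<dev>[^,]+), SMART (?:Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def collect_changes(lines):
    """Group (old, new) normalized-value transitions by (device, id, name)."""
    history = defaultdict(list)
    for line in lines:
        m = CHANGE_RE.search(line)
        if m:  # non-matching lines (e.g. the "<snip ...>" marker) are skipped
            key = (m["dev"], int(m["id"]), m["name"])
            history[key].append((int(m["old"]), int(m["new"])))
    return dict(history)


sample = [
    "Jul 28 14:04:39 lyalls-pc smartd[6535]: Device: /dev/hdd, SMART Prefailure Attribute: 7 Seek_Error_Rate changed from 100 to 200",
    "Jul 28 23:34:37 lyalls-pc smartd[6535]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 164 to 163",
    "Jul 30 18:35:43 lyalls-pc smartd[6407]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 163 to 162",
    "Jul 30 20:05:44 lyalls-pc smartd[6407]: Device: /dev/hda, SMART Usage Attribute: 196 Reallocated_Event_Count changed from 162 to 161",
    "<snip and grep for only SMART log entries>",
]

for key, steps in collect_changes(sample).items():
    print(key, steps)
```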

Just to provide the full picture, here is all the SMART info...

```
root@lyalls-pc:log
# smartctl --all /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar family
Device Model:     WDC WD1200BB-00CAA1
Serial Number:    WD-WMA8C3835155
Firmware Version: 17.07W17
User Capacity:    120,034,123,776 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug  1 21:29:26 2007 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)   Offline data collection activity
               was suspended by an interrupting command from host.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        (4680) seconds.
Offline data collection
capabilities:           (0x3b) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               No Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               No General Purpose Logging support.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  87) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   199   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   101   092   021    Pre-fail  Always       -       5733
  4 Start_Stop_Count        0x0032   098   098   040    Old_age   Always       -       2736
  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       429
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21745
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   095   094   051    Pre-fail  Always       -       14
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1694
196 Reallocated_Event_Count 0x0032   161   161   000    Old_age   Always       -       39
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       71
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging

root@lyalls-pc:log
# smartctl --all /dev/hdd
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD1600JB-00EVA0
Serial Number:    WD-WMAEK2901680
Firmware Version: 15.05R15
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug  1 21:29:42 2007 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)   Offline data collection activity
               was suspended by an interrupting command from host.
               Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        (5061) seconds.
Offline data collection
capabilities:           (0x79) SMART execute Offline immediate.
               No Auto Offline data collection support.
               Suspend Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               No General Purpose Logging support.
Short self-test routine
recommended polling time:     (   2) minutes.
Extended self-test routine
recommended polling time:     (  67) minutes.
Conveyance self-test routine
recommended polling time:     (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   153   151   021    Pre-fail  Always       -       2850
  4 Start_Stop_Count        0x0032   099   099   040    Old_age   Always       -       1647
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   253   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17711
 10 Spin_Retry_Count        0x0013   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   100   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1250
194 Temperature_Celsius     0x0022   128   253   000    Old_age   Always       -       22
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x0009   200   155   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@lyalls-pc:log
#
```
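The normalized score and the actual count sit in different columns of the attribute table, which is easy to miss when reading it by eye. Below is a minimal sketch of splitting one row into named fields; `Attribute` and `parse_attribute_row` are hypothetical names, and the parser assumes whitespace-separated columns as printed by smartctl 5.36 here.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    """One row of smartctl's attribute table, columns in printed order."""
    id: int
    name: str
    flag: int
    value: int        # normalized, vendor-scaled health score
    worst: int        # worst normalized value recorded
    thresh: int       # failure threshold for the normalized value
    attr_type: str    # "Pre-fail" or "Old_age"
    updated: str      # "Always" or "Offline"
    when_failed: str  # "-" when no failure is recorded
    raw: str          # vendor-specific raw value, kept as text


def parse_attribute_row(row: str) -> Attribute:
    f = row.split()
    return Attribute(
        id=int(f[0]),
        name=f[1],
        flag=int(f[2], 16),      # e.g. "0x0033"
        value=int(f[3]),         # "051" parses as 51
        worst=int(f[4]),
        thresh=int(f[5]),
        attr_type=f[6],
        updated=f[7],
        when_failed=f[8],
        raw=" ".join(f[9:]),     # tolerate raw values that contain spaces
    )


row = "  5 Reallocated_Sector_Ct   0x0033   173   173   140    Pre-fail  Always       -       429"
attr = parse_attribute_row(row)
print(attr.name, attr.value, attr.thresh, attr.raw)  # Reallocated_Sector_Ct 173 140 429
```

For /dev/hda's Reallocated_Sector_Ct, the 429 remapped sectors are in `raw`, while 173 is the normalized score that smartd was logging as it fell.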

Actually, on closer examination of /dev/hda, I think it may be time to replace it: 429 reallocated sectors. :(

----------

## eccerr0r

As far as I know, there are four important numbers in the SMART report:

- `VALUE` is a normalized health score assigned by the manufacturer's firmware; it has nothing to do with the true number of errors.
- `WORST` is the worst `VALUE` the drive has recorded so far.
- `THRESH` is the manufacturer's failure threshold: if `VALUE` drops to or below it, the attribute is considered failed.
- `RAW_VALUE` is, well, the raw value obtained from the drive; its meaning is vendor-specific.
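The relationship between these columns can be sketched as a small status check: a drive flags an attribute when `VALUE` falls to or below `THRESH`, and `RAW_VALUE` plays no part in that test. `attribute_status` is a hypothetical helper, and treating a threshold of 0 as "never fails" is an assumption made for illustration.

```python
def attribute_status(value: int, thresh: int) -> str:
    """Classify a normalized SMART attribute against its failure threshold."""
    # Assumption: a threshold of 0 marks an informational attribute that never fails.
    if thresh == 0:
        return "no threshold"
    if value <= thresh:
        return "FAILING"
    return f"ok ({value - thresh} above threshold)"


# Normalized values from the /dev/hda attribute table in the question.
print(attribute_status(173, 140))  # Reallocated_Sector_Ct -> ok (33 above threshold)
print(attribute_status(161, 0))    # Reallocated_Event_Count -> no threshold
```

So /dev/hda still passes, but its Reallocated_Sector_Ct score has 33 points of headroom left, and smartd was logging it falling a point or two at a time.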

`RAW_VALUE` for error counts should increase monotonically. `VALUE` will decrease over time for a lot of stats, while for things like temperature and current error rates it will fluctuate.
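The rule of thumb above can be written as a small trend check. `trend` is a hypothetical helper, not part of smartmontools. Fed the normalized `VALUE` readings from the smartd log in the question, it shows that /dev/hda's Reallocated_Event_Count is simply drifting downward, as a normalized score should while reallocations accumulate, and that /dev/hdd's Seek_Error_Rate flips back and forth. For a raw error count, anything other than `non-decreasing` would be the suspicious case.

```python
def trend(values):
    """Describe how a series of readings moves over time."""
    steps = [b - a for a, b in zip(values, values[1:])]
    if all(s >= 0 for s in steps):
        return "non-decreasing"  # also returned for series with fewer than 2 readings
    if all(s <= 0 for s in steps):
        return "non-increasing"
    return "fluctuating"


# Normalized VALUE readings taken from the smartd log in the question.
realloc_event_value = [164, 163, 162, 161]    # /dev/hda, attribute 196
seek_error_value = [100, 200, 100, 200, 100]  # /dev/hdd, attribute 7

print(trend(realloc_event_value))  # non-increasing: drifting toward the threshold
print(trend(seek_error_value))     # fluctuating
```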

----------

