# Is my hard disk failing? smartctl Temperature_Celsius

## sapparod

Hi all,

MySQL is running on Gentoo with kernel 2.6.19. (I know it's quite old, but this is a database server, so I prefer not to touch the system too much.)

Recently, the system has been failing randomly, roughly every 1-3 months.

After installing a monitoring tool, I found that before the system goes down, all memory was used and the MySQL process was unable to close its tables. So I think the most likely problem is the hard disk.

http://img696.imageshack.us/img696/456/graphimagephp.png

So I used smartmontools to verify this. (If you guys know another tool, that's cool, please suggest it.)

The following is the output from smartctl.

```
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE16 family
Device Model:     WDC WD2500KS-00MJB0
Serial Number:    WD-WCANK3827027
Firmware Version: 02.01C03
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Mar  9 11:43:34 2010 ICT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (7680) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  90) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   192   188   021    Pre-fail  Always       -       5400
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       29812
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       55
190 Temperature_Celsius     0x0022   068   033   045    Old_age   Always   In_the_past 32
194 Temperature_Celsius     0x0022   118   083   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     29812         -
# 2  Short offline       Completed without error       00%     29811         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

As you can see, the self-test reports no errors.

Anyway, I still have doubts about Temperature_Celsius.

I've googled around but cannot find a clear conclusion on this value (Temperature_Celsius).

So I would like to ask for your suggestions regarding my issue.

Thanks all.

----------

## eccerr0r

My WD disks are like that: they monitor temperature. Basically, if the HDD gets warmer than the threshold that WD thinks is safe, it will "fail" it. I believe that's probably so they can worm their way out of some warranty replacements.
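To see which attributes have ever tripped their threshold, the attribute table can be checked mechanically. The following is only a rough sketch: the helper names `parse_attributes` and `crossed_threshold` are made up for illustration, and the rule used (WORST at or below a non-zero THRESH) is an assumption about how the `In_the_past` flag comes about, not something confirmed in this thread. The sample rows are copied from the output in the first post.

```python
def parse_attributes(text):
    """Parse rows of a smartctl attribute table into dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            rows.append({
                "id": int(fields[0]),
                "name": fields[1],
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "when_failed": fields[8],
                "raw": fields[9],
            })
    return rows


def crossed_threshold(attr):
    """Assumed rule: a THRESH of 0 means 'no threshold'; otherwise the
    attribute has failed at some point if WORST <= THRESH."""
    return attr["thresh"] > 0 and attr["worst"] <= attr["thresh"]


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
190 Temperature_Celsius     0x0022   068   033   045    Old_age   Always   In_the_past 32
194 Temperature_Celsius     0x0022   118   083   000    Old_age   Always       -       32
"""

flagged = [a["id"] for a in parse_attributes(SAMPLE) if crossed_threshold(a)]
print(flagged)
```

With the rows above, only attribute 190 is flagged, which matches the single `In_the_past` entry in the posted table.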

Fortunately or unfortunately, the two disks I had "overheat" (and SMART fail) at some point are out of warranty now anyway, due to the stingy 1y warranty. They're still working perfectly fine, at least... and yes, I did let them get quite warm before, so it's fairly certain that's what happened.
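For the numbers in the first post, a back-of-the-envelope conversion suggests how warm that drive once got. This is a sketch resting on an assumption, not on vendor documentation: that attribute 190 stores `100 - temperature` and attribute 194 stores `150 - temperature` in the normalized columns. Both mappings are guesses that happen to reproduce the posted RAW_VALUE of 32 from the posted VALUE columns.

```python
# Assumed encodings (inferred from the posted numbers, not documented):
#   attribute 190: normalized = 100 - degrees Celsius
#   attribute 194: normalized = 150 - degrees Celsius
def temp_from_190(normalized):
    return 100 - normalized


def temp_from_194(normalized):
    return 150 - normalized


# VALUE / WORST / THRESH / RAW_VALUE as posted in the smartctl output.
attr_190 = {"value": 68, "worst": 33, "thresh": 45, "raw": 32}
attr_194 = {"value": 118, "worst": 83, "thresh": 0, "raw": 32}

# Sanity check: the assumed mappings reproduce the current raw temperature.
assert temp_from_190(attr_190["value"]) == attr_190["raw"]
assert temp_from_194(attr_194["value"]) == attr_194["raw"]

peak_190 = temp_from_190(attr_190["worst"])    # hottest reading per attribute 190
peak_194 = temp_from_194(attr_194["worst"])    # hottest reading per attribute 194
limit_190 = temp_from_190(attr_190["thresh"])  # temperature at which 190 "fails"

print(f"current: {attr_190['raw']} C")
print(f"peak (190): {peak_190} C, peak (194): {peak_194} C")
print(f"threshold (190): {limit_190} C")
```

Under those assumptions both attributes agree on a past peak of 67 C against a limit of 55 C, while the current reading of 32 C is well below it. That would fit an overheating episode in the past rather than a drive that is failing now.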

----------

