# Kernel Panic, looks like a HD problem

## Spanik

For about a week my desktop has been behaving strangely. At first the mouse and keyboard sometimes seemed to lock up. Then, a few times, Opera could not write its bookmarks. For the last two days I have been getting kernel panics. I wrote down the message from the last crash:

"Kernel Panic - not syncing : <4> Reiserfs panic (device sdc3): vs-7042 entry_points_to_object: entry must be ready

Pid: 3378, comm: claws-mail Not tainted 3.0.6-gentoo"

It happened when I closed Claws, and right now it refuses to start. sdc3 is the / of the setup. It is on an SSD that has already seen a few years of use.

I have already put in a new SSD for a fresh Gentoo install, but right now I really want to know whether there is a way to tell if:

- the whole PC is slowly disintegrating (possible, as it is about 12 years old now)

- the SSD /dev/sdc is failing

- there is a simple filesystem error on /dev/sdc3

If the PC is failing then I won't spend the time doing a complete setup on it, but that leaves me in a bit of trouble with some legacy PCI cards that won't be easy or cheap to replace. If it is the SSD that is failing so fast, then I won't put one in for the OS anymore. If it's a filesystem error, then maybe I can get it running again long enough to build a new OS.

This is the first time I have had one of these, so where do I start digging to find out what is wrong?

----------

## NeddySeagoon

Spanik,

Check your SSD's SMART data with smartmontools.

Post the output of

```
smartctl -a /dev/sdc
```

Filesystem errors are never simple - it may be that too.

----------

## Spanik

Is this provided on the LiveDVD? I don't have smartctl installed and emerge doesn't work (one of the reasons I need to re-install Gentoo).

----------

## NeddySeagoon

Spanik,

I don't have the LiveDVD - if you do, try it.

It's very difficult to break Gentoo so badly that you need to reinstall. emerge not working isn't one of those cases.

A dead or dying hdd might be though.

----------

## Spanik

I know, but this PC hasn't been updated for a very long time. I changed the profile, but now it conflicts with whatever is installed, and so on. Then there is the systemd thing going on. So I'd like to just switch to a new install on a new disk and keep this one around "just in case", bootable with the applications as they are now, so that I can go back if needed. I already had to do this once for old Rezound files that, for one reason or another, I cannot get working on 3.x series kernels.

I'll download the livedvd and try it.

----------

## NeddySeagoon

Spanik,

Ah, that's different. It's often faster to reinstall than to update a very old system.

smartmontools, which provides smartctl, is on System Rescue CD, and that's a much smaller download.

----------

## Spanik

I started the system, and KDE has a disk utility that lets you look at the SMART info. I don't know how to interpret it, and as I can't even open a browser anymore I'll just type over the entries that are clearly errors:

```
Overall assessment: Disk is healthy

ID   Attribute                   Normalized  Worst  Threshold  Value
198  Uncorrectable Sector Count  4           246    0          18569 sectors
199  UDMA CRC Error Rate         145         82     0          37679
200  Write Error Rate            240         253    0          490
201  Soft Read Error Rate        67          192    0          929
202  Data Address Mark Errors    58          8      0          157
203  Run Out Cancel              172         147    0          N/A
```

As I said, I don't know how to read those numbers, but they don't give me confidence. It is hard to compare, as I don't have any other SSD in use in this machine. But the hard disk next to it shows just 0 for all of these error counts. Temperature is 23 °C according to its SMART status.

EDIT: just ran smartctl on the SSD in the laptop:

```
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   070    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0003   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       68
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       353
177 Wear_Leveling_Count     0x0003   100   100   000    Pre-fail  Always       -       978
178 Used_Rsvd_Blk_Cnt_Chip  0x0003   100   100   000    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0003   100   100   000    Pre-fail  Always       -       0
182 Erase_Fail_Count_Total  0x0003   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0003   100   100   000    Pre-fail  Always       -       13
196 Reallocated_Event_Count 0x0003   100   100   000    Pre-fail  Always       -       0
198 Offline_Uncorrectable   0x0003   100   100   000    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0003   100   100   000    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0003   100   100   010    Pre-fail  Always       -       0
241 Host_Writes_32MiB       0x0003   100   100   000    Pre-fail  Always       -       1165
242 Host_Reads_32MiB        0x0003   100   100   000    Pre-fail  Always       -       3747
```

Items 198 and 199 are well and truly 0 on this one. So it looks as if the SSD in the desktop is failing.

----------

## NeddySeagoon

Spanik,

The values in VALUE, WORST and THRESH are normalised. A parameter has failed if VALUE or WORST is less than or equal to THRESH.

RAW_VALUE is vendor specific. All the values are 32 bit numbers but there may be several bit packed raw values in the same 32 bit number.

That means that large RAW_VALUE is not always a cause for concern.
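The VALUE/WORST/THRESH rule above can be applied mechanically. A minimal sketch, assuming the attribute table has the usual `smartctl -A` column layout (VALUE, WORST and THRESH in columns 4 to 6). `check_smart_attrs` is a made-up helper name, and rows where the drive reports `---` instead of normalised values are skipped because the rule cannot be evaluated for them:

```shell
# Hypothetical helper: reads a smartctl attribute table on stdin and prints
# every attribute whose normalised VALUE or WORST is at or below THRESH.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID; columns 4-6 must be numeric
        # (rows showing "---" are skipped).
        $1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            if ($4 + 0 <= $6 + 0 || $5 + 0 <= $6 + 0)
                print "FAILED:", $1, $2
        }
    '
}
```

Typical use would be `smartctl -A /dev/sdc | check_smart_attrs`. No output means no attribute has reached its threshold; it says nothing about the raw values.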

ID 5 Reallocated_Sector_Ct is useful.  It need not be zero.

ID 196 Reallocated_Event_Count is also useful. Again, it need not be zero.

There are no failures in your typed values.

Try replacing the SATA data cable.

----------

## Spanik

OK, I didn't know that. Parameter 5 is not listed; however, 196 is:

```
196 Reallocation Count  Normalized: N/A
                        Worst:      N/A
                        Threshold:  0
                        Value:      0
```

There are quite a few parameters where Value = Threshold:

- Read Error Rate

- End-to-End Error

- Hardware ECC Recovered

- Soft ECC Correction

- Thermal Asperity Rate

In every case both values are 0.

I'll power down and open the PC, check the cables and re-seat the memory. Then I'll boot with the LiveDVD and see whether a filesystem check finds anything.

----------

## NeddySeagoon

Spanik,

Don't do a fsck.  It may well make things worse.  Use smartctl to run the short test.

If that passes, try the long test.
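For reference, a sketch of what that looks like on the command line. `/dev/sdc` is the device from this thread, and the function names are only for illustration. `smartctl -t` starts the test inside the drive and returns immediately, so the result has to be read back from the self-test log afterwards:

```shell
# Start a SMART self-test; $1 is the device, $2 is "short" or "long".
start_selftest() {
    smartctl -t "$2" "$1"
}

# Show the self-test log once the test has had time to finish.
show_selftest_log() {
    smartctl -l selftest "$1"
}
```

For example: `start_selftest /dev/sdc short`, wait for the time smartctl announces, then `show_selftest_log /dev/sdc`. Both normally need root.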

It does not sound good if Value=Threshold, but I've never seen the KDE output so I am not confident of what it's telling me.

Is the value the VALUE WORST THRESH value or the RAW_VALUE?

----------

## Spanik

It is the raw value I suppose. The "worst" and "normalized" are given as N/A. 

The download has finished. I'll see whether smartctl is on it and what it reports.

----------

## Spanik

The LiveDVD doesn't include smartctl, so I had to use the System Rescue CD. It gives this:

```
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.10.60-std441-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Indilinx Barefoot based SSDs
Device Model:     OCZ-VERTEX
Serial Number:    554XK0501K5213NMYGVZ
Firmware Version: 1.6
User Capacity:    96,029,466,624 bytes [96.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
Local Time is:    Sun Jan 25 17:12:07 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)   Offline data collection activity
               was completed without error.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249)   Self-test routine in progress...
               90% of test remaining.
Total time to complete Offline 
data collection:       (    0) seconds.
Offline data collection
capabilities:           (0x1d) SMART execute Offline immediate.
               No Auto Offline data collection support.
               Abort Offline collection upon new
               command.
               Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               No Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x00)   Error logging NOT supported.
               General Purpose Logging supported.
Short self-test routine 
recommended polling time:     (   0) minutes.
Extended self-test routine
recommended polling time:     (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   ---   ---   ---    Old_age   Offline      -       5
  9 Power_On_Hours          0x0000   ---   ---   ---    Old_age   Offline      -       4458
 12 Power_Cycle_Count       0x0000   ---   ---   ---    Old_age   Offline      -       1490
184 Initial_Bad_Block_Count 0x0000   ---   ---   ---    Old_age   Offline      -       182
195 Program_Failure_Blk_Ct  0x0000   ---   ---   ---    Old_age   Offline      -       0
196 Erase_Failure_Blk_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       0
197 Read_Failure_Blk_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       0
198 Read_Sectors_Tot_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       1222207344
199 Write_Sectors_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       2485828217
200 Read_Commands_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       32251881
201 Write_Commands_Tot_Ct   0x0000   ---   ---   ---    Old_age   Offline      -       61045947
202 Error_Bits_Flash_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       10326749
203 Corr_Read_Errors_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       7536169
204 Bad_Block_Full_Flag     0x0000   ---   ---   ---    Old_age   Offline      -       0
205 Max_PE_Count_Spec       0x0000   ---   ---   ---    Old_age   Offline      -       5000
206 Min_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       439
207 Max_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       3079
208 Average_Erase_Count     0x0000   ---   ---   ---    Old_age   Offline      -       1600
209 Remaining_Lifetime_Perc 0x0000   ---   ---   ---    Old_age   Offline      -       68
211 SATA_Error_Ct_CRC       0x0000   ---   ---   ---    Old_age   Offline      -       0
212 SATA_Error_Ct_Handshake 0x0000   ---   ---   ---    Old_age   Offline      -       0
213 Indilinx_Internal       0x0000   ---   ---   ---    Old_age   Offline      -       0

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported
```

I reseated all the memory and ran Memtest86. It didn't find anything.

I'm going to save whatever I can that isn't backed up (email!) and then re-install on another disk.

----------

## NeddySeagoon

Spanik,

That looks OK. I suspect it's a filesystem problem.

```
211 SATA_Error_Ct_CRC       0x0000   ---   ---   ---    Old_age   Offline      -       0
212 SATA_Error_Ct_Handshake 0x0000   ---   ---   ---    Old_age   Offline      -       0
```

The SATA interface looks OK from the drive's end too.
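As a cross-check on the wear figures in the smartctl output above: assuming Remaining_Lifetime_Perc is simply the unused share of the rated erase cycles (an assumption, since the attribute is vendor specific), it should follow from Average_Erase_Count and Max_PE_Count_Spec. A quick sketch with the raw values from that output:

```shell
# Raw values copied from the smartctl output in this thread.
avg_erase=1600      # 208 Average_Erase_Count
max_pe=5000         # 205 Max_PE_Count_Spec

# Assumed formula: remaining life = 100% minus the share of rated cycles used.
remaining=$(( 100 - 100 * avg_erase / max_pe ))
echo "estimated remaining lifetime: ${remaining}%"
```

That works out to 68, the same number the drive reports in attribute 209, so flash wear does not look like the limiting factor here.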

----------

## Spanik

OK, in that case I'm going to transfer everything needed and let a fsck run. Thanks for the help.
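Given the earlier warning that a fsck can make things worse, one cautious order of operations is to image the partition first and start with a check that does not write anything. A sketch, assuming the partition is unmounted (for example when booted from the rescue CD) and that `/mnt/backup` is a hypothetical mount point on a different disk with enough free space:

```shell
# Placeholders: the partition from this thread and a made-up image path.
DEV=/dev/sdc3
IMG=/mnt/backup/sdc3.img

backup_then_check() {
    # Raw copy of the partition, continuing past unreadable blocks.
    dd if="$DEV" of="$IMG" bs=1M conv=noerror,sync || return 1
    # --check reports inconsistencies but does not repair anything.
    reiserfsck --check "$DEV"
}
```

Only if the check recommends it would `--fix-fixable` or `--rebuild-tree` come into play, and then preferably with the image kept as a fallback.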

----------

## Black

If you have time, can you try booting with your old kernel and see if you get the same errors?

I was having issues recently after upgrading my kernel, but the drive works fine with an old kernel.

----------

