# Occasional kernel panic at shutdown (gentoo-sources-3.7.10)

## leo.the_zoo

Hello fellow Gentooers,

Since 3.5.x I have occasionally seen a kernel panic near the end of system shutdown. I managed to capture part of the trace in a photo (I guess I had no other choice in this case): http://img805.imageshack.us/img805/6426/20130326024440.jpg

I have no idea why it happens. Please enlighten me as to what is going on and how I could overcome this problem.

More info on my system: Kernel config

Processor (excerpt from /proc/cpuinfo):

```
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 30
model name      : Intel(R) Core(TM) i7 CPU       Q 720  @ 1.60GHz
stepping        : 5
microcode       : 0x3
cpu MHz         : 933.000
cache size      : 6144 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3200.07
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
```

Uname:

```
$ uname -a
Linux LEO 3.7.10-gentoo #1 SMP Sat Mar 2 17:31:28 CET 2013 x86_64 Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz GenuineIntel GNU/Linux
```

Hardware:

```
# lspci -k
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
00:03.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 1 (rev 11)
        Kernel driver in use: pcieport
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
        Subsystem: Device 0043:0047
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
        Subsystem: Device 0043:0047
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
        Subsystem: Device 0043:0047
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
        Subsystem: Device 0043:0047
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
        Subsystem: Device 0043:0047
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
        Subsystem: Device 0043:0047
00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
        Kernel driver in use: ehci_hcd
00:1b.0 Audio device: Intel Corporation 5 Series/3400 Series Chipset High Definition Audio (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1333
        Kernel driver in use: snd_hda_intel
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 06)
        Kernel driver in use: pcieport
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06)
        Kernel driver in use: pcieport
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 06)
        Kernel driver in use: pcieport
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 06)
        Kernel driver in use: pcieport
00:1c.5 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 6 (rev 06)
        Kernel driver in use: pcieport
00:1d.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
        Kernel driver in use: ehci_hcd
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a6)
00:1f.0 ISA bridge: Intel Corporation Mobile 5 Series Chipset LPC Interface Controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
        Kernel driver in use: lpc_ich
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA AHCI Controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
        Kernel driver in use: ahci
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 06)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
        Kernel driver in use: i801_smbus
01:00.0 VGA compatible controller: NVIDIA Corporation GT215 [GeForce GTS 360M] (rev a2)
        Subsystem: ASUSTeK Computer Inc. Device 203c
        Kernel driver in use: nvidia
01:00.1 Audio device: NVIDIA Corporation High Definition Audio Controller (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 203c
        Kernel driver in use: snd_hda_intel
03:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: AzureWave AW-NE785 / AW-NE785H 802.11bgn Wireless Full or Half-size Mini PCIe Card
        Kernel driver in use: ath9k
06:00.0 SD Host controller: Ricoh Co Ltd MMC/SD Host Controller (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
06:00.1 System peripheral: Ricoh Co Ltd R5U2xx (R5U230 / R5U231 / R5U241) [Memory Stick Host Controller] (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
06:00.2 System peripheral: Ricoh Co Ltd PCIe xD-Picture Card Controller (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
06:00.3 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 PCIe IEEE 1394 Controller (rev 01)
        Subsystem: ASUSTeK Computer Inc. Device 1f47
07:00.0 Ethernet controller: Qualcomm Atheros AR8131 Gigabit Ethernet (rev c0)
        Subsystem: ASUSTeK Computer Inc. Device 1820
        Kernel driver in use: atl1c
3f:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-Core Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:03.0 Host bridge: Intel Corporation Core Processor Integrated Memory Controller (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:03.1 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Target Address Decoder (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:03.4 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Test Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:04.0 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Control Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:04.1 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Address Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:04.2 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Rank Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:04.3 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 0 Thermal Control Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:05.0 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Control Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:05.1 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Address Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:05.2 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Rank Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
3f:05.3 Host bridge: Intel Corporation Core Processor Integrated Memory Controller Channel 1 Thermal Control Registers (rev 04)
        Subsystem: Intel Corporation Device 8086
```

At the end:

emerge --info 

I found this bug, but since I did not catch the beginning of the trace, I can't tell if my issue is exactly the same.

More information shall be delivered on request.

----------

## dE_logics

The good stuff is above the screen.

It may be an FS problem. During shutdown, an unmount of FS may be causing the issue.

What does RC say just before the panic?

Also, how about your HDD's SMART data? smartctl -a /dev/sd*

----------

## leo.the_zoo

The problem does not happen each time I shut down my laptop. Sometimes it happens twice in a row, sometimes not even once a week. I'm usually not in front of the screen when it happens, because I leave my seat right after I start the shutdown and only later discover the system is still up because of a kernel panic.

I installed smartmontools and generated output for my root partition (smartctl does not accept wildcards):

```
# smartctl -a /dev/sda4
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-3.7.10-gentoo] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 7200.4
Device Model:     ST9500420AS
Serial Number:    5VJ3W3KS
LU WWN Device Id: 5 000c50 021673f10
Firmware Version: 0002SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Mar 29 11:48:45 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 108) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       124514236
  3 Spin_Up_Time            0x0003   097   097   085    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2102
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       39
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       249189789
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       13093
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   037   020    Old_age   Always       -       2070
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       2415
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   048   045    Old_age   Always       -       37 (Min/Max 29/37)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       21
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       536679
194 Temperature_Celsius     0x0022   037   052   000    Old_age   Always       -       37 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   046   043   000    Old_age   Always       -       124514236
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       198547748171484
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2555442319
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3938290194
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```

I don't like "old_age" and "pre_fail" labels there... The disk is only three years old.

----------

## dE_logics

The HDD is ok.

3 years is quite a lot of age in hardware and software.

I think the only hope is the RC logs; you'll have to look at them at each shutdown. I don't think logging will help, because the kernel will crash before the logs are synced to disk, but still uncomment rc_logger="YES" in /etc/rc.conf and post the log in case of a kernel panic.

As a test, you may try and unmount all of your filesystems. And what kinds of FS are you using?

----------

## leo.the_zoo

I have a dual boot, so except ReiserFS I'm using for Gentoo, I also have vfat and ntfs partitions mounted.

My rc_logger has been uncommented for quite some time, but:

```
# tail /var/log/rc.log
 [ ok ]
 * Starting clamd ...
 [ ok ]
 * Starting freshclam ...
 [ ok ]
 * Starting cupsd ...
 [ ok ]
rc default logging stopped at Fri Mar 29 11:36:24 2013

# umount /mnt/distfiles
# tail /var/log/rc.log
 [ ok ]
 * Starting clamd ...
 [ ok ]
 * Starting freshclam ...
 [ ok ]
 * Starting cupsd ...
 [ ok ]
rc default logging stopped at Fri Mar 29 11:36:24 2013
```

The RC logger apparently works only at startup and shutdown. And I don't see anything unusual in /var/log/rc.log so the crash must have happened after logging had stopped.

----------

## mgranet

*dE_logics wrote:*

> The HDD is ok.

I hate to disagree with you, but I must.

*leo.the_zoo wrote:*

> 193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       536679


Load_Cycle_Count shows the number of times the heads have been parked for energy conservation. On certain setups, with certain drive firmware, a problem can occur where the heads constantly park and unpark. I have been unable to track down exactly which setups are affected, but I seem to see it more on drives with 'green' firmware. You can run the following to try to disable APM on the drive and prevent the problem, at least in the interim.

```
hdparm -B 255 /dev/sda
```

A count this high could indicate a failing disk.
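To put the raw value in perspective, here is a rough back-of-the-envelope calculation using only the two figures from the smartctl output posted above (Load_Cycle_Count 536679 and Power_On_Hours 13093). The function name is made up for illustration, and the result is only a lifetime average, so the actual parking rate at any given moment may be quite different.

```python
def load_cycles_per_hour(load_cycle_count: int, power_on_hours: int) -> float:
    """Average number of head load/unload cycles per powered-on hour."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycle_count / power_on_hours


# Raw values copied from the smartctl output earlier in the thread.
rate = load_cycles_per_hour(536679, 13093)
seconds_between_parks = 3600 / rate

print(f"{rate:.1f} load cycles per hour, one every {seconds_between_parks:.0f} s")
```

On those numbers the heads have parked about 41 times per powered-on hour on average, which is the kind of constant park/unpark behaviour described above.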

Old_age and Pre-fail are descriptors of the attribute type. They don't mean that the attribute is currently failing from old age or about to fail. A near-threshold value on Power_On_Hours would indicate that the drive is of old age. A near-threshold value on a Pre-fail attribute means just that: the drive is likely to fail. That's not to say that SMART values are the be-all and end-all of drive integrity, but they can offer good insight if you know how to read them.
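As an illustration of reading the table this way, the following sketch parses a few rows copied from the smartctl output above and sorts them by how far the normalized VALUE sits above THRESH. The names `SmartAttribute`, `parse_attributes` and `by_margin` are hypothetical helpers, not part of smartmontools, and "margin" is just a convenient label for VALUE minus THRESH. It is a reading aid under those assumptions, not a health verdict.

```python
from typing import List, NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # normalized WORST column
    thresh: int  # THRESH column
    kind: str    # TYPE column: "Pre-fail" or "Old_age"


def parse_attributes(table: str) -> List[SmartAttribute]:
    """Parse rows of the 'Vendor Specific SMART Attributes' table."""
    attrs = []
    for line in table.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # this skips the header line and anything else.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(SmartAttribute(int(fields[0]), fields[1], int(fields[3]),
                                    int(fields[4]), int(fields[5]), fields[6]))
    return attrs


def by_margin(attrs: List[SmartAttribute]) -> List[SmartAttribute]:
    """Sort attributes so the ones closest to their threshold come first."""
    return sorted(attrs, key=lambda a: a.value - a.thresh)


# A few rows copied from the output posted earlier in the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       39
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       13093
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       536679
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

for attr in by_margin(parse_attributes(SAMPLE)):
    print(f"{attr.attr_id:>3} {attr.name:<24} {attr.kind:<9} "
          f"margin={attr.value - attr.thresh}")
```

With these rows, Load_Cycle_Count comes out first with a margin of 1, well ahead of Reallocated_Sector_Ct at 63.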

----------

## dE_logics

I usually skip Load_Cycle_Count, so all I looked at was Reallocated_Sector_Ct, Reported_Uncorrect, Current_Pending_Sector and Offline_Uncorrectable.

Regardless, this is not an HDD issue. But that actuator may fly off any time.

----------

## dE_logics

*leo.the_zoo wrote:*

> I have a dual boot, so except ReiserFS I'm using for Gentoo, I also have vfat and ntfs partitions mounted.
>
> My rc_logger has been uncommented for quite some time, but:
>
> ...


What about logs after a crash? 

Does it record anything just before the crash (for that date/time)?

If these things are not a problem, then it must be an ACPI issue.

----------

## leo.the_zoo

Looks like hdparm worked well since the value of Load_Cycle_Count does not change after hdparm execution. Perhaps it will help keep the disk sane for a longer time.
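A check like this can be sketched by pulling the raw Load_Cycle_Count out of two smartctl readings and comparing them. The helper name `raw_value` is made up for illustration, and the second reading below is a hypothetical placeholder (simply a copy of the first), not real output captured after running hdparm.

```python
def raw_value(smartctl_output: str, attribute: str) -> int:
    """Return the RAW_VALUE of a named attribute from smartctl table text."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the name is the second one.
        if len(fields) >= 10 and fields[1] == attribute:
            return int(fields[9])
    raise KeyError(attribute)


# First reading: the row from the output posted earlier in the thread.
before = ("193 Load_Cycle_Count        0x0032   001   001   000    "
          "Old_age   Always       -       536679")
# Hypothetical second reading, taken some time after hdparm -B 255.
after = before

delta = raw_value(after, "Load_Cycle_Count") - raw_value(before, "Load_Cycle_Count")
print("heads still parking" if delta > 0 else "no new load cycles")
```

The longer the gap between the two readings, the more a delta of zero says about whether the parking has really stopped.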

My logs for the day of the crash seem to be just fine. That's why I think it is either the unmounting of the root partition or maybe an ACPI issue. I'll try to keep an eye on my laptop when switching it off and deliver more information if the panic occurs again.

----------

## asok19

I have the same happening: occasionally on shutdown (I think always during unmounting file systems) there is a kernel panic.  It happens on both my work and my home system, so I don't think it is due to a hardware fault.  But like leo.the_zoo, I am also using ReiserFS (in my case, on most of the partitions).  It definitely has been happening on 3.7 and 3.8 kernels, not sure about 3.6, and it would be near impossible to bisect and find the culprit, because I can't reproduce it.

Here is a horribly bad, but still readable screenshot of the panic.

----------

## TomWij

*asok19 wrote:*

> Here is a horribly bad, but still readable screenshot of the panic.


Possible causes: bad hardware (bad memory, bad HDD) or bad scheduling-related settings (HZ, tick, preemption) in the kernel. Please test your hardware and try changing those kernel settings; it's not so much that the settings themselves are broken, but some modules simply don't work well with them. Also, please try to reproduce this on an untainted kernel without any VirtualBox modules loaded at boot, to make sure they are not the cause. Good luck! Once you have verified that it is neither your hardware nor the kernel settings, we could start filing bugs about this. Hopefully you find the cause yourself, though, since this error is fairly generic.

----------

## augury

"The good stuff is above the screen."

OK.  You've got something up there and it won't let you go.  How many instances are there like these?  I've had the problem before myself.  Not recently though.

What are the basic steps for system halt?

sysinit is something I've delved into but the halt process I have not.

----------

## augury

You'll probably need to pastebin your kernel .config.

----------

