# 2.6.35 hangs badly

## Ormaaj

The 2.6.35 kernel has been causing everything to freeze up after certain heavy loads such as compiling. All processes freeze except while interrupts are being triggered, for example by moving the mouse or switching between virtual desktops. A video, for instance, freezes completely, but as long as I keep moving the mouse everything returns to normal. The only other indication is that my xmobar displays "cpu total not found" rather than the load %.

This has been going on since about rc1-rc2. It occurs in both unstable and stable zen kernel (2.6.34 with some backported things), all vanilla 2.6.35 rcs, but not vanilla 2.6.34.1. I really don't know where to start looking to diagnose this. A git bisect would take ages because it occurs unpredictably within an hour of booting. Arch is amd64. Not sure what kind of info would be useful in diagnosing.
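A rough back-of-the-envelope sketch of why bisecting is so painful here. The commit count is an assumption chosen for illustration, not a measured figure, and `bisect_steps` is a made-up helper name; bisect halves the candidate range at each step, so the number of kernels to build and test grows roughly with log2 of the number of commits.

```python
import math

def bisect_steps(commit_count: int) -> int:
    """Approximate number of kernels git bisect asks you to build and test."""
    # Each step halves the candidate range, so about log2(n) steps are needed.
    return math.ceil(math.log2(commit_count)) if commit_count > 1 else 0

# ASSUMPTION: roughly 10,000 commits between the good and bad tags. Run
# `git rev-list --count v2.6.34..v2.6.35` in a kernel tree for the real number.
commits = 10_000
hours_per_test = 1  # the hang shows up "within an hour of booting"

steps = bisect_steps(commits)
print(f"{steps} builds, roughly {steps * hours_per_test} hours of waiting")
```

Since a "good" verdict is only trustworthy after the full hour without a hang, the waiting time dominates the build time.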

----------

## d2_racing

Are you using the latest vanilla-sources or the gentoo-sources ?

----------

## Ormaaj

 *d2_racing wrote:*   

> Are you using the latest vanilla-sources or the gentoo-sources ?

 

sys-kernel/vanilla-sources are installed through portage to satisfy the virtuals but I'm just checking out tags directly from git for both zen-sources and vanilla kernel.org kernel. I tried switching back to vanilla from zen to see whether it was specific to one of their patches but apparently it isn't.

Currently testing whether this might be related to the recent intel i7 cpuidle driver...

*Last edited by Ormaaj on Thu Aug 05, 2010 4:22 pm; edited 1 time in total*

----------

## dufeu

 *Ormaaj wrote:*   

> Currently testing whether this might be related to the recent intel i7 cpuidle driver...

 

I don't have any answers to this. I had what may be the AMD equivalent of CPU idle issues {BIOS setting AMD C1E}. I 'resolved' the problem by disabling the function in the BIOS.

If you're indeed having cpuidle issues, it may be a more general problem with cpus that support it. So I'm just monitoring the thread.

----------

## Timbers2k

I'm having the exact same problem.  The worst case for me is playing WoW! After a few minutes the sound starts stuttering and then everything starts to crawl. Once I exit WoW my desktop is very slow, and gkrellm shows CPU use staying high, but nothing shows in top.

I'm also using a Core i7, and I did set up the new cpuidle driver. I'll try disabling that and see if it makes a difference.

----------

## Timbers2k

It works fine if you disable the "Cpuidle Driver for Intel Processors" option under "Power management and ACPI options".  Seems that this option is not quite ready for prime time.

----------

## maj

See, it's the first thing I disabled when I found the problem, and the problem still exists for me. mplayer outputs its "your system is too slow" message, which, given it's an i7 system, I think not! Besides, video playback is handed off to the GPU!

Edit - ok, removed "ACPI Processor P-States driver" as well as "Cpuidle Driver for Intel Processors", now the system is behaving itself!

Edit 2 - Or not, still have the issue - it's less pronounced, but it still happens  :Sad: 

----------

## Cffeine

Hey all,

Just thought I'd throw in my experience.  I also have an i7 and was experiencing the same problem with 2.6.35 (gentoo-sources). Any heavy load would bring my computer to a crawl, with the exception of moving the mouse, clicking menus, etc., similar to what the OP was experiencing. My system never seemed to recover, but I'm not sure I gave it adequate time to do so, as the system would usually be so slow that I had difficulty getting the offending process to close. I would usually have to use SysRq+REISUB to get it to reboot. I tried disabling "ACPI Processor P-States driver" as well as "Cpuidle Driver for Intel Processors" as mentioned, but with no luck.

I finally got it to run properly by also disabling all the "CPU Frequency scaling --->" options. Not sure which one was causing the problem as I didn't have time over the weekend to test every governor combination. 
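For anyone comparing configs, here is a small sketch that reports the state of the options mentioned in this thread. The mapping from menu entries to Kconfig symbols is my assumption (verify it with the "/" search in `make menuconfig`), `option_states` is a made-up helper name, and the sample config text is invented for illustration.

```python
import re

# ASSUMPTION: these are the Kconfig symbols behind the menu entries named in
# this thread; double-check them in menuconfig before relying on this.
OPTIONS = {
    "CONFIG_INTEL_IDLE": "Cpuidle Driver for Intel Processors",
    "CONFIG_X86_ACPI_CPUFREQ": "ACPI Processor P-States driver",
    "CONFIG_CPU_FREQ": "CPU Frequency scaling",
}

def option_states(config_text: str) -> dict:
    """Map each option of interest to 'y', 'm', or 'n' (disabled or absent)."""
    states = {name: "n" for name in OPTIONS}
    for line in config_text.splitlines():
        # Enabled options look like CONFIG_FOO=y; disabled ones appear as
        # "# CONFIG_FOO is not set" and therefore never match this pattern.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states

# Invented sample; pass the contents of your own .config instead.
SAMPLE = """\
CONFIG_CPU_FREQ=y
# CONFIG_INTEL_IDLE is not set
CONFIG_X86_ACPI_CPUFREQ=m
"""

for name, state in option_states(SAMPLE).items():
    print(f"{state}  {name}  ({OPTIONS[name]})")
```

Running it against the configs of a working and a hanging kernel makes it easy to see which of the three actually differ.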

My system now remains responsive as it should, but I have a couple of other strange issues that are probably not related. Namely: if I try to manually mount one of my hard disks on the command line, mount just hangs and the system goes slightly unstable -- I can log out, but if I try to shut the computer down, it just hangs. Not sure if this is a 2.6.35-specific kernel problem or if it's because I disabled CONFIG_IDE as instructed by udev. I do know I didn't have this problem with 2.6.34, but I didn't take udev's advice to disable CONFIG_IDE until I started messing with 2.6.35.

Not trying to wander off-topic, but here is the output of dmesg | tail after trying to do a mount operation:

```
[<c017aaf8>] ? do_kern_mount+0x2f/0xb8
 [<c018ce3c>] ? do_mount+0x657/0x6ba
 [<c018b557>] ? copy_mount_options+0x7d/0xe7
 [<c018cf05>] ? sys_mount+0x66/0x9d
 [<c01025d0>] ? sysenter_do_call+0x12/0x26
Code: d2 89 d0 5b c3 55 57 89 c7 56 53 e8 ec 36 15 00 8b 5f 04 8b 77 08 8b 2d c4 5f 47 c0 83 c9 ff eb 12 8b 14 8d 58 68 5d c0 8b 47 14 <8b> 04 10 99 01 c3 11 d6 41 ba 10 00 00 00 89 e8 e8 a2 32 ff ff
EIP: [<c031d405>] __percpu_counter_sum+0x26/0x55 SS:ESP 0068:f5f63d20
CR2: 00000000025d3000
---[ end trace 1c099b19386537c6 ]---
note: mount[19808] exited with preempt_count 1
```

Not sure if I should file a bug, since I don't know if I'm experienced enough to tell whether my problems are truly a kernel problem or something stupid I did.

Anyway, just thought I'd throw my 2.6.35 experiences out there in case it helps anyone. 

Later

----------

## dufeu

 *Cffeine wrote:*   

>  ... Namely: if I try and manually mount one of my hard disks on the command line, mount just hangs and the system goes slightly unstable -- I can log out, but If I try and shut the computer down, it just hangs. ... 

 

You're not stupid and you may or may not be having a kernel problem.

Given your existing experiences with CPU scaling, I'd try the following. Note, however, that these suggestions may or may not help and are at your own risk:

- Confirm that you have the latest BIOS installed for your motherboard.
- Go through your BIOS and simplify your settings. This means turn off any overclocking etc. Be especially sensitive to anything which might have an impact on either power conservation {don't permit the BIOS to do power conservation} or on PCIe bus activity. Personal experience suggests possible timing issues across your PCIe bus(es), which is one of the things that could cause your disk to 'hang'.
- Replace the SATA cable to that disk. There are tools you can run to check for HDD faults, but covering them is beyond scope for the moment. Replacing the cable with a known good one is easier.
- In the kernel, disable any disk controller driver you're not using.
- Like the BIOS, simplify your kernel settings with an eye towards not using any power saving features etc.
- Consider using a "Pappy's kernel seed" as your base kernel settings.

Good Luck!

----------

## Ormaaj

I can also confirm that the new Intel idle driver isn't the problem. But if the solution is disabling all power saving, that would really suck, as I have quite an overclocked system. P/C-states along with frequency scaling do a very nice job of keeping the temps down and saving power when idle (which is 99% of the time), so I don't really want to disable them and keep my CPU buzzing away at 4 GHz at all times.

Maybe I'll just stick to 2.6.34.x until this gets fixed upstream. There should really be no reason to disable ACPI C-states.

EDIT: I haven't noticed the problem yet with the performance governor and ACPI P-state driver enabled along with intel cpuidle. Only been testing for about 20 minutes though so we'll see.
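To double-check which governor each core is actually using while testing, something like the sketch below can help. It assumes the usual cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); `read_governors` is a made-up helper name, and on a kernel built without CPU frequency scaling it simply prints nothing.

```python
from pathlib import Path

def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        try:
            governors[cpu_name] = gov_file.read_text().strip()
        except OSError:
            continue  # skip cores whose governor file cannot be read
    return governors

for cpu, governor in read_governors().items():
    print(f"{cpu}: {governor}")
```

If some cores still report ondemand after switching, the comparison between governors is not a clean one.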

----------

## Ormaaj

Still seems fine after switching from ondemand to performance.

Also... just judging from the description relative to the symptoms, this one looks like a very likely culprit:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1f85f87d4f81d1e5a2d502d48316a1bdc5acac0b

http://lwn.net/Articles/386990/

----------

## dufeu

 *Ormaaj wrote:*   

> Still seems fine after switching from ondemand to performance.
> 
> Also... just judging from the description relative to the symptoms, this one looks like a very likely culprit:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=1f85f87d4f81d1e5a2d502d48316a1bdc5acac0b
> ...

 

Thank you for the links. I wish I better understood what they mean. {sigh}

My general guidance and approach to giving suggestions is to strip everything to the minimum and then add back items/features/settings one at a time. For most people, this path works well and is very understandable to them.

You, on the other hand, obviously know a lot better what you're doing!  :Very Happy: 

What concerns me with cpu idle, cpu frequency {over or under} and sleep modes are their potential impact on PCI(e) bus characteristics. The problem is that anything plugged into the bus may behave unpredictably if PCI(e) characteristics change. I've had several modern mobos (still being manufactured) where PCI(e) attached devices did some very strange things like GigaEthernet cards being able to run only at FastEthernet speeds, upper memory (above 4 Gigs RAM) errors and graphic display artifacts. I managed to trace these back to AMD cpu idle issues. And I don't even overclock my systems!

BTW - My personal take on the links you provided is that cpu idle is still under "active" development. So none of us should be surprised if specific systems behave strangely.

Take care and thanks again for the links.

 :Smile: 

----------

## Ormaaj

I don't think this should affect PCIe, because none of this should have an effect on board frequencies or even the input frequency to the CPU. Current Intel frequencies on i7 are quite complex. There's sort of a hierarchy of things which affect actual core frequencies: FSB (BCLK) -> multiplier -> ACPI P-States -> Intel Turbo. The latter two happen within the chip and should be pretty transparent to everything else. Turbo you really don't have any control over, and I don't really understand why Turbo isn't just another P-state, because I would think that should have the same effect. Additionally, there are the C-states, which control sleeping, and even more beyond that which is controlled by that Intel idle driver... I guess ACPI wasn't good enough for Intel so they had to add their own proprietary extensions.

P-states are controlled in discrete increments by the frequency scaling driver and work independently for each core. It shouldn't cause side-effects in other subsystems. You can see the true frequencies, P-states and C-states in action if you install i7z (there's no ebuild afaik).

http://code.google.com/p/i7z/

and also powertop gives some C-states info

http://www.lesswatts.org/projects/powertop/

If you watch i7z, P-states are still doing their thing even if you use the performance governor. Enabling the ondemand governor seems to make P-states more aggressive for some reason: when your CPU is idle it really gets down to lower frequencies than without it. Disabling the P-states driver entirely causes you to always run at the highest P-state.

----------

## ferrarif5

I've got the same issue as described in the thread, lagging response from keyboard, mouse, screen refresh with my CPU hitting 100% load, think I'll roll back to gentoo-sources-2.6.34 for now.

----------

## drescherjm

I may be having a different issue but instead of my kernel booting in 20s or so on my i7 3.0GHz it takes over 90s and typing or anything else in the terminal reminds me of the early 1990s and dialing into my university with a 1200 baud modem, heck 1200 baud was faster..

----------

## dufeu

 *drescherjm wrote:*   

> I may be having a different issue but instead of my kernel booting in 20s or so on my i7 3.0GHz it takes over 90s and typing or anything else in the terminal reminds me of the early 1990s and dialing into my university with a 1200 baud modem, heck 1200 baud was faster..

 

Have you reviewed this thread?  Delays during boot and shutdown [SOLVED] 

I realize it's AMD focused rather than Intel, but I suspect that the whole 'power saving' thing is going to be problematical for some time to come. In addition to AMD and Intel trying to gain a competitive edge in 'power savings', there are all the mobo manufacturers with their own (mis)understandings of what 'power savings' means, related BIOS issues and yada-yada-yada. I'm also seeing that the language translation issue from the mobo manufacturers is obfuscating what's happening as well.

With the plethora of available CPU 'features', possible BIOS settings and coordinated kernel settings, nothing is as simple or easy as it used to be.

----------

## drescherjm

Thanks. I will try disabling C1E. However that is only a temporary solution since I find it unacceptable to have to disable this power saving feature.

EDIT: Well, at least on my machine power management is not the cause. I disabled CxE and SpeedStep in my BIOS and the problem continued. So back to 2.6.34.

----------

## the_bard

I'm running across symptoms as experienced by the above: Running gentoo-sources 2.6.35-r1, I'm getting odd "soft" hanging on boot. I've got a boot splash screen configured, so leaving it in silent mode simply displays the loading bar as normal, but it hangs. The system does respond to input... switching to the verbose boot splash causes the system to unhang, whereupon it appears to hang at the next event.

Any keyboard activity seems to temporarily resolve the issue, very briefly. I've babysat the boot process, hitting keys repeatedly, until my system tries to load KDM. I don't have the patience to continue at that point.

I upgraded to 2.6.35-r1 by copying the .config from 2.6.34, then performing a `make oldconfig`. Removing the new Intel idle CPU and ACPI P-States drivers has done nothing. If I remember correctly, I even disabled ACPI completely within the kernel... it made no difference. I swapped back to 2.6.34.

I'm running a Core i7-860 on an Intel DP55WG board, for what it's worth.

----------

## frostschutz

2.6.35 with intel cpuidle caused really bad hangs and panics in KVM for me. 2.6.35.3 without anything new (I answered N to everything in `make oldconfig` coming from 2.6.34.x) does not have bad hangs, but in less than 6 hours I already had a KVM machine crashing again. There is no problem whatsoever with 2.6.34.x. So something is definitely odd with 2.6.35 for me, on an Intel i7 920 machine. I'll try disabling anything power-saving related next, since disabling the Intel idle driver already improved things a lot. Other than that I'm out of ideas as to what could be the culprit.

----------

## dufeu

 *frostschutz wrote:*   

> 2.6.35 with intel cpuidle caused really bad hangs and panics in KVM for me. 2.6.35.3  ... 

 

For What It's Worth: Even though I use both vmware and virtualbox {not at the same time}, I currently leave off all virtualisation support.  I don't make any production use of either so it's one of the areas where I've simplified my kernel setup and accept the performance hit.

Also FWIW: I've always felt there's a dichotomy involved when combining aggressive power saving techniques {such as select cpu core idling} with the running of virtual machines. You either do one or the other but not both. i.e. "Power Saving" implies your load only occasionally needs all the hardware resources you have available. "Virtualization" implies that you are interleaving multiple workloads in order to achieve more complete utilization of resources. In my opinion these are contradictory goals and I've always geared my kernel settings to one or the other but not both.

As always, YMMV and all that.  :Very Happy: 

----------

## frostschutz

The VMs are idle themselves most of the time. With the host running 2.6.35, they get kernel soft lockups when they are not.

And all that does not explain why it works fine in 2.6.34 (and older), but not in 2.6.35, using basically the same config.

Edit: Disabling the Power Management section entirely seems to have further improved the situation.

Edit2: Nah, still soft lockups in KVM. Unrelated to power management after all. Must be KVM doing something strange.

----------

## zx2c4

@the_bard

Your symptoms are exactly the same as mine. Did you find a solution eventually?

----------

## optiluca

 *zx2c4 wrote:*   

> @the_bard
> 
> Your symptoms are exactly the same as mine. Did you find a solution eventually?

 

I am also suffering the same issues with the 2.6.35 line of the gentoo sources.  I have not experienced it once using zen sources.  The only significant differences in setup I can think of are the use of the BFS/BFQ schedulers and the SLQB allocator.

BTW I am also running an i7 system, dunno if that could be the source of the issue.

Anyway all I can say is that zen sources worked for me, even though I would still like to know what the hell is going on...   :Confused: 

----------

## Genewb

There's a tracker ticket--which hasn't been answered--and another report on the mailing list was sent yesterday, which also hasn't been answered. It seems that no kernel hackers care much.

----------

## drescherjm

Genewb

Thanks. The second thread led me to the solution. Nix found by bisecting the kernel that this problem was with the clocksource. In his case it was hpet. I was using tsc. I switched to acpi_pm and all is well.

https://forums.gentoo.org/viewtopic-t-842775-highlight-.html
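To see which clocksource a machine is currently using and which alternatives the kernel offers, a minimal sketch along these lines should do. It assumes the standard sysfs location `/sys/devices/system/clocksource/clocksource0`; `clocksource_info` is a made-up helper name, and it returns nothing useful on systems where those files are missing.

```python
from pathlib import Path

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"

def clocksource_info(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available

current, available = clocksource_info()
print("current:  ", current)
print("available:", " ".join(available))
```

To switch, write one of the available names (for example `acpi_pm`) to `current_clocksource` as root, or boot with `clocksource=acpi_pm` on the kernel command line to make the choice apply from startup.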

----------

## maj

Seems to be working here  :Smile: 

----------

