# WHY did this happen?

## dE_logics

I just had a problem on my desktop system, which was on its 7th day of uptime (with some heavy tasks going on). I started the GUI on it occasionally.

ConsoleKit and D-Bus were in the D (uninterruptible sleep) state. It all started when I noticed that newly attached USB devices were not detected at all. lsusb showed just one device:

Elan Microelectronics Corp.

which was unplugged (my USB keyboard). Pen drives, mice... nothing worked (but the system and even KDE did not hang), so I was back to the hard terminals.

So I decided to restart consolekit and dbus... both operations hung bash and I had to do a sysrq + k to stop them.

Also, `cat /sys/bus/usb/devices/usb3/remove` hung the system. That Elan device was on bus 003 according to lsusb.

All dbus* processes were in an uninterruptible state, and there were 3 console-kit processes also in the D state (that's because I tried to restart consolekit three times).

So I'm trying to diagnose what went wrong... why did it happen all of a sudden?

----------

## nativemad

I would say it's either the kernel or faulty hardware (RAM?).

Have you seen anything interesting in dmesg?

Is it the first time that you've left your system up for that long? Have you ever used all of your RAM before, or is this perhaps the first time, with the heavy computation you've mentioned?

I mean 7 days is absolutely nothing!

----------

## dE_logics

My record uptime is 8 days on my desktop system; in the end the power cable got pulled out, halting it.

Nothing mysterious in dmesg. RAM was practically always full and swap was more than 70% occupied. The RAM doesn't have any problems; I've checked it.

----------

## drescherjm

deleted. I doubt that the OP would have benefited (in this thread) from me talking about my uptimes.

----------

## nativemad

 *drescherjm wrote:*   

> deleted. I doubt that the OP would have benefited (in this thread) from me talking about my uptimes.

 

Yeah, it is a bit inviting! ;)

dE_logics, have you tried another kernel? Maybe even a completely different config... You could boot a live CD and chroot into your install... If that stays up longer than those 8 days, I would look into your kernel config. If the live CD also hangs after some time, then it's probably a USB device or something else acting strangely!? Maybe you could compile ehci as a module and forcefully unload/reload it after the problem has happened!?

----------

## dE_logics

 *drescherjm wrote:*   

> deleted. I doubt that the OP would have benefited (in this thread) from me talking about my uptimes.

 

I know about server uptimes too; I don't plan on doing any of that on my desktop.

@nativemad

That 8-day uptime was achieved with a 37 series kernel; now I have 39.

Right now the problem is obviously not reproducible (i.e. after a reboot), but if it does happen again, what should I do to diagnose it? I still have that consolekit log... anything else?

----------

## nativemad

It happened to me once that I had some strange config option set that I had carried along over a few versions. I never actually found out what it was, but the problem disappeared with a new config built from scratch.

If it happens again, I would dump dmesg to a file to investigate later on a fully working system (`dmesg > /some/file`).

Also, /var/log/Xorg.0.log could be of interest because of evdev (it gets overwritten on an X restart... only the previous session is kept, as Xorg.0.log.old).
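A rough sketch of how both logs could be grabbed in one go before rebooting. The function name `snapshot_logs` and the default target directory `/root/hang-logs` are made up for illustration; only `dmesg` and the two Xorg log paths come from the advice above.

```shell
#!/bin/sh
# Hypothetical helper: copy the kernel log and the Xorg logs into a
# timestamped directory so they survive a reboot or an X restart.
snapshot_logs() {
    dest="${1:-/root/hang-logs}/$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1

    # Kernel ring buffer; may need root on systems that restrict dmesg.
    dmesg > "$dest/dmesg.txt" 2>&1 || echo "dmesg failed" >> "$dest/dmesg.txt"

    # Current and previous X session logs, if they exist.
    for f in /var/log/Xorg.0.log /var/log/Xorg.0.log.old; do
        if [ -f "$f" ]; then
            cp "$f" "$dest/" 2>/dev/null || echo "could not copy $f" >&2
        fi
    done

    sync    # flush to disk in case a hard reset follows
    echo "$dest"
}
```

Calling `snapshot_logs` with no argument writes below the assumed default directory; `snapshot_logs /some/dir` picks another place. It prints the directory it created.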

If lsusb doesn't work in that faulty state, I would try to unload the ehci kernel module and reload it (maybe you have to kill some daemons first). Be aware that, if you've got a USB keyboard, you should probably always do it in one command line, like `rmmod -f ehci_hcd; modprobe ehci_hcd` (the module is named ehci_hcd on these kernels)! ;)
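Before touching any module, it may help to record which tasks are stuck and where. This is only a sketch: the function names are hypothetical, and the `wchan` column is just a hint about the kernel function a task is sleeping in, which may show up as `-` on some setups.

```shell
#!/bin/sh
# Keep only tasks in uninterruptible sleep (STAT starts with D).
# Expects "PID STAT WCHAN COMMAND" lines on stdin; the ps header line
# is dropped because "STAT" does not start with D.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List stuck tasks together with the kernel function they are waiting in.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# As root, the kernel can also be asked to dump all blocked tasks with
# stack traces into its log (readable via dmesg afterwards):
#   echo w > /proc/sysrq-trigger
```

Something like `list_d_state > /some/file` next to the dmesg dump would keep the list for later.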

----------

## dE_logics

`rmmod -f` has always crashed my system.

I'll build a fresh .config from scratch with Linux-3.0.

----------

## dE_logics

This time, after 6 days of uptime, the GUI suddenly hung. sysrq + k did not work. Only the keyboard lights were functional. sysrq + c did trigger a crash though, suggesting the kernel received the sysrq + k request but did not act on it for some reason.

This occurred while KDE was running, when I tried to maximize the Chromium windows (by mistake).

----------

## nativemad

That sounds more like an X hangup... Did you look in the Xorg.0.log afterwards!? What kind of graphics card do you use, and with which driver?

----------

## disi

 *dE_logics wrote:*   

> This time, after 6 days of uptime, the GUI suddenly hung. sysrq + k did not work. Only the keyboard lights were functional. sysrq + c did trigger a crash though, suggesting the kernel received the sysrq + k request but did not act on it for some reason.
> 
> This occurred while KDE was running, when I tried to maximize the Chromium windows (by mistake).

 

I am not sure if this is related. 

For me the kernel has been unstable since 2.6.39.2, while 2.6.39.1 is fine. I can freeze the system by copying ~2GB of data over the network... only SysRq+b brings me out of it. Actually, everything above 2.6.39.1 has this problem (tried vanilla, git, etc.).

Do you happen to use the 'jme' network card driver?

----------

## dE_logics

 *nativemad wrote:*   

> That sounds more like an X hangup... Did you look in the Xorg.0.log afterwards!? What kind of graphics card do you use, and with which driver?

 

That's gone. But there was I/O activity after the hangup (the HDD light). I have a Radeon with compositing enabled.

 *disi wrote:*   

> I am not sure if this is related. 
> 
> For me the kernel has been unstable since 2.6.39.2, while 2.6.39.1 is fine. I can freeze the system by copying ~2GB of data over the network... only SysRq+b brings me out of it. Actually, everything above 2.6.39.1 has this problem (tried vanilla, git, etc.).
> 
> Do you happen to use the 'jme' network card driver?

 

I'm using 38.5. It appears this problem is virtually impossible to crack.

----------

## nativemad

I would `rc-update del xdm` now... so that the Xorg log file is still there after the next reboot... ;)

It would be interesting to know if your system responds to pings or would even let you log in through ssh after such a crash!?
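For what it's worth, here is how I would read the result of that check, as a small sketch to run from a second machine. Both function names are made up, the interpretation is only a rule of thumb, and the ssh probe assumes key-based login is already set up (it will report "no" if a password prompt would be needed).

```shell
#!/bin/sh
# Rough interpretation of a reachability check.
# Arguments: "yes"/"no" for "answers ping" and "ssh login works".
interpret_hang() {
    ping_ok="$1"; ssh_ok="$2"
    if [ "$ping_ok" = yes ] && [ "$ssh_ok" = yes ]; then
        echo "kernel and userspace alive; suspect X or the graphics driver"
    elif [ "$ping_ok" = yes ]; then
        echo "kernel still answers; userspace is stuck or starved"
    else
        echo "no reply; kernel hang or network down"
    fi
}

# Hypothetical wrapper: probe a host and print the interpretation.
probe_host() {
    host="$1"
    p=no; s=no
    ping -c 3 -W 2 "$host" >/dev/null 2>&1 && p=yes
    ssh -o ConnectTimeout=5 -o BatchMode=yes "$host" true >/dev/null 2>&1 && s=yes
    interpret_hang "$p" "$s"
}
```

Usage would be `probe_host <address-of-the-hung-box>` while the desktop is frozen. If ssh still works, the logs can be copied off before rebooting.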

Also, please try to keep up to date (xorg has already gone stable on amd64; x86 is coming soon): https://bugs.gentoo.org/show_bug.cgi?id=371857

----------

## dE_logics

X is running from GUI. BTW sysrq + k not working suggests something wrong at a lower level.

----------

