# i915: GPU hung, declared wedged.... tips?

## WvR

I use a Lenovo ThinkPad X201i running Gentoo and have been fully satisfied with it. Recently, however, I have been experiencing some issues:

Problem: when working in GNOME (v3.6.x), at some point the interface freezes. I can still move the mouse and the cursor moves across the screen, but the clock, for instance, is frozen. After several seconds the active window goes black. Then I get the black screen that says "Sorry, something has gone wrong and the system cannot recover. Call a system administrator".

The keyboard is still responsive. I press Ctrl-Alt-F1 to get to a tty, log in as root and restart XDM. X then restarts, but before the GDM login screen appears I get the same error message: "Sorry, something has gone wrong and the system cannot recover. Call a system administrator".

After trying several things, I suspect it is an issue with the Intel i915 driver. Here is a snippet from /var/log/messages:

```
Jan 31 16:04:17 rine50 kernel: [ 5887.075138] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 31 16:04:17 rine50 kernel: [ 5887.075145] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jan 31 16:04:19 rine50 kernel: [ 5888.691015] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jan 31 16:04:19 rine50 kernel: [ 5888.691326] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jan 31 16:04:19 rine50 kernel: [ 5888.691334] [drm:i915_reset] *ERROR* Failed to reset chip.
```
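The log suggests the driver gives up once a second hang follows the first very quickly: the two hangcheck messages are about 1.6 s apart in kernel time (5887.075 vs 5888.691), and `i915_reset` then reports "GPU hanging too fast, declaring wedged!". To compare crashes over time, a small Python sketch like the one below could pull these events out of a syslog file and print the gap between consecutive hangs. It assumes the line format shown above (kernel timestamp in square brackets); the function names `hang_events` and `hang_intervals` are illustrative, not part of any existing tool.

```python
import re
import sys

# Kernel timestamp such as "[ 5887.075138]"; matches syslog lines
# ("... kernel: [ 5887.075138] ...") as well as raw dmesg lines ("[236610.909321] ...").
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def hang_events(lines):
    """Return (kernel_time, kind) tuples for i915 hang and wedge messages."""
    events = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        t = float(match.group(1))
        if "GPU hung" in line:
            events.append((t, "hung"))
        elif "declaring wedged" in line:
            events.append((t, "wedged"))
    return events


def hang_intervals(events):
    """Seconds between consecutive 'hung' events, rounded to milliseconds."""
    times = [t for t, kind in events if kind == "hung"]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


def main(path="/var/log/messages"):
    try:
        with open(path, errors="replace") as f:
            events = hang_events(f)
    except OSError as exc:  # file missing, or not readable without root
        print(f"cannot read {path}: {exc}")
        return
    for t, kind in events:
        print(f"{t:14.6f}  {kind}")
    print("gaps between hangs (s):", hang_intervals(events))


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

Run it as, e.g., `python3 i915_hangs.py /var/log/messages` (the script name is arbitrary). Matching on message text rather than on the `[drm:...]` function tag keeps it working if the reporting function changes between kernel versions.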

Searching Google, I found this error reported, but in every case with Linux kernel 2.6.3x; I use 3.7.4. A more recent message pointed to a buggy combination of BIOS and hardware on a particular type of Intel motherboard, but in my case the laptop was fine until now. Am I looking at broken hardware? If so, how can I find out? Or can I somehow switch off DRM and use the "old-style" GNOME instead of the "new-style" GNOME? I have also used the laptop with a plain X session (twm); there the error did not occur, but I have not used the system long enough with twm to draw a definite conclusion.

Any tips are welcome. Is there a way to "stress test" the GPU?

----------

## BillWho

WvR,

Did you check /sys/kernel/debug/dri/0/i915_error_state? There may be better clues there.
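The error state lives in debugfs, so its path depends on where debugfs is mounted: the kernel message refers to /debug/dri/0/i915_error_state, while with debugfs at the usual /sys/kernel/debug it is the path above. Reading it normally requires root. As a sketch only, assuming debugfs at /sys/kernel/debug and the Intel GPU as DRM minor 0, a timestamped copy (e.g. to attach to a bug report) could be saved like this; `save_error_state` is a hypothetical helper name.

```python
import sys
import time
from pathlib import Path

# Assumptions: debugfs is mounted at /sys/kernel/debug and the Intel GPU is
# DRM minor 0. Adjust the path if your setup differs.
DEFAULT_SRC = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=DEFAULT_SRC, dest_dir="."):
    """Copy the i915 error state to a timestamped text file; return its path."""
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915_error_state-{stamp}.txt"
    # Read the whole file as text and write it out unchanged.
    dest.write_text(src.read_text(errors="replace"))
    return dest


if __name__ == "__main__":
    try:
        print("saved", save_error_state(*sys.argv[1:2]))
    except OSError as exc:  # debugfs not mounted, no such file, or not root
        print(f"could not read error state: {exc}")
```

Copying right after a hang matters, since the file only holds the most recently captured error.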

----------

## WvR

It happened again.... This time I copied the i915_error_state. It does not give much help. It is a very long list of register contents in hexadecimal form.

Just today the Intel driver (xf86-video-intel) was updated, but apparently that does not help...

----------

## WvR

I found this thread:

http://www.gossamer-threads.com/lists/linux/kernel/1617936

It seems that I am not the only one. I guess I will downgrade the laptop to kernel 3.6.11 (there is no real reason to use the ~amd64 kernel anyway).

----------

## WvR

Downgrading the kernel to 3.6.11 did not help. Yesterday evening I had two "crashes" within 10 minutes. The most irritating part is that you have to restart the computer to recover: simply restarting X does not help, because somehow the GPU cannot be "reset". Next try: downgrade the Intel driver from 2.20.19-r1 to 2.20.13. Wish me luck...

----------

## BillWho

WvR,

Did you add or change any settings in /etc/X11/xorg.conf.d/20-intel.conf?

Is DRM_I915 built in or a module?

Have a look at x11-apps/intel-gpu-tools. Maybe some tests can provide a clue.
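The built-in-or-module question can be answered from the kernel configuration: `CONFIG_DRM_I915=y` means built in, `=m` means module. A minimal sketch, assuming either /proc/config.gz exists (only if the kernel was built with `CONFIG_IKCONFIG_PROC`) or /usr/src/linux/.config belongs to the running kernel (the usual Gentoo symlink, but not guaranteed):

```python
import gzip
import re
from pathlib import Path


def config_value(text, option):
    """Return the value of a kconfig option ('y', 'm', ...) or None if unset."""
    # "# CONFIG_X is not set" lines start with '#', so they do not match.
    match = re.search(rf"^{re.escape(option)}=(.*)$", text, re.MULTILINE)
    return match.group(1) if match else None


def describe(value):
    return {"y": "built-in", "m": "module"}.get(value, "not enabled")


def read_kernel_config():
    """Return the kernel config text, or None if no config file is found."""
    proc = Path("/proc/config.gz")  # requires CONFIG_IKCONFIG_PROC
    if proc.exists():
        with gzip.open(proc, "rt") as f:
            return f.read()
    src = Path("/usr/src/linux/.config")  # assumed to match the running kernel
    if src.exists():
        return src.read_text()
    return None


if __name__ == "__main__":
    try:
        text = read_kernel_config()
    except OSError as exc:
        text = None
        print(f"cannot read kernel config: {exc}")
    if text is None:
        print("no kernel config found")
    else:
        print("CONFIG_DRM_I915:", describe(config_value(text, "CONFIG_DRM_I915")))
```

Anchoring the pattern with `=` right after the option name keeps it from matching related options such as `CONFIG_DRM_I915_KMS`.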

----------

## WvR

No changes to anything. These problems seem to have started without any clearly identifiable cause, which is one of the reasons I suspect a hardware problem.

I downgraded xf86-video-intel to the stable version. Let's see if this brings any improvement.

----------

## toralf

 *WvR wrote:*   

> It happened again.... This time I copied the i915_error_state. It does not give much help.

Well, that content is not intended to be readable by an ordinary user. Just file a bug at https://bugzilla.kernel.org and attach the contents of that file.

----------

## WvR

Since I downgraded to x11-drivers/xf86-video-intel v2.20.13 the problem has not returned, so I am declaring it "solved" for the time being.

----------

## thens

I recently had this problem as well (while watching YouTube in fullscreen mode in Chromium, kernel 3.8.13-gentoo): X crashed.

```
Jun  6 21:31:15 think kernel: [108626.012334] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun  6 21:31:15 think kernel: [108626.012343] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
Jun  6 21:31:17 think kernel: [108628.011221] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
Jun  6 21:31:17 think kernel: [108628.011472] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
Jun  6 21:31:17 think kernel: [108628.011475] [drm:i915_reset] *ERROR* Failed to reset chip.
```

I'm currently trying to reproduce this problem, but so far without success.

If anyone has an idea, please let me know.

----------

## mhex

Today I experienced this too:

```
[236610.909321] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[236610.909328] [drm:kick_ring] *ERROR* Kicking stuck wait on render ring
```

This happened while watching a downloaded MP4 video:

http://www.dlr.de/dlr/desktopdefault.aspx/tabid-10081/151_read-7278//year-all/#gallery/11092

MPlayer, VLC and Avidemux all show only a black screen.

x11-drivers/xf86-video-intel 2.20.13

Linux xx 3.8.13-gentoo #1 SMP Thu Jun 6 08:10:20 CEST 2013 x86_64 Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz GenuineIntel GNU/Linux

gcc version 4.6.3 (Gentoo 4.6.3 p1.11, pie-0.5.2)

----------

## mhex

Today, in dmesg:

```
[20190.412669] hda-intel 0000:00:1b.0: Unstable LPIB (65408 >= 8192); disabling LPIB delay counting
```

----------

## mhex

More information from the Xorg log:

```
(EE) [mi] EQ overflow continuing.  400 events have been dropped.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x595be4]
(EE) 1: /usr/bin/X (0x400000+0x4fb44) [0x44fb44]
(EE) 2: /usr/bin/X (xf86PostButtonEvent+0xdd) [0x48adcd]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f66e7e4f000+0x63b8) [0x7f66e7e553b8]
(EE) 4: /usr/bin/X (0x400000+0x7a2a7) [0x47a2a7]
(EE) 5: /usr/bin/X (0x400000+0xa5187) [0x4a5187]
(EE) 6: /lib64/libpthread.so.0 (0x3806a00000+0x10bf0) [0x3806a10bf0]
(EE) 7: /lib64/libc.so.6 (ioctl+0x7) [0x3805ee3327]
(EE) 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x31878040e8]
(EE) 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x19d10) [0x7f66e8c52d10]
(EE) 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1b537) [0x7f66e8c54537]
(EE) 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1c134) [0x7f66e8c55134]
(EE) 12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1cbb9) [0x7f66e8c55bb9]
(EE) 13: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x1f2e4) [0x7f66e8c582e4]
(EE) 14: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x218ab) [0x7f66e8c5a8ab]
(EE) 15: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f66e8c39000+0x3b298) [0x7f66e8c74298]
(EE) 16: /usr/bin/X (0x400000+0x12063c) [0x52063c]
(EE) 17: /usr/bin/X (0x400000+0x37bbe) [0x437bbe]
(EE) 18: /usr/bin/X (0x400000+0x3af91) [0x43af91]
(EE) 19: /usr/bin/X (0x400000+0x29b54) [0x429b54]
(EE) 20: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3805e2460d]
(EE) 21: /usr/bin/X (0x400000+0x29e9d) [0x429e9d]
(EE)
[ 33422.706] (EE) intel(0): Detected a hung GPU, disabling acceleration.
[ 33422.706] (EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.
[ 33422.706] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 33422.706] [mi] EQ processing has resumed after 473 dropped events.
[ 33422.706] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.
```

----------

