# [Solved]nvidia-drivers freezing system with GTX 650

## RichardGv

Update: After I moved my memory sticks to two other slots, the problem hasn't appeared in two weeks. Presumably solved. Thanks to ville.aakko for the suggestion!

Environment:

Gentoo ~amd64, pf-sources-3.15_p2 (Unsupported kernel, but is this problem actually related to it?)

GTX 650

gcc-4.8.3[hardened]

Update: The problem occurs with vanilla-sources-3.15.6 compiled with vanilla GCC_SPECS as well.

Problem:

After starting X, the system freezes so frequently that X is unusable. Usually, all of a sudden, everything displayed on the X screen gets stuck; then the screen content is sometimes updated a few more times, very slowly, before it gets stuck entirely. The cursor sometimes still works and sometimes doesn't. The Alt-SysRq keys sometimes work and sometimes don't. Ctrl-Alt-F{1..6} almost never works.

The card is a GTX 650. I'm primarily using nvidia-drivers-340.24, but downgrading to 337.25 doesn't help. When the kernel got stuck, I would almost always find several lines related to nvidia-drivers in kern.log:

```

...

Jul 17 12:44:35 work kernel: [  110.402991] nvidia 0000:01:00.0: irq 45 for MSI/MSI-X

Jul 17 12:45:07 work kernel: [  142.934610] NVRM: GPU at PCI:0000:01:00: GPU-8328b9fe-45bc-4f30-da18-90e5aaf0cd08

Jul 17 12:45:07 work kernel: [  142.934614] NVRM: Xid (PCI:0000:01:00): 62, 12b2(1f80) 00000000 00000000

Jul 17 12:45:09 work kernel: [  144.951220] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:11 work kernel: [  146.952926] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:13 work kernel: [  148.954704] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:15 work kernel: [  150.956411] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:21 work kernel: [  156.967407] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:23 work kernel: [  158.969119] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:25 work kernel: [  160.976721] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:27 work kernel: [  162.978427] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:29 work kernel: [  164.980209] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:31 work kernel: [  166.981912] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:37 work kernel: [  172.990508] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:39 work kernel: [  174.992218] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:45 work kernel: [  180.998284] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:47 work kernel: [  182.999995] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:45:48 work kernel: [  183.540095] SysRq : Keyboard mode set to system default

Jul 17 12:45:48 work kernel: [  184.036512] SysRq : Terminate All Tasks

...

Jul 17 12:57:31 work kernel: [  184.337094] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>

Jul 17 12:57:53 work kernel: [  206.344156] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:57:55 work kernel: [  208.346218] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:58:04 work kernel: [  217.223316] NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 0003, Class 0000a097, Offset 00000800, Data 2001054e, ErrorCode 0000000c

Jul 17 12:58:06 work kernel: [  219.226812] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 12:58:10 work kernel: [  223.233080] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

...

Jul 17 13:09:35 work kernel: [  474.369103] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>

Jul 17 13:09:58 work kernel: [  497.623375] NVRM: Xid (PCI:0000:01:00): 62, 12b2(221c) 04008bb9 20400148

Jul 17 13:10:11 work kernel: [  510.361115] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 13:10:13 work kernel: [  512.362830] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 13:10:24 work kernel: [  523.372183] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 13:10:36 work kernel: [  535.382402] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 13:10:48 work kernel: [  547.392619] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 13:11:01 work kernel: [  560.403687] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 13:11:14 work kernel: [  573.414756] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

...

Jul 17 14:43:17 work kernel: [ 5416.967807] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)

Jul 17 14:43:17 work kernel: [ 5417.055717] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

Jul 17 14:44:27 work kernel: [ 5486.704690] NVRM: GPU at PCI:0000:01:00: GPU-8328b9fe-45bc-4f30-da18-90e5aaf0cd08

Jul 17 14:44:27 work kernel: [ 5486.704694] NVRM: Xid (PCI:0000:01:00): 62, 12b2(1fb4) 00000000 00000000

Jul 17 14:44:30 work kernel: [ 5489.316981] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 14:44:32 work kernel: [ 5491.318693] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 14:44:42 work kernel: [ 5501.364497] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 14:44:54 work kernel: [ 5513.404740] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 14:45:06 work kernel: [ 5525.454992] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 14:45:10 work kernel: [ 5529.549799] SysRq : Keyboard mode set to system default

Jul 17 14:45:11 work kernel: [ 5530.374494] SysRq : Terminate All Tasks

...

Jul 17 15:22:32 work kernel: [  190.377068] tun: Universal TUN/TAP device driver, 1.6

Jul 17 15:22:32 work kernel: [  190.377071] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>

Jul 17 15:23:08 work kernel: [  225.767801] NVRM: GPU at PCI:0000:01:00: GPU-8328b9fe-45bc-4f30-da18-90e5aaf0cd08

Jul 17 15:23:08 work kernel: [  225.767804] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 00040000

Jul 17 15:23:08 work kernel: [  225.799997] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 00040000

Jul 17 15:23:08 work kernel: [  225.800071] NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 0001, Class 0000a097, Offset 00002384, Data 42ba0000, ErrorCode 0000000c

Jul 17 15:24:39 work kernel: [  317.113616] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 00040000

Jul 17 15:24:39 work kernel: [  317.113785] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 00040000

Jul 17 15:26:34 work kernel: [  432.383168] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 15:26:36 work kernel: [  434.384887] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 15:26:46 work kernel: [  444.423398] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 15:26:58 work kernel: [  456.463653] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 17 15:27:03 work kernel: [  461.062344] SysRq : Keyboard mode set to system default

Jul 17 15:27:03 work kernel: [  461.574772] SysRq : Terminate All Tasks

```
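To see at a glance which Xid codes dominate a log like the one above, the NVRM lines can be tallied with a small script. This is only a sketch: the regular expression is inferred from the log lines in this post, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... NVRM: Xid (PCI:0000:01:00): 62, 12b2(1f80) 00000000 00000000
# The pattern is inferred from the log excerpt in this thread (assumption).
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),")

YIELD_MARKER = "NVRM: os_schedule: Attempted to yield the CPU"


def count_xids(lines):
    """Return a Counter mapping (pci_address, xid_code) to occurrences."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def count_yield_warnings(lines):
    """Count the 'Attempted to yield the CPU' warnings."""
    return sum(1 for line in lines if YIELD_MARKER in line)


def summarize(path):
    """Print a per-Xid tally for the log file at `path`."""
    with open(path, errors="replace") as handle:
        lines = handle.readlines()
    for (pci, code), n in sorted(count_xids(lines).items()):
        print(f"{pci} Xid {code}: {n}")
    print("yield warnings:", count_yield_warnings(lines))
```

Calling `summarize("/var/log/kern.log")` (the path depends on the syslog setup) would print one line per GPU and Xid code, which makes it easier to compare runs with different kernels or driver versions.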

The exact moment the freeze occurs is rather unpredictable, but it seems to be related to higher CPU/GPU load. It happens more frequently with Firefox or Chromium running, and a compositor (X Render or GLX) can trigger the freeze as well. But I couldn't find a predictable pattern. Also, I have been using this driver version for quite a while, but the freezes only became this frequent starting July 16th.

The problem doesn't occur with my GTX 670 at home, using the same version of nvidia-drivers. However, both my Gentoo on hard drive and portable HD exhibit the same behavior with GTX 650.

I found quite a few similar issues on Google, but I haven't found a working resolution so far...

Things I've tried (that didn't work):

Changed to VGA (disable framebuffer things by using legacy boot and unset gfxpayload in grub). No changes.

I added 'Option "Accel" "false"' to xorg.conf. I then got a black screen on X.

I added 'Option "RenderAccel" "false"' to xorg.conf. No changes.

I recompiled the kernel, disabled DRM and agpgart totally, dropped the modules from /lib/modules, recompiled nvidia-drivers. No changes.

I enabled "Prefer Maximum Performance" in PowerMizer settings of nvidia-settings. No changes.

I removed ~/.nvidia-settings-rc, then re-ran nvidia-settings to recreate it. No changes.

I switched to nouveau. Okay, this one worked, but FPS of glxgears dropped to 1/6 of the original value.

Additional info:

Kernel configuration of my Gentoo on portable HD: https://gist.github.com/richardgv/dab3abfe0fc86feec16a

Kernel log: https://dl.dropboxusercontent.com/u/283669/stc/nvidia-freeze-issue-kern.log.xz

Last edited by RichardGv on Thu Aug 07, 2014 2:22 pm; edited 3 times in total

----------

## krinn

 *RichardGv wrote:*   

> 
> 
> Gentoo ~amd64, pf-sources-3.15_p2 (Unsupported kernel, but is this problem actually related to it?)
> 
> 

 

I think your answer is self-contained there: just build a vanilla kernel and you'll get the answer.

Note: I'm not asking you to stop using pf-sources. Build vanilla, run, test, and you have your answer. Go back to pf-sources or not, but then you'll know whether you need to dig further into the kernel.

----------

## RichardGv

 *krinn wrote:*   

> 
> 
> I think your answer is self contain there, just build a vanilla kernel and you'll get the answer.
> 
> note: i'm not asking you to not use pf-sources... build vanilla, run, test, answer given. Get back to pf-sources or not, but you know if you need to dig more kernel or not now.

 

Oh, I see. I just tried vanilla-sources-3.15.6, compiled with vanilla GCC_SPECS. It still freezes, but today the frequency is reduced. There's a higher chance that it runs very, very slowly instead of freezing totally, and Ctrl-Alt-F{1..6} sometimes works.

Kernel log today:

```

...

Jul 21 10:05:43 work kernel: [  211.919300] NVRM: GPU at PCI:0000:01:00: GPU-8328b9fe-45bc-4f30-da18-90e5aaf0cd08

Jul 21 10:05:43 work kernel: [  211.919305] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 00040000

Jul 21 10:05:43 work kernel: [  211.919493] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000001 intr 00040000

Jul 21 10:05:56 work kernel: [  225.219547] NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 0001, Class 0000902d, Offset 00000220, Data 1000f010, ErrorCode 0000000c

Jul 21 10:28:29 work kernel: [ 1578.743452] ereala-voice-ch[8468]: segfault at 7fff5f06bcc0 ip 00007f964e5096b7 sp 00007fff5f06bcc0 error 6 in libopus.so.0.5.0[7f964e4d0000+4c000]

Jul 21 10:29:23 work kernel: [ 1633.468704] ereala-voice-ch[8498]: segfault at 7fff4414c8f0 ip 00007f483b3b66b7 sp 00007fff4414c8f0 error 6 in libopus.so.0.5.0[7f483b37d000+4c000]

Jul 21 10:48:18 work kernel: [ 2769.074937] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000003 intr 00040000

Jul 21 10:48:18 work kernel: [ 2769.074999] NVRM: Xid (PCI:0000:01:00): 32, Channel ID 00000003 intr 00200000

Jul 21 10:48:18 work kernel: [ 2769.075039] NVRM: Xid (PCI:0000:01:00): 69, Class Error: ChId 0003, Class 0000a097, Offset 0000081c, Data 00000001, ErrorCode 00000004

```

----------

## ville.aakko

Hi,

I had a very similar issue (don't remember if I had the syslog messages, but the symptoms were exactly the same). I also had a lot of black windows if I enabled compositing.

EDIT: I have a GTX 660 Ti PE

The funny thing is, I'm not sure what fixed it (I don't have the problem anymore). I thought it might have been some poorly inserted RAM (but I could compile away / do anything as long as I didn't use the graphics card); OTOH, the problems started after an upgrade, and the previous upgrade had been a long while before that.

I had some packages in @preserved-rebuild that kept re-compiling / re-listing, and did a revdep-rebuild (which did find something unrelated, but still rebuilt something); IIRC I upgraded the kernel and rebuilt some system packages (glibc or similar), and the problem went away! I was quite frustrated and did several things at the same time. I know, not the right way of fixing things - I used a bash-root-hammer   :Very Happy: 

My guess is that some library had a bug / incompatibility with nvidia-drivers, or some (system) library was compiled against a different version of some other library and portage did not notice it for some reason. Try running revdep-rebuild.

Cheers!

----------

## Randy Andy

Hi Folks,

I have had similar trouble too, but only with my better Nvidia cards, so I came to the following conclusion: the better-performing the Nvidia hardware is, the worse the nvidia driver is.

I never had this trouble with my low-cost Nvidia consumer cards, only with my Quadro FX 4800, Tesla chipset (not Kepler like yours).

It works relatively well with the nouveau driver, but I missed some important features, which is why I searched for a long time for a working proprietary driver.

The only nvidia-driver that works well for this card is the so-called legacy series, which is currently version ~304.123 (supports xorg-server 1.16 now) or the stable one +304.121, up to xorg 1.15.

So try one of these versions to get rid of your problems, hopefully.   :Wink: 

Much success, Andy.

----------

## pa1983

Randy Andy.

Tesla-series GPUs are no longer supported. Nvidia dropped support not long ago. So the legacy drivers are the only way in your case.

----------

## Randy Andy

 *pa1983 wrote:*   

> Randy Andy.
> 
> Tesla series GPU's are no longer supported. Nvidia dropped support not long ago. So legacy drivers is the only way in your case.

 

pa1983,

I have a  GT200GL [Quadro FX 4800] and as you can see here, your statement is not fully correct:

http://nvidia.custhelp.com/app/answers/detail/a_id/3142/kw/Tesla%20support/session/L3RpbWUvMTQwNTk0MjgxNy9zaWQvZHFtOVpRWmw%3D

But that's not the point of discussion here. 

The hint I tried to give RichardGv here is this: although nvidia-drivers has long existed in versions newer than the 304 series, none of those newer versions works flawlessly for me. I tried them all to find a working one for this specific graphics card, and it was a really frustrating experience that I'd never had with my cheaper Nvidia cards.

Possibly the situation with your card/driver combination is similar to mine, although it's a different type of card and chipset.

So give it a shot, and good luck with it.

Andy.

----------

## RichardGv

Summary of the new methods I've tried and their outcomes:

Compile pf-sources-3.15_p4 with a new configuration modified from Arch Linux .config. Still freezes.

Downgrade to nvidia-drivers-304.123. Still freezes. Log is provided below.

Move my 2 memory sticks to other slots. I'm still testing. No freeze so far.

By the way, the GPU temperature is moderately low.

@ville.aakko:

 *ville.aakko wrote:*   

> 
> 
> I also had a lot of black windows if I enabled compositing.
> 
> 

 

Oh, I haven't encountered that issue with compton (a compositor) and nvidia-drivers. (I started using them almost 2 years ago.)

 *ville.aakko wrote:*   

> 
> 
> The funny thing is, I'm not sure what fixed it (I don't have it anymore). I though I might have been some poorly inserted RAM (but I could compile away / do anyghin if I didn't use the graphics card);
> 
> 

 

Oh, thanks for the tip. I just moved my 2 memory sticks to other slots. Let's see if it changes anything.

I've also run memtest86+ for one pass and there were no errors. mcelog doesn't log anything related to memory. EDAC seems not to be supported on my box.

 *ville.aakko wrote:*   

> 
> 
> I had some packaged at @preserved-rebuild that kept re-compiling / re-listing, and did a revdep-rebuild (which did find something unrelated, but still rebuilt something); IIRC I upgraded kernel, and rebuild some system packaged (glibc or similar), and the problem went away! I was quite frustrated and did several things at the same time. I know, not the right way of fixing things - I used a bash-root-hammer  
> 
> My guess is, that some library had a bug / incompatibility with the nvidia-drivers, or some (system) library is compiled against different version of some other library and portage does not notice it for some reason. Try running revdep-rebuild.
> ...

 

Oh, thanks for the tip. Unfortunately, I tried running revdep-rebuild and it only rebuilt amule. Now it couldn't find anything broken.

@Randy Andy:

 *Randy Andy wrote:*   

> 
> 
> I have had similar trouble also, but only with my better Nvidia-Cards, so I came to the following conclusion: The better/performant the Nvidia hardware is, the worse is the nvidia-driver.
> 
> I never had this trouble with my low cost Nvidia consumer cards before, but with my Quadro FX 4800, Tesla chipset (not Keppler as yours).
> ...

 

That's an interesting conclusion.  :Very Happy:  Indeed it doesn't happen on my more expensive GTX 670 at home, though.

 *Randy Andy wrote:*   

> 
> 
> The only well working nvidia-driver for this card is the so called legacy series, which is actually the version ~304.123 (supports 1.16 xorg-server now) or the stable one +304.121, up to xorg 1.15.
> 
> So try one of this versions to get rid of your problems, hopefully.  
> ...

 

Thanks for the trick. It didn't help, sadly enough. The log is here:

```

...

Jul 23 10:52:36 work kernel: [   17.088503] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.

Jul 23 10:52:37 work kernel: [   17.850121] u32 classifier

Jul 23 10:52:37 work kernel: [   17.850124]     input device check on

Jul 23 10:52:37 work kernel: [   17.850125]     Actions configured

Jul 23 10:53:47 work kernel: [   88.269290] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)

Jul 23 10:57:15 work kernel: [  296.365433] nvidia: module license 'NVIDIA' taints kernel.

Jul 23 10:57:15 work kernel: [  296.365436] Disabling lock debugging due to kernel taint

Jul 23 10:57:15 work kernel: [  296.370614] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none

Jul 23 10:57:15 work kernel: [  296.370675] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  304.123  Wed Jul  2 10:59:22 PDT 2014

Jul 23 10:57:16 work kernel: [  296.855101] NVRM: Your system is not currently configured to drive a VGA console

Jul 23 10:57:16 work kernel: [  296.855112] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver

Jul 23 10:57:16 work kernel: [  296.855113] NVRM: requires the use of a text-mode VGA console. Use of other console

Jul 23 10:57:16 work kernel: [  296.855123] NVRM: drivers including, but not limited to, vesafb, may result in

Jul 23 10:57:16 work kernel: [  296.855124] NVRM: corruption and stability problems, and is not supported.

Jul 23 11:02:06 work kernel: [  587.800750] fuse init (API version 7.23)

Jul 23 11:05:56 work kernel: [  817.091466] NVRM: GPU at PCI:0000:01:00: GPU-8328b9fe-45bc-4f30-da18-90e5aaf0cd08

Jul 23 11:05:56 work kernel: [  817.091470] NVRM: Xid (PCI:0000:01:00): 59, 0084(1754) 04009369 10002b68

Jul 23 11:05:58 work kernel: [  819.536961] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:00 work kernel: [  821.538670] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:12 work kernel: [  833.547041] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:14 work kernel: [  835.548757] NVRM: Xid (PCI:0000:01:00): 8, Channel 00000001

Jul 23 11:06:14 work kernel: [  835.548783] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:16 work kernel: [  837.550720] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:18 work kernel: [  839.552437] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:20 work kernel: [  841.557167] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:22 work kernel: [  843.558880] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:24 work kernel: [  845.560857] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:26 work kernel: [  847.562567] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:28 work kernel: [  849.564378] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:32 work kernel: [  853.568229] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:34 work kernel: [  855.569938] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:06:57 work kernel: [  878.585366] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:07:01 work kernel: [  882.588903] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:07:03 work kernel: [  884.590614] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

Jul 23 11:07:13 work kernel: [  894.598975] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

```
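When comparing several driver versions it helps to know which Xid codes were logged under which driver. A rough sketch, assuming the module-load banner always looks like the "NVRM: loading NVIDIA UNIX x86_64 Kernel Module" line in the log above; the function name is hypothetical.

```python
import re

# Both patterns are inferred from the log excerpts in this thread (assumption).
LOAD_RE = re.compile(
    r"NVRM: loading NVIDIA UNIX \S+ Kernel Module\s+(\d+(?:\.\d+)+)"
)
XID_RE = re.compile(r"NVRM: Xid \([^)]*\): (\d+),")


def xids_by_driver(lines):
    """Map each driver version to the Xid codes logged after it was loaded.

    Xids seen before any load banner are filed under "unknown".
    """
    result = {}
    current = "unknown"
    for line in lines:
        load = LOAD_RE.search(line)
        if load:
            current = load.group(1)
            result.setdefault(current, [])
            continue
        xid = XID_RE.search(line)
        if xid:
            result.setdefault(current, []).append(int(xid.group(1)))
    return result
```

Feeding it the lines of a kern.log that spans several reboots gives one list of Xid codes per driver version, so a downgrade that changes nothing (as here) shows up immediately.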

----------

## programmist11180

Hello, comrades.

I have a similar problem (on Debian, not Gentoo).

```

kernel: [  198.888070] NVRM: GPU at PCI:0000:04:00: GPU-97

kernel: [  198.888075] NVRM: Xid (PCI:0000:04:00): 6, PE0001 

kernel: [  198.953293] NVRM: Xid (PCI:0000:04:00): 69, Class Error: ChId 0001, Class 0000502d, Offset 00000250, Data 00007f60, ErrorCode 0000000c

```

If you have acpid installed, try removing it. That can solve the problem.

----------

## krinn

People report that Xid errors are hardware errors, many just from heat but some caused by a bad hardware part.

At least try this little script, it will do wonders for your debugging: https://code.google.com/p/nvidia-fanspeed/

(You'll get the temperature and can set the fan throttle based on it, so if the system freezes you will see whether it froze at a certain temperature...)

----------

## programmist11180

Xid errors documentation http://docs.nvidia.com/deploy/xid-errors/index.html

----------

## shazeal

 *Quote:*   

> Changed to VGA (disable framebuffer things by using legacy boot and unset gfxpayload in grub). No changes. 

 

Did you actually disable all the framebuffer stuff in the kernel itself? For non-UEFI, just keep "Support for framebuffer devices" enabled; everything inside that menu should be disabled. For UEFI, you need the EFI framebuffer support enabled as well.

I am using 340.24 with zero problems at the moment on a 760 GTX, using UEFI boot.

Things that broke stuff for me:

- Having any framebuffer enabled in the kernel caused KP/lockups.

- Having any DRM enabled in the kernel caused Xorg lockups.

- Xorg 1.16 breaks framebuffers after Xorg has started.

I have used vanilla kernel 3.12.26 patched with BFS/BFQ, and vanilla kernel 3.16.0 patched with BFQ/GCC optimizations. Zero issues with either.
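A quick way to check a kernel configuration against the framebuffer and DRM points above is to list the relevant enabled symbols. This is a minimal sketch, not a definitive checker: it assumes the usual `.config` syntax (`CONFIG_X=y`), treats every `CONFIG_FB_*` and `CONFIG_DRM*` symbol as worth a look, and the function names are made up for illustration.

```python
def enabled_options(config_text):
    """Return {symbol: value} for every CONFIG_* line set to y or m."""
    options = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                options[name] = value
    return options


def graphics_options_to_review(config_text):
    """List enabled framebuffer drivers and DRM options, sorted by name.

    CONFIG_FB itself (the framebuffer menu) is not reported. Helper symbols
    such as CONFIG_FB_CFB_* may show up as well, and on UEFI systems
    CONFIG_FB_EFI is expected to be listed.
    """
    hits = []
    for name in sorted(enabled_options(config_text)):
        if (
            name.startswith("CONFIG_FB_")
            or name == "CONFIG_DRM"
            or name.startswith("CONFIG_DRM_")
        ):
            hits.append(name)
    return hits
```

The text to pass in would typically come from `/usr/src/linux/.config`, or from `/proc/config.gz` if the running kernel was built with that feature.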

----------

## RichardGv

Thanks for the new suggestions! The good thing is that I have not been able to reproduce the issue since July 23rd, with either pf-sources-3.15_p4 or the new gentoo-sources-3.16.0, so it should be pretty safe to say the problem is solved for me -- at least for now. (I moved to gentoo-sources after I found uksm causing kernel freezes and some random kernel errors.) I'm not completely sure whether it's related to moving the memory sticks, though, since the problem had mysteriously disappeared once before as well. Thanks again to ville.aakko for the advice!

 *krinn wrote:*   

> people report xid are hardware error, many just from heat but some cause by bad hardware part.
> 
> At least try this little script, it will do wonder for your debug : https://code.google.com/p/nvidia-fanspeed/
> 
> (you'll get temp and can set fan throttle base on temp, so if it freeze you will see if it has frozen at a certain temp...)

 

I don't know the GPU temperature at the moment it freezes, but I have conky displaying the GPU temperature and don't remember it ever getting too high: usually it stays at 30°C - 45°C.
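One way to capture the temperature right before a freeze is to append readings to a file and sync after every write, so the last line survives a hard reset. The sketch below assumes `nvidia-smi` is installed and that this card and driver report `temperature.gpu` through `--query-gpu` (not verified for the GTX 650 with 340.24); the function names and the log path in the usage note are hypothetical.

```python
import os
import subprocess
import time

# Assumption: this driver/card combination supports the temperature query.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temperatures(output):
    """Parse one integer temperature (degrees C) per GPU from the query output.

    Lines that are not plain integers (for example "[Not Supported]")
    are skipped.
    """
    temps = []
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            temps.append(int(line))
    return temps


def log_temperatures(path, interval=2.0, samples=None):
    """Append 'unix_time temp [temp ...]' lines to `path`.

    Each line is flushed and fsynced so it is on disk before a possible
    freeze. Runs forever when `samples` is None.
    """
    taken = 0
    with open(path, "a") as handle:
        while samples is None or taken < samples:
            result = subprocess.run(
                QUERY, capture_output=True, text=True, check=True
            )
            temps = parse_temperatures(result.stdout)
            handle.write(f"{time.time():.0f} {' '.join(map(str, temps))}\n")
            handle.flush()
            os.fsync(handle.fileno())
            taken += 1
            time.sleep(interval)
```

Running `log_temperatures("/var/tmp/gpu-temp.log")` in a terminal before starting the workload leaves a timestamped record that can be lined up with the kernel log after the next freeze.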

 *shazeal wrote:*   

> 
> 
> Did you actually disable all the framebuffer stuff in the kernel itself? Non UEFI just keep Support for framebuffer devices enabled, everything inside that should be disabled. UEFI you need the EFI Framebuffer support enabled as well.
> 
> I am using 340.24 with zero problems at the moment on 760 GTX, using UEFI boot.
> ...

 

Nope, I didn't disable framebuffer from kernel configuration, only from GRUB. Actually I have been using nvidia-drivers with efifb for two years without other problems except the warning...

I've also tried disabling DRM from kernel. Didn't help at the time.

And I was using xorg-server-1.15.1.

 *programmist11180 wrote:*   

> Xid errors documentation http://docs.nvidia.com/deploy/xid-errors/index.html

 

Oh, thanks! I didn't know there was documentation about that! But why was I getting some weird 62 ("Internal micro-controller halt") and 69 (unlisted) errors...
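For reference, the Xid numbers that appear in this thread can be turned into short descriptions with a lookup table. The 62 entry is quoted from the post above; the other descriptions are written from memory of NVIDIA's Xid document (69 was reported as unlisted at the time of this post) and should be checked against the current page before relying on them.

```python
# Descriptions for the Xid codes seen in this thread.
# Only the entry for 62 is confirmed by the thread itself; the rest are
# recalled from NVIDIA's Xid documentation and should be verified (assumption).
XID_DESCRIPTIONS = {
    6: "Invalid or corrupted push buffer stream",
    8: "GPU stopped processing",
    32: "Invalid or corrupted push buffer stream",
    59: "Internal micro-controller error",
    62: "Internal micro-controller halt",
    69: "Graphics Engine class error",
}


def describe_xid(code):
    """Return a short description for an Xid code, or a fallback string."""
    return XID_DESCRIPTIONS.get(code, f"unknown Xid {code}, see NVIDIA's Xid page")


for code in (62, 69, 32):
    print(code, describe_xid(code))
```

Per NVIDIA's document, most of these codes have several possible causes (driver, hardware, thermal, or bus problems), so the description narrows things down less than one would hope.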

----------

## F1r31c3r

This has been happening on and off for the past few months after an update came in. 

I cannot trace down the exact culprit. Someone changed something to cause the issue.

When I get the chance I am going to try rolling back the kernel and see what happens. It does not do this all the time, so it is not reliably reproducible from what I can see, but it sure as hell does happen at totally unexpected times.

As is usually the case, something got a bug fix and most likely the nvidia drivers were not updated to match. Finding it is not easy, and the nvidia drivers being closed source makes it even harder.

Puked out messages for interested parties...

 *Quote:*   

> [21648.029859] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

 

 *Quote:*   

> [21662.092703] NVRM: Xid (PCI:0000:82:00): 32, Channel ID 00000003 intr 00004000

 

 *Quote:*   

> [21662.093066] NVRM: Xid (PCI:0000:82:00): 32, Channel ID 00000003 intr 00004000

 

I have found countless problems or issues with the Nvidia audio interface on their graphics cards for the HDMI audio. It misbehaves a lot.

We had an update of the GCC compiler a few months back, so I currently have some packages still built with the previous GCC version while others, including my kernel, were built with the new GCC version. I mention this as it has been known to cause issues before.

Maybe a rebuild of the X server and/or its dependencies will fix it.
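One part of the compiler-mismatch theory is easy to check: whether the running kernel was built with the same GCC that now builds the nvidia kernel module. A small sketch, assuming the older `/proc/version` format that contains "gcc version X.Y.Z" (newer kernels word this differently) and that the second argument is the output of `gcc -dumpversion`; the function names are hypothetical. It says nothing about userspace packages built with an older compiler.

```python
import re

# Old-style /proc/version wording, e.g. "(gcc version 4.8.3 (...))".
# Assumption: the kernel in question still uses this format.
GCC_IN_PROC_VERSION = re.compile(r"gcc version (\d+(?:\.\d+)+)")


def kernel_gcc_version(proc_version_text):
    """Return the GCC version the kernel was built with, or None if not found."""
    match = GCC_IN_PROC_VERSION.search(proc_version_text)
    return match.group(1) if match else None


def same_compiler(proc_version_text, gcc_dumpversion_text):
    """True when the kernel's compiler matches the currently selected GCC."""
    kernel_gcc = kernel_gcc_version(proc_version_text)
    return kernel_gcc is not None and kernel_gcc == gcc_dumpversion_text.strip()
```

If `same_compiler(open("/proc/version").read(), output_of_gcc_dumpversion)` is false, rebuilding the kernel and then nvidia-drivers with the current compiler removes that variable from the investigation.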

Nvidia provides an Xid page for debugging: http://docs.nvidia.com/deploy/xid-errors/index.html#topic_5_3

----------

## RichardGv

I have never spotted the problem again since July 23, 2014. It just disappeared after I moved the memory sticks -- or maybe it's the weather or something else. I still have no idea what was causing the issue. I'm upgrading the kernel and the drivers normally.

 *F1r31c3r wrote:*   

> 
> 
> As is usually the case, something got a bug fix and most likely the nvidia drivers did not get updated to the bugfix. Finding it is not easy and while nvidia drivers are closed source it makes it even harder.
> 
> 

 

Yeah, binary blobs suck in this way. (But nVidia does provide better OpenGL support than other drivers, and is doing way better than fglrx.)

 *F1r31c3r wrote:*   

> 
> 
> I have found countless problems or issues with the Nvidia audio interface on their graphics cards for the HDMI Audio. It misbehaves allot.
> 
> 

 

Huh? Does nvidia-drivers take care of the HDMI audio? I thought snd-hda-intel and hda-codec-hdmi manage it.

----------

## F1r31c3r

 *RichardGv wrote:*   

> I have never spot the problem again since July 23, 2014. It just disappeared after I moved the memory sticks -- or maybe it's the weather or something else. Still have no idea what is causing the issue. I'm upgrading the kernel and the drivers normally.
> 
>  *F1r31c3r wrote:*   
> 
> As is usually the case, something got a bug fix and most likely the nvidia drivers did not get updated to the bugfix. Finding it is not easy and while nvidia drivers are closed source it makes it even harder.
> ...

 

Yes, the HDA Intel driver in the kernel handles it. The Nvidia driver detects whether it is enabled in the kernel upon installation. That seems to be the problem, as some games on Steam (binary DRM stuff again) try to use the HDMI audio instead of the system's motherboard audio. I also have Intel HD audio on my motherboard, so it took some tricks to force the default audio device.

I am more inclined to think that the Xid fault has something to do with the PCIe clock timings. I updated my BIOS and it seems to have gone away. I found I could reproduce it consistently when loading Metro Last Light Redux through Steam. After the BIOS update it no longer errors, but I still have some instability issues with the graphics card. These stability issues go away when I force the card into performance mode in nvidia-settings.

That could indicate that it faults when the graphics card switches bus speed, e.g. from 2.5 GT/s to 5 GT/s, or that it has something to do with the clock ramp-up as the card moves from the low-clock power-save mode to performance mode, i.e. PCIe v2 speed to PCIe v3 speed etc. My BIOS allows me to force the PCIe bus speed, so if the problem comes back I will try forcing the PCIe modes and re-test.
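The negotiated PCIe link speed can be watched from the output of `lspci -vv`, which reports the link capability (`LnkCap`) and the current link status (`LnkSta`). A sketch under the assumption that the lines look like "LnkSta: Speed 2.5GT/s, Width x16, ..."; the exact wording varies between pciutils versions, and the function names are made up for illustration.

```python
import re

# Matches the LnkCap / LnkSta lines of `lspci -vv`; LnkCap2 and LnkSta2
# lines are deliberately excluded by requiring the colon right after the name.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_link(lspci_text):
    """Return {'LnkCap': (speed_gts, width), 'LnkSta': (speed_gts, width)}.

    Only the first occurrence of each field is kept, so pass in the output
    for a single device (for example `lspci -vv -s 01:00.0`).
    """
    info = {}
    for line in lspci_text.splitlines():
        match = LINK_RE.search(line)
        if match and match.group(1) not in info:
            info[match.group(1)] = (float(match.group(2)), int(match.group(3)))
    return info


def is_downclocked(info):
    """True when the current link speed is below the advertised capability."""
    return (
        "LnkCap" in info
        and "LnkSta" in info
        and info["LnkSta"][0] < info["LnkCap"][0]
    )
```

Sampling this while idle and again under load shows whether the link actually changes speed around the time the Xid errors appear. Reading the full capability block with `lspci -vv` usually requires root.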

It is not a temperature fault, as I monitored that not only with a KDE plasmoid monitor but also with an IR temperature gun. There was about 5°C variation between the reported values. Any temperature above 75°C is an issue and I was well within the 50-60°C range.

----------

## F1r31c3r

So I changed the preemption model from low-latency desktop (forced preempt) to desktop (voluntary preempt) and it would seem that the errors have gone away for now.

Considering the error message said 'atomic or interrupt context' this would make some sort of sense at least. 

Usually graphics card binary drivers never install or work with anything less than low-latency forced preempt. For those who don't know, the preemption model is the way the kernel deals with scheduled processes.
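To confirm which preemption model a kernel was actually built with, the three mutually exclusive Kconfig choices can be read from its `.config`. A minimal sketch for kernels of this era (newer kernels add further options that are not handled here); the function name is hypothetical.

```python
# Kconfig symbols of the "Preemption Model" choice and their menu labels.
PREEMPTION_MODELS = {
    "CONFIG_PREEMPT_NONE": "No Forced Preemption (Server)",
    "CONFIG_PREEMPT_VOLUNTARY": "Voluntary Kernel Preemption (Desktop)",
    "CONFIG_PREEMPT": "Preemptible Kernel (Low-Latency Desktop)",
}


def preemption_model(config_text):
    """Return the label of the enabled preemption model, or 'unknown'."""
    for raw in config_text.splitlines():
        name, sep, value = raw.strip().partition("=")
        if sep and value == "y" and name in PREEMPTION_MODELS:
            return PREEMPTION_MODELS[name]
    return "unknown"
```

Passing in the contents of `/usr/src/linux/.config` (or the decompressed `/proc/config.gz`, if available) tells whether the running configuration is the forced or the voluntary variant described above.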

We shall see in the near future whether it is any better.

UPDATE:

The yield-CPU errors crash kwin, so I turned off 'Suspend 3D effects when apps in full screen' and dropped OpenGL 3.1 down to OpenGL 2.0 to test; so far it seems to be more stable. My idea was that when exiting a graphics-demanding app, kwin tries to re-enable the 3D effects and causes problems. With 'suspending 3D effects for full screen applications' disabled, kwin should no longer try to yield the CPU at that specific time.

Well, that is the theory anyway. At least voluntary preempt helped in recovering from this error rather than locking everything up and leaving the whole screen corrupted.

If it happens again I shall compile kwin with debug and run it to try to get more output and see what is going on.

If anyone has any other feedback, feel free to post it...  :Laughing: 

----------

## gentoorockerfr

Same problem here, with the 3.19-pf kernel only!

gentoo64, nvidia GTX 650

I will try changing the RAM positions. All (4) slots are populated with memory.

----------

## F1r31c3r

 *gentoorockerfr wrote:*   

> same problem here with 3.19-pf kernel only!
> 
> gentoo64 nvidia gtx 650 
> 
> I will try to change ram positions.I have all positions with memory(4)

 

I eventually got to the bottom of my problem with this error. It turned out to be a faulty graphics card. I upgraded to a GTX 970 and the problem was solved.

Of course, it goes without saying that you need to be 100% sure it's the card and not your system. In my case it was a memory fault on the graphics card rather than in my system, so test everything. The 600 and 700 series seem to have this problem a lot from what I can gather. I think it is caused by the card overheating and its power management.

As is always the case with graphics vendors, they never admit design faults and just try to work around them, hoping most won't notice.

----------

