# [SOLVED - bad RAM] Nvidia Xid errors

## KLarsen

About two weeks ago I started getting NVRM Xid errors when gaming. I have a GTX 660 running the latest stable nvidia-drivers (396.54). Around the time the errors started, I installed two new RAM sticks, doubling the amount of memory.

After a LOT of testing, the culprit seems to be the new RAM sticks: with them installed I get errors, and with them removed I can play games for hours with no errors (though that might just be a coincidence...).

However, I get no errors in memtest86, even after running it for over 20 hours. For now the vendor won't RMA the memory without an error showing up in memtest86 (I'm going to press them on this, though).

I've also tried downgrading nvidia-drivers and the kernel, recompiling nvidia-drivers several times, moving the GPU to the other PCI-E slot, and shuffling the RAM around.

The only thing I haven't tried yet is downgrading xorg-server and xorg-drivers, which did get upgraded around the time the errors started happening. I will do this tomorrow. 

These are the errors that appear in the log: 

```
Nov 4 12:35:34 unicorn kernel: NVRM: GPU at PCI:0000:0a:00: GPU-dfde4129-ba3c-74bc-84aa-ea76a1cf90ed
Nov 4 12:35:34 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ChId 0058, Class 0000a097, Offset 00002384, Data 40000001, ErrorCode 0000000c
Nov 4 12:57:30 unicorn kernel: NVRM: GPU at PCI:0000:0a:00: GPU-dfde4129-ba3c-74bc-84aa-ea76a1cf90ed
Nov 4 12:57:30 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ChId 0030, Class 0000a097, Offset 00001c80, Data 40000000, ErrorCode 0000000c
Nov 4 23:53:41 unicorn kernel: NVRM: GPU at PCI:0000:0a:00: GPU-dfde4129-ba3c-74bc-84aa-ea76a1cf90ed
Nov 4 23:53:41 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 12, COCOD 00000050 beef3901 0000a040 000001b8 1f789000
Nov 5 22:44:50 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 32, Channel ID 00000050 intr 00040000
Nov 5 22:54:26 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 12, COCOD 00000050 beef3901 0000a040 000001b8 2faac600
Nov 5 23:11:48 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 31, Ch 00000050, engmask 00000101, intr 10000000
Nov 6 23:48:26 unicorn kernel: NVRM: GPU at PCI:0000:0a:00: GPU-dfde4129-ba3c-74bc-84aa-ea76a1cf90ed
Nov 6 23:48:26 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ChId 0058, Class 0000a097, Offset 00001b00, Data 00004100, ErrorCode 0000000c
Nov 6 23:48:26 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 13, Graphics Exception: EXTRA_MACRO_DATA
Nov 6 23:48:26 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 13, Graphics Exception: ESR 0x404490=0x80000002
Nov 6 23:48:26 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 13, Graphics Exception: ChID 0058, Class 0000a097, Offset 00001b00, Data 00004100
Nov 6 23:48:35 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 12, COCOD 00000058 beef9097 0000a097 00001414 00000000
Nov 6 23:51:10 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ChId 0058, Class 0000a097, Offset 00001418, Data 00000004, ErrorCode 0000000c
Nov 7 13:02:52 unicorn kernel: NVRM: GPU at PCI:0000:0a:00: GPU-dfde4129-ba3c-74bc-84aa-ea76a1cf90ed
Nov 7 13:02:52 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 12, COCOD 00000038 beef3901 0000a040 000001b8 ffffffff
Nov 7 13:06:37 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 12, COCOD 00000038 beef3901 0000a040 000001b8 ffffffff
```

I usually get Xid 69, which according to https://docs.nvidia.com/deploy/xid-errors/index.html is either a hardware error or driver error. None of these errors point to a RAM problem. 
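For anyone wanting to see which Xid codes dominate a longer log, here is a minimal Python sketch that tallies them. It assumes the kernel lines keep the `NVRM: Xid (<device>): <code>,` shape shown above; `XID_RE`, `count_xids` and `SAMPLE` are illustrative names of my own, not part of any NVIDIA tool, and the sample lines are copied from the log excerpt.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above:
#   ... NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ...
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>[^)]+)\): (?P<xid>\d+),")


def count_xids(lines):
    """Return a Counter mapping Xid code -> number of log lines reporting it."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[int(match.group("xid"))] += 1
    return counts


# A few lines copied from the excerpt; the first one carries no Xid code.
SAMPLE = [
    "Nov 4 12:35:34 unicorn kernel: NVRM: GPU at PCI:0000:0a:00: GPU-dfde4129-ba3c-74bc-84aa-ea76a1cf90ed",
    "Nov 4 12:35:34 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ChId 0058, Class 0000a097, Offset 00002384, Data 40000001, ErrorCode 0000000c",
    "Nov 4 12:57:30 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 69, Class Error: ChId 0030, Class 0000a097, Offset 00001c80, Data 40000000, ErrorCode 0000000c",
    "Nov 4 23:53:41 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 12, COCOD 00000050 beef3901 0000a040 000001b8 1f789000",
    "Nov 6 23:48:26 unicorn kernel: NVRM: Xid (PCI:0000:0a:00): 13, Graphics Exception: EXTRA_MACRO_DATA",
]

if __name__ == "__main__":
    for xid, n in count_xids(SAMPLE).most_common():
        print(f"Xid {xid}: {n} line(s)")
```

To run it against a real log, pass an open file (or the lines of saved `dmesg` output) to `count_xids` instead of `SAMPLE`. Note that it counts log lines, not incidents: a single Xid 13 event can print several lines.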

I can run the Unigine benchmark through several passes without errors; only Steam games give me problems. Also, once in a while the KDE/Plasma compositor stops unexpectedly (no errors in the log, though). In general the system is totally stable, running 24/7, and I can reliably compile with no errors.

So, can anyone help me and suggest something else to try to pinpoint the problem? Is the GPU going bad? Is the system RAM really the culprit? Any help would be much appreciated.

Last edited by KLarsen on Sat Nov 17, 2018 12:40 pm; edited 1 time in total

----------

## bunder

I see you tried reseating the card... are you overclocking the card at all? How good is your case cooling? How are the power supply rails?

----------

## KLarsen

The card is factory overclocked. 

Cooling should be good: neither the GPU nor the CPU gets above 60°C with the case closed. With the case open, I still get errors.

I do have another PSU I can check, I'll do so tomorrow.

----------

## KLarsen

I finally got errors in memtest86. I left it running overnight for the third time, and this morning it had found 64 errors. Time for an RMA.

----------

