# No nvidia-driver works for me anymore

## CrazyTerabyte

I've described everything in detail on my blog:

http://my.opera.com/CrazyTerabyte/blog/2009/02/28/dont-update-your-nvidia-drivers

I'm documenting everything in my blog because then I can easily show it whenever needed (in IRC, forums, or even nvidia bug report e-mail).

Summary:

I've updated from 2.6.25.x to 2.6.27.10. Then I was forced to update nvidia-drivers from 169.12 (which used to work for me) to a newer version. But I haven't found any newer version that works for me; all of them have the same "bug".

Please, take a look at that blog post for more details (including nvidia-bug-report.log and kernel .config). I'm posting this message in the forum hoping that someone might have a solution, or maybe other people might have the same problem.

----------

## poly_poly-man

Step 1: get to a kernel and stick there. Your drivers probably won't work next boot.

Step 2: emerge nvidia-drivers.

Also, don't use 180.35, as it seems to give everyone trouble.

I never have issues upgrading either of those packages.

----------

## doctork

Since June 2008, I have used

```
gentoo-sources: 2.6.24-r8 thru 2.6.28-r2
Nvidia drivers: 169.09-r1 thru 180.29  (180.35 was broken for me)
xorg-server: 1.3.0.0-r5 thru 1.5.3-r2
xorg-x11: 7.2 thru 7.4
kde: 3.5.9 thru 4.2.0
```

In that time the only problem I recall with nvidia was the kde4/180.35 issue mentioned above.

The hardware involved is:

```
amd phenom with 9600 GT
2x amd X2 with onboard 6150
Thinkpad t61 with Quadro NVS 140m
(all use amd64 with ~amd64 as needed)
```

--

doc

----------

## CrazyTerabyte

 *poly_poly-man wrote:*   

> Step 1: get to a kernel and stick there. Your drivers probably won't work next boot.
> 
> Step 2: emerge nvidia-drivers.

 

I might go back to the 2.6.25.x kernel, but even then, the 173.14.09 driver did not work for me. I've posted about that here:

http://my.opera.com/CrazyTerabyte/blog/2008/07/04/nvidia-driver-strikes-back-yet-another-time

That is why I went back to 169.12.

I might go back to 2.6.25.x with 169.12 (which was removed from portage, but I can keep it in a local overlay). But then, whenever I need to update my kernel, this issue will come back, as newer nvidia drivers don't work.

Could it be something related to kernel options? Could someone try the .config I've attached and see if it has issues with your hardware? Also, please attach your .config here so we can compare them and try to track down the issue.

----------

## NeddySeagoon

CrazyTerabyte,

Your blog seems to be down for me.

To resolve this, we need your lspci output, your kernel .config file and perhaps your /var/log/Xorg.0.log file after a failed attempt to use nvidia-drivers.

lspci will tell us what you need (and what you must not have) in the kernel.

The .config will tell us what you have.

The log will tell us why Xorg failed.

----------

## CrazyTerabyte

I can open my blog here. It may fail sometimes, but it is usually back a few minutes later. Can you try again?

The nvidia-bug-report.log already includes Xorg.0.log and xorg.conf. Just grab it from my blog. I've also attached the kernel .config there. (If you still can't access the blog, send me a private message.)

Personally, I don't see anything wrong in Xorg.0.log. The actual cause shows up in /var/log/messages (or dmesg):

```
NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000280 fffffc18 00000004
```
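To pull these events out of a long log, something like the following sketch might help. It assumes the line layout shown above (`NVRM: Xid (<location>): <number>, <payload words>`); the function names are made up for illustration, and the payload words are only split, not decoded.

```python
import re
from collections import Counter

# Assumed layout, based on the single log line quoted above:
#   NVRM: Xid (<location>): <xid number>, <payload words...>
XID_RE = re.compile(r"NVRM: Xid \(([^)]+)\): (\d+), (.*)")


def parse_xid(line):
    """Return a dict describing one Xid line, or None if the line is not one."""
    match = XID_RE.search(line)
    if match is None:
        return None
    location, number, payload = match.groups()
    return {"location": location, "xid": int(number), "payload": payload.split()}


def count_xids(log_text):
    """Count how often each Xid number appears in a block of log text."""
    counts = Counter()
    for line in log_text.splitlines():
        event = parse_xid(line)
        if event is not None:
            counts[event["xid"]] += 1
    return counts


# The line quoted in this post, used as sample input.
SAMPLE = "NVRM: Xid (0001:00): 13, 0001 00000000 0000502d 00000280 fffffc18 00000004"
event = parse_xid(SAMPLE)
print(event["xid"], event["location"], len(event["payload"]))
```

On a real system the text could come from `dmesg` output or /var/log/messages, passed to `count_xids` as one string.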

Here is the lspci:

```
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 9500M GS (rev a1)
02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
09:01.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)
```

----------

## NeddySeagoon

CrazyTerabyte,

I've read your email; I still can't get to your blog.

A few things stand out. 

1) I would prefer to see your graphics card on an IRQ by itself. You don't have much control over that in a laptop but you can try playing with IRQ leveling in the kernel.

2) Your xorg.conf has some unsupported and new features enabled. Does turning them off help?

Specifically, 

```
 Option     "Coolbits"     "1"
 Option     "AllowGLXWithComposite" "true"
 Option     "AddARGBGLXVisuals" "true"

Section "Extensions" 
   Option "Composite" "Enable"  # For Beryl support
EndSection
```

Your Section "DRI" is not needed but harmless.

If turning off those options helps, turn them on one at a time until something breaks.

Coolbits allows overclocking ... so if you are overclocking the GPU, please don't.
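The "turn them on one at a time" procedure can be written down as a small sketch. This is only an illustration: `is_stable` is a hypothetical callback standing in for "restart X with these options enabled and see whether it freezes", which in practice is a manual check, and the order of the option names is arbitrary.

```python
# Option names taken from the xorg.conf excerpt above; the order is arbitrary.
OPTIONS = ["Composite", "AllowGLXWithComposite", "AddARGBGLXVisuals", "Coolbits"]


def first_breaking_option(options, is_stable):
    """Re-enable options one at a time and return the first one that breaks X.

    `is_stable` is a hypothetical callback: it receives the list of options
    currently enabled and returns True if X ran without freezing.
    Returns None if every step stayed stable.
    """
    enabled = []
    for option in options:
        enabled.append(option)
        if not is_stable(list(enabled)):
            return option
    return None


# Made-up example: pretend the freeze appears once AddARGBGLXVisuals is on.
culprit = first_breaking_option(
    OPTIONS, lambda enabled: "AddARGBGLXVisuals" not in enabled
)
print(culprit)
```

If the crash only shows up with a combination of options, this returns the option that completed the combination, not necessarily the only one involved.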

----------

## CrazyTerabyte

 *NeddySeagoon wrote:*   

> 1) I would prefer to see your graphics card on an IRQ by itself. You don't have much control over that in a laptop but you can try playing with IRQ leveling in the kernel.

 

I haven't checked the IRQ settings in the BIOS. They are probably all set to "auto". And I have no idea how to mess with IRQs in the kernel.
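Whether the card shares an IRQ can at least be checked by reading /proc/interrupts. A minimal sketch, assuming the usual layout (a CPU header row, then `IRQ: per-CPU counts, controller, device[, device...]`) and assuming the driver registers itself under the name `nvidia`. The sample text is made up for illustration and is not taken from this laptop.

```python
def shared_irq_devices(interrupts_text, driver="nvidia"):
    """Return {irq: [device names]} for each numbered IRQ used by `driver`."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split(None, ncpus)  # per-CPU counts, then the tail
        if len(fields) <= ncpus:
            continue
        parts = [part.strip() for part in fields[ncpus].split(",")]
        words = parts[0].split()
        if not words:
            continue
        # The first part still carries the controller name; keep its last word.
        parts[0] = words[-1]
        if driver in parts:
            result[int(irq)] = parts
    return result


# Made-up sample in the assumed format.
SAMPLE = """\
           CPU0       CPU1
  0:        123          0   IO-APIC-edge      timer
 16:      50000      40000   IO-APIC-fasteoi   nvidia, uhci_hcd:usb3
 17:         10          5   IO-APIC-fasteoi   HDA Intel
NMI:          0          0   Non-maskable interrupts
"""
print(shared_irq_devices(SAMPLE))
```

On a real machine the input would be the contents of /proc/interrupts. More than one name in the list for the nvidia IRQ means the line is shared. Device names containing spaces are truncated to their last word by this sketch.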

 *NeddySeagoon wrote:*   

> 2) Your xorg.conf has some unsupported and new features enabled. Does turning them off help?

 

```
Section "Extensions" 
   Option "Composite" "Enable"  # For Beryl support
EndSection
```

Removing the "Composite" extension breaks the 3D window manager: it won't start. So I must leave it there.

```
 Option     "Coolbits"     "1"
 Option     "AllowGLXWithComposite" "true"
 Option     "AddARGBGLXVisuals" "true"
```

Although I had "Coolbits" enabled, I've never ever overclocked the GPU.

So I tried disabling the 3 options above. Still the same crash/freezing.

I personally think the issue is a bug within the nvidia driver, maybe a bug triggered by a combination of my hardware, my kernel, and the kernel options.

Last edited by CrazyTerabyte on Wed Mar 04, 2009 8:03 am; edited 1 time in total

----------

## CrazyTerabyte

I've just tested the 2.6.28.7 kernel with nvidia-drivers-180.29. It also fails, in exactly the same way as all the others.

I've also posted this issue at nV News Forums. Hey, NeddySeagoon, I'm not ignoring your help. I just feel that this bug is so tricky that more people should try to help (and people at that forum must be used to all quirks of nvidia-drivers). In addition, I think people from that forum should also be aware of this bug. Please, I don't want to be rude and I don't want to upset anyone!

----------

## NeddySeagoon

CrazyTerabyte,

I'm not upset in the least ... what I offered was not a fix, just some ways of narrowing down the problem.

I suspect the fix will have to come from nVidia in the form of a driver update. 

Do share the answer if/when you get one. That's what makes the community work.

----------

## doctork

As opposed to Neddy, I suspect the fix will come when CrazyTerabyte switches to the latest version of xorg.  Rumor has it that the stabilization of 1.5.3* will occur "soon."  That in combination with hal makes xorg configuration essentially automatic.

CrazyTerabyte -- I'm curious as to why you're running vanilla sources as opposed to the Gentoo versions.  Your Asus system isn't much different from my Thinkpad T61.  The only differences I could see were the video card (mine is an NVS 140) and the ethernet card (mine's an Intel).  I've had no problems with the T61.  Currently running 2.6.28-gentoo-r2, nvidia-drivers-180.29, and KDE 4.2.0.  Things like this are automatically added on X startup:

```
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
(II) NVIDIA(0):     enabled.
```

--

doc

----------

## Link31

 *CrazyTerabyte wrote:*   

> In addition, I think people from that forum should also be aware of this bug.

 

At least you are not alone, I have exactly the same problem.

I used to do regular updates of both the kernel (vanilla) and the nvidia drivers. But last week, the whole system suddenly crashed with little colored squares all over the screen. No 3D app was running at that moment, although I had played a game some hours earlier. Worse, the graphics card refused to wake up after a reboot and the screen stayed blank (only the graphics card was not working: Gentoo booted up fine, since I was able to log in with SSH). I had to unplug the power and the battery and wait several hours before getting it to work again. Since that day, I have experienced a lot of similar crashes.

When it crashed for the first time, I was running a 2.6.28.7 kernel with driver 180.22, xorg 1.3.0.0-r6. Both were pretty stable for several weeks before the first problem.

Downgrading to 177.82 or upgrading to 180.35 with the same kernel did not change anything. Thanks to your blog, I found that 2.6.25 with 169.12 does not seem to be affected. At least I can use my computer and play games. But performance is really crappy with KDE4, and downgrading the kernel broke a few other things.

I thought it was a hardware problem, but I was not able to crash the graphics card in any way from Windows (running the same games). So it definitely looks like a driver issue.

What is very strange is that driver versions that used to work fine (177.* and the first 180.*) do not work anymore. Did the latest nvidia blob damage the hardware?

Since 169.12 seems to be the last driver that works, I'd like to find out how to compile it with recent kernels (>= 2.6.27). 2.6.25 is a bit old, and I haven't checked yet what the kernel downgrade broke on my laptop, but I'm pretty sure there will be problems, and it is unacceptable that a binary blob prevents us from upgrading the kernel.

If anyone has tips to improve the performance of this driver with KDE4, I'd very much appreciate it.

And we must get the Nvidia techs to explain wtf happened with their latest drivers.

----------

## Link31

More information:

- disabling Composite does not solve the problem

- 169.12 and 173.08 seem to work with the 2.6.25 kernel. But KDE4 is still very slow. I'm currently testing 177.80 with 2.6.25, as this is the first driver which can run KDE4 correctly.

Edit: 177.80 seems to work. I played a game for an hour and there have been no crashes so far. Not even a single one of those dreaded "Xid" messages.

Now I'm going to try with a 2.6.27.

(if only we had a reliable way to reproduce the crash, this would be much easier to check...)

----------

## CrazyTerabyte

 *doctork wrote:*   

> As opposed to Neddy, I suspect the fix will come when CrazyTerabyte switches to the latest version of xorg.  Rumor has it that the stabilization of 1.5.3* will occur "soon."  That in combination with hal makes xorg configuration essentially automatic.

 

Updating to unstable xorg versions might cause me a lot of work and a lot of trouble, so I prefer not to do it.

 *doctork wrote:*   

> CrazyTerabyte -- I'm curious as to why you're running vanilla sources as opposed to the Gentoo versions.

 

Basically, so that no one can blame Gentoo's kernel patches if I need to ask for help outside the Gentoo community (for example in the #nvidia channel, or when submitting a bug report to nvidia). This way, no one can say: "Please try again with vanilla kernel before asking for help."

Link31, I wish you the best of luck. Actually, I wish us all the best of luck! As anyone can see from my blog, finding an nvidia-drivers version that works for you can be a pain and take a lot of time.

For now, I think I found a workaround, which is to use Compiz instead of Beryl. Fortunately, current Compiz Fusion versions are a lot better than they were the last time I tested.

Note that I said I found a workaround. I don't consider this issue solved, since I still believe there is a bug inside nvidia-drivers.

Thank you all for your help, especially NeddySeagoon.

 *NeddySeagoon wrote:*   

> Do share the answer if/when you get one. That's what makes the community work.

 

Rest assured I will always do that; it's one of the main reasons I write posts on my blog.

Hey, Link31, please post your results too! Even if some combinations don't work, write them down, as it might make it easier to track down the issue, find workarounds, or help other people find your post.
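One simple way to keep such a record is a small table of kernel/driver combinations. The sketch below is only an illustration: the entries are the results reported so far in this thread, and the tuple layout and function name are made up.

```python
from collections import defaultdict

# (reporter, kernel, nvidia driver, outcome), as reported earlier in this thread.
RESULTS = [
    ("CrazyTerabyte", "2.6.25.x", "169.12", "works"),
    ("CrazyTerabyte", "2.6.25.x", "173.14.09", "fails"),
    ("CrazyTerabyte", "2.6.28.7", "180.29", "fails"),
    ("Link31", "2.6.28.7", "180.22", "fails"),
    ("Link31", "2.6.28.7", "177.82", "fails"),
    ("Link31", "2.6.28.7", "180.35", "fails"),
    ("Link31", "2.6.25", "169.12", "works"),
    ("Link31", "2.6.25", "173.08", "works"),
    ("Link31", "2.6.25", "177.80", "works"),
]


def summarize(results):
    """Group 'kernel + driver' combinations by (reporter, outcome)."""
    table = defaultdict(list)
    for reporter, kernel, driver, outcome in results:
        table[(reporter, outcome)].append(f"{kernel} + {driver}")
    return dict(table)


for (reporter, outcome), combos in sorted(summarize(RESULTS).items()):
    print(f"{reporter:14} {outcome:6} {', '.join(combos)}")
```

New test results are added as further tuples in `RESULTS`. Keeping the failures in the list matters as much as keeping the successes.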

----------

## CrazyTerabyte

Good news! AaronP, a developer from Nvidia, has managed to reproduce the bug! This means that there should be a bug fix sometime soon (although not for the next version, due to time constraints).

----------

## Link31

Great news!

Just to be sure that we have encountered exactly the same bug (so this fix would solve it for both of us), did you ever see a corrupted screen when the driver crashed? And when the screen blinked, how many times and how fast did it blink?

For now, I've found that 177.80 with 2.6.28.7, with a little one-line change to the driver code to make it compile, seems stable enough (177.82 failed with the same kernel). I have played a few games since yesterday, and it hasn't crashed yet. I'm going to stick with these versions for at least a week to be sure that it works.

----------

## CrazyTerabyte

 *Link31 wrote:*   

> Just to be sure that we have encountered exactly the same bug (so this fix would solve it for both of us), did you ever see a corrupted screen when the driver crashed? And when the screen blinked, how many times and how fast did it blink?

 

Nope, the screen blinked black only once, and only for a fraction of a second. I didn't see any graphic corruption. (Actually, if I waited long enough, about 10~30 seconds, the screen would blink again, as if the driver were trying to recover from the crash, but still no corruption.)

So it appears to be a different bug. I think you should try to describe how to reproduce it as best you can; then, hopefully, an nvidia developer might be able to reproduce the bug.

----------

