# [SOLVED] Now nvidia 195.36.24 + linux 2.6.33-r2 = frozen sys

## fert

Anyone else having an issue with the (currently) latest nvidia-drivers and gentoo-sources?

I am using an amd64 system, and when the nvidia module gets inserted (either manually, or automatically) the system locks up. Completely freezes, requiring a hard reboot.

I haven't yet been able to recover any sort of log file that gives me a hint as to what is going on.

Last edited by fert on Mon May 17, 2010 2:38 pm; edited 2 times in total

----------

## Randy Andy

fert,

The same combination works like a charm for me, but that says nothing, because our hardware may be different, sorry.

And by the way, the nvidia driver seems to be masked for a reason...

But in case you want to compare something:

```
x11-drivers/nvidia-drivers
        Installed versions:  195.36.03!s(19:02:44 25.02.2010)(acpi gtk kernel_linux multilib -custom-cflags)

uname -a
Linux localhost 2.6.33-gentoo #1 SMP Thu Feb 25 18:50:38 CET 2010 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux

01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GS] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited Device 8090
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        I/O ports at dc00 [size=128]
        [virtual] Expansion ROM at feae0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel <?>
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidia
```

My current kernel config can be found here:

http://paste.pocoo.org/show/183103/

Good luck, Andy.

----------

## DONAHUE

Possibly you have an older video card or a very new card that is not supported by 195.36.03?

Or is 2.6.33 incompatible with even the latest stable nvidia-drivers?

----------

## fert

Thanks for the thoughts. Unfortunately, I haven't made any progress yet.

Xeon X3350 (rebadged Q9450) and 8800GTS x2 in SLI (maybe something to look at?)

I had to resort to booting into 2.6.32-gentoo-r2 and nvidia-drivers 190.53 to get the system running again.

I will try recompiling the kernel with Randy Andy's .config just to see, but a quick glance doesn't show any apparently significant differences.
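A quick glance across two full kernel configs is easy to get wrong, so a symbol-by-symbol comparison may be more reliable. Below is a minimal Python sketch of that idea, assuming the usual .config line forms (`CONFIG_FOO=value` and `# CONFIG_FOO is not set`). The function names and the two sample fragments are made up for illustration and are not taken from either real config.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {symbol: value}; symbols marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(mine, theirs):
    """Return {symbol: (my_value, their_value)} for every symbol that differs.

    A symbol absent from one side is reported as None on that side.
    """
    a, b = parse_config(mine), parse_config(theirs)
    return {
        sym: (a.get(sym), b.get(sym))
        for sym in sorted(set(a) | set(b))
        if a.get(sym) != b.get(sym)
    }


# Hypothetical fragments, for illustration only.
mine = "CONFIG_SMP=y\nCONFIG_HZ=1000\n# CONFIG_PREEMPT is not set\n"
theirs = "CONFIG_SMP=y\nCONFIG_HZ=250\nCONFIG_PREEMPT=y\n"

for sym, (x, y) in diff_configs(mine, theirs).items():
    print(sym, x, "->", y)
```

In practice you would read the two files from disk and pass their contents to `diff_configs`.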

I'm just a bit leery of continuing to play with the combo, as when it crashes, it crashes hard. After each reboot, one or more files have been missing, and I have had to recompile one or more packages. I haven't experienced anything like that before.

One of these tries I will end up with a fubar'd system (That's what backups and rescue discs are for, though, right?).

----------

## Shining Arcanine

 *DONAHUE wrote:*   

> Possibly you have an older video card or a very new card that is not supported by 195.36.03?
> 
> Or is 2.6.33 incompatible with even the latest stable nvidia-drivers?

 

195.36.03 introduced support for the 2.6.33 kernel. Many people have switched to them to avoid having to patch the old drivers. I jumped on that boat last night and they are working perfectly for me with the vanilla 2.6.33 kernel.

----------

## friesia

I've got 195.36.03 drivers on 2.6.33 kernel, AMD64 and 8500GT.

There are problems with MPlayer using the gl or vdpau video output driver (xv works fine so far).

Sometimes X crashes when launching a video, sometimes it freezes, sometimes subtitles blink.

----------

## fert

I see several people reporting success with 195.36.03 and 2.6.33 with single cards.

Is anyone running dual cards successfully? Maybe that's causing the problem?

http://www.nvnews.net/vbulletin/showthread.php?t=148386 seems to possibly suggest that is the case.

----------

## Randy Andy

Today a new build of the driver, version 195.36.08, came out; see here:

http://www.nvidia.com/object/linux_display_amd64_195.36.08.html

Give it a try once it's in the tree, to check whether it fixes your trouble...

Regards, Andy.

----------

## Smeagel

I'm using 195.36.15 with 2.6.33-r2 and am getting failures to start with udevd.  Dual-graphics cards + a tesla.  Anybody have any suggestions?

----------

## fert

With kernel 2.6.33-r1, the update to nvidia-drivers 195.36.24 fixed the problem.

Now, with kernel 2.6.33-r2, 195.36.24 is showing the same symptoms (system completely freezes upon the insertion of the nvidia module). The kernels were both compiled with the same .config and use the same boot parameters.

Looking at nvnews.net, nvidia+sli+64-bit seems to have a long history of being fubar'd for some.

Does anyone want to venture a guess as to which change from gentoo-sources-2.6.33-r1 to gentoo-sources-2.6.33-r2 reintroduced the problem?

----------

## Smeagel

 *Quote:*   

> With kernel 2.6.33-r1, the update to nvidia-drivers 195.36.24 fixed the problem. 

 

fert - do you know if this fixes the uevents lock up that I was seeing?  Or just already-booted-lock-up?

I'm perfectly willing to roll back to -r1 if that's all it'll take.

Thanks for the info.

----------

## fert

 *Smeagel wrote:*   

> fert - do you know if this fixes the uevents lock up that I was seeing?  Or just already-booted-lock-up?
> 
> I'm perfectly willing to roll back to -r1 if that's all it'll take.
> 
> Thanks for the info.

 

I'm willing to bet it would fix that problem.

If I was booting a kernel that didn't have the nvidia module already built, the system would freeze after building and trying to insert the module.

If I was trying to boot into a kernel that already had the module built for it, my system would freeze at the uevents boot message.

2.6.33-r1 + 195.36.24 seems to be the only combination that works for me.

----------

## gaebb3r

 *Smeagel wrote:*   

> I'm using 195.36.15 with 2.6.33-r2 and am getting failures to start with udevd.  Dual-graphics cards + a tesla.  Anybody have any suggestions?

 

Might be related to hardware setup - my system's running fine with the combination 2.6.33-r2 and 195.36.15 (Xeon + Geforce 8800 GTS).

----------

## fert

 *gaebb3r wrote:*   

> Might be related to hardware setup - my system's running fine with the combination 2.6.33-r2 and 195.36.15 (Xeon + Geforce 8800 GTS).

 

Out of curiosity, just one 8800 or two?

From what I've gathered, SLI is the troublemaker...

----------

## Shining Arcanine

Have you selected the new kernel version via eselect and run module-rebuild rebuild on your system?

----------

## fert

 *Shining Arcanine wrote:*   

> Have you selected the new kernel version via eselect and run module-rebuild rebuild on your system?

 

No.

I've always manually managed the /usr/src/linux symlink and rebuilt nvidia-drivers, etc. after rebooting into the properly symlinked kernel. (not the Gentoo way I suppose, but old habits die hard).
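When the symlink is managed by hand, one thing worth ruling out is that /usr/src/linux points at a different tree than the running kernel, since by default the nvidia module is built against whatever that symlink points to. A small Python sketch of such a check follows. It assumes the source directory is named `linux-` plus the output of `uname -r`, which should hold for an unmodified gentoo-sources tree but not if a custom CONFIG_LOCALVERSION is set. The function names are hypothetical.

```python
import os


def kernel_tree_matches(link_target, running_release):
    """True if a /usr/src/linux target such as 'linux-2.6.33-gentoo-r2'
    corresponds to a running release such as '2.6.33-gentoo-r2'.

    Assumes the source directory is named 'linux-' + release.
    """
    name = os.path.basename(link_target.rstrip("/"))
    return name == "linux-" + running_release


def check_system(link="/usr/src/linux"):
    """Compare the real symlink with the running kernel (Linux only)."""
    return kernel_tree_matches(os.readlink(link), os.uname().release)


print(kernel_tree_matches("linux-2.6.33-gentoo-r2", "2.6.33-gentoo-r2"))
print(kernel_tree_matches("linux-2.6.33-gentoo-r1", "2.6.33-gentoo-r2"))
```

The first call prints `True` and the second `False`; `check_system()` does the same comparison against the live system.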

I'll give it a shot and report back, but unfortunately, I think the problem runs deeper than me mis-linking or improperly rebuilding the module.

Update: No change when using eselect and module-rebuild. :(

----------

## Shining Arcanine

That is really odd and it should not be happening. This is a shot in the dark, but would you mind installing sys-kernel/vanilla-sources-2.6.33.4 and seeing what happens?

By the way, what compilers do you have installed on your system and which is set as the system default?

----------

## fert

gcc-4.4.3-r2 (only installed version)

vanilla-sources-2.6.33.4 causes the same lockup with nvidia-drivers-195.36.24

----------

## Shining Arcanine

 *fert wrote:*   

> gcc-4.4.3-r2 (only installed version)
> 
> vanilla-sources-2.6.33.4 causes the same lockup with nvidia-drivers-195.36.24

 

I think that your issues are being caused by a regression introduced into the kernel in version 2.6.33.3.

The best way of dealing with this would be by doing a git bisect, which should allow you to determine the bug within approximately 10 iterations of compiling, installing, rebuilding the nvidia-drivers and rebooting. This should allow you to identify the commit that broke your system.
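The "approximately 10 iterations" figure follows from bisection being a binary search: each test halves the range of suspect commits, so about a thousand candidates need roughly ten tests. Here is a rough Python sketch of the search itself, not of git's actual implementation (real history is not a simple list). The commit names and the position of the regression are invented.

```python
import math


def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad. Returns (commit, tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # in real life: one build, module rebuild and reboot
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


# Made-up history: 1000 commits, the regression lands at c0612.
commits = ["c%04d" % i for i in range(1000)]
culprit, tests = first_bad(commits, lambda c: c >= "c0612")
print(culprit, tests, math.ceil(math.log2(len(commits))))
```

The last value printed is the log2 bound on the number of tests; the search never needs more than that.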

Unfortunately, I am not well versed on how to use git bisect, so you would probably want either someone else here or someone at the kernel mailing list to advise you. If you ask at the kernel mailing list, you will probably want to say something along the lines of "a commit in 2.6.33.3 broke my system; how do I do a git bisect to figure out which commit it was?".

----------

## fert

 *Shining Arcanine wrote:*   

> If you ask at the kernel mailing list, you will probably want to say something along the lines of "a commit in 2.6.33.3 broke my system; how do I do a git bisect to figure out which commit it was?".

 

Thanks. I'll do that. I'll probably hear a lot of "that's what you get for using a closed source driver" though.

Gotta head off to work now, but will report back if I manage to uncover anything.

----------

## fert

Upon further investigation, it seems that something in Gentoo's patches for 2.6.33-r1 that was changed or removed in -r2 must fix my problem.

Vanilla 2.6.33.2 doesn't work, but gentoo 2.6.33-r1 does.

If I'm reading the changelog correctly, gentoo-sources-2.6.33-r1 is 2.6.33.2 + gentoo patches, no?

----------

## fert

Arghhh...

I can't believe I forgot how much I wrestled with 2.6.33-r1 to get it to work with nvidia. The same "fix" works for -r2. I've wasted so much time on this.

The "fix" is simple. VGA arbitration does not play well with nvidia SLI. I just need to disable CONFIG_VGA_ARB in the kernel. For the life of me, I could not find this option through "make menuconfig", so I had to edit drivers/gpu/vga/Kconfig, and set the default to "n" to keep it from being reset to "y" upon "make all". Then I also had to manually edit .config to disable it there before building the kernel.
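For the .config half of that workaround, the edit amounts to replacing the `CONFIG_VGA_ARB=y` line with the "is not set" marker. A minimal Python sketch of that text edit is below; the function name is hypothetical and the sample input is invented. On its own this is not enough when the option's prompt is hidden, because the build resets it to the Kconfig default, which is why the default had to be changed too.

```python
def disable_symbol(config_text, symbol):
    """Rewrite .config text so CONFIG_<symbol> is marked as not set.

    Lines for other symbols are returned unchanged. If the symbol is
    absent, the 'is not set' marker is appended.
    """
    marker = "# CONFIG_%s is not set" % symbol
    prefix = "CONFIG_%s=" % symbol
    out, seen = [], False
    for line in config_text.splitlines():
        if line.startswith(prefix) or line == marker:
            if not seen:
                out.append(marker)
                seen = True
            continue
        out.append(line)
    if not seen:
        out.append(marker)
    return "\n".join(out) + "\n"


# Invented sample, not a real config.
before = "CONFIG_PCI=y\nCONFIG_VGA_ARB=y\nCONFIG_VGA_CONSOLE=y\n"
print(disable_symbol(before, "VGA_ARB"), end="")
```

Matching on `CONFIG_VGA_ARB=` with the equals sign keeps symbols that merely share the prefix from being touched.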

Smeagel: Sorry if I sent you on a wild goose chase. I think disabling VGA_ARB might work for you too.

I swear I must have Alzheimers or suffered from some form of temporary amnesia.

Oh hell, there's even a patch for this issue (untried by me) that would probably work as well:

http://www.nvnews.net/vbulletin/showthread.php?t=142656

----------

## Shining Arcanine

 *fert wrote:*   

> Arghhh...
> 
> I can't believe I forgot how much I wrestled with 2.6.33-r1 to get it to work with nvidia. The same "fix" works for -r2. I've wasted so much time on this.
> 
> The "fix" is simple. VGA arbitration does not play well with nvidia SLI. I just need to disable CONFIG_VGA_ARB in the kernel. For the life of me, I could not find this option through "make menuconfig", so I had to edit drivers/gpu/vga/Kconfig, and set the default to "n" to keep it from being reset to "y" upon "make all". Then I also had to manually edit .config to disable it there before building the kernel.
> ...

 

You can type "/" to bring up a search menu in make menuconfig that will help you find the location of that setting. There should be no need to modify the sources to have it default to off. You could also manually modify your .config file, although it is better to do modifications through make menuconfig.

----------

## fert

 *Quote:*   

> You can type "/" to bring up a search menu in make menuconfig that will help you find the location of that setting.

 

Yeah, but the option is still not visible. Can you see VGA_ARB under ->Device Drivers->Graphics Support?

I see now. It won't show up as an option in menuconfig unless you select CONFIG_EMBEDDED (which I had never done before). Without CONFIG_EMBEDDED=y, CONFIG_VGA_ARB is hidden, defaults to "y", and fubars SLI.
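That also explains why a hand edit of .config kept getting reset. A simplified Python model of the behaviour, assuming a bool option whose prompt is only shown when CONFIG_EMBEDDED=y and whose default is "y" (the function names are hypothetical, and real Kconfig has many more rules than this):

```python
def resolve_bool(default, prompt_visible, user_choice=None):
    """Simplified model of how Kconfig settles a bool symbol.

    If the prompt is visible, an explicit user choice wins.
    If the prompt is hidden, any user choice is ignored and the default
    applies. Selects, ranges and other Kconfig features are ignored here.
    """
    if prompt_visible and user_choice is not None:
        return user_choice
    return default


def vga_arb(embedded, user_choice=None):
    # Prompt shown only with CONFIG_EMBEDDED=y, default "y".
    # The PCI dependency is assumed to be satisfied.
    return resolve_bool(default="y", prompt_visible=embedded,
                        user_choice=user_choice)


print(vga_arb(embedded=False, user_choice="n"))  # hidden prompt: default wins
print(vga_arb(embedded=True, user_choice="n"))   # visible prompt: choice kept
print(vga_arb(embedded=True))                    # no choice made: default
```

So with CONFIG_EMBEDDED off, setting the option to "n" by hand has no lasting effect, which matches what happened on "make all".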

I wonder if it is better to turn off VGA_ARB or to use the nvidia patch...

----------

## Shining Arcanine

 *fert wrote:*   

>  *Quote:*   You can type "/" to bring up a search menu in make menuconfig that will help you find the location of that setting. 
> 
> Yeah, but the option is still not visible. Can you see VGA_ARB under ->Device Drivers->Graphics Support?
> 
> I see now. It won't show up as an option in menuconfig unless you select CONFIG_EMBEDDED (which I had never done before). Without CONFIG_EMBEDDED=y, CONFIG_VGA_ARB is hidden, defaults to "y", and fubars SLI.
> ...

 

The nvidia driver patch in a local overlay is the best way to go until it is included in the portage tree.

I suggest that you file a ticket so that it can be included into the portage tree.

----------

