# Problem with the latest nvidia drivers

## brutico

[Moderator note: this thread was posted twice.  The first response to this thread was in one copy.  The second response was in the other copy.  I combined the two threads into one. -Hu]

X would not start with version 410.57 and I had to go back to the previous version. Has anyone had the same problem?

My video card: GTX 1080

----------

## DawgG

Yes, same problem. I run ~amd64 with 4.18.9 on a two-kernel solution as described at https://wiki.gentoo.org/wiki/Nouveau_%26_nvidia-drivers_switching#Switching_using_two_kernels

The script step that is supposed to run `modprobe nvidia` hangs at exactly this point. I checked via ssh and nvidia IS loaded, but it cannot be unloaded, even if the hanging `modprobe -q nvidia` is killed manually.

Running `nvidia-modprobe` does not change a thing; even with the module loaded, startx returns "Failed to initialize nvidia kernel module."
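To check this over ssh without relying on lsmod, here is a minimal sketch (`module_state` is a hypothetical helper name, not an existing tool) that reads the state column of /proc/modules. A module whose initialisation is still stuck is expected to be listed as `Loading` rather than `Live`, which may also explain why it refuses to unload.

```shell
#!/bin/sh
# Print the state of a loaded kernel module, or fail if it is not listed.
# Each /proc/modules line looks like:
#   name size refcount deps state address
# where state is one of Live, Loading or Unloading.
# module_state is a hypothetical helper name, not an existing tool.
module_state() {
    # $1: module name with underscores, e.g. nvidia_drm
    # $2: optional path to a modules list (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { print $5; found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage: prints "Live", or "Loading" if module init never finished.
#   module_state nvidia
```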

Any ideas (besides downgrading)?

----------

## Keruskerfuerst

Can you give some more information?

----------

## brutico

 *DawgG wrote:*   

> yes, same problem. run ~amd64 with 4.18.9 on a two-kernel-solution as descibed https://wiki.gentoo.org/wiki/Nouveau_%26_nvidia-drivers_switching#Switching_using_two_kernels
> 
> the script supposed to "modpropbe nvidia" hangs at just this point. i checked via ssh and nvidia IS loaded, but cannot be unloaded even if the hanging 
> 
> ```
> ...

 

It only supports the 2080 range:

https://www.nvidia.es/Download/driverResults.aspx/138290/es

----------

## DawgG

Going by https://www.nvidia.es/Download/driverResults.aspx/138290/es, I'd say that

 *Quote:*   

> Added support for the following GPUs:
> 
>     GeForce RTX 2080 Ti
> 
>     GeForce RTX 2080

 

does not mean  *Quote:*   

> only supports the 2080 range

 

I downgraded to 396.54, which works without problems, since I've had my share of nvidia binary driver troubles recently.

----------

## runningnak3d

I am running 410.57 (on a GTX 1050) without problems on OpenRC.

I use Dantrell's Gnome overlay for Gnome 3.28 if that matters.

I can post any other relevant info that could help you.

-- Brian

----------

## ct85711

Well, I checked my system and I have no problem running the 410.57 drivers with OpenRC and Xfce on a 4.15.10 kernel. From what I have seen, two things minimize the problems I get with nvidia-drivers. The first is NOT updating the kernel too often: nvidia is known to be slow to support the newest kernels, so if you are running the unstable branch, don't switch to the newest kernel right away unless you need to. The second is doing a full system reboot after updating the nvidia drivers; it tends to work much better.

You may be able to get the drivers reloaded without a reboot, but I find rebooting avoids those problems. Keep in mind that the nvidia drivers are not just one file, so getting the kernel module reloaded doesn't mean all of its libraries are reloaded along with it.

Note: my system has a GT 740 card in it.

----------

## i4dnf

No go on a 760 here, with a 4.18.9 kernel  :Sad: .

ct85711, runningnak3d do you by any chance have NUMA and/or CGROUPS enabled in your kernel config? 

(both are disabled here, but I've noticed on some CUDA related issues that NUMA/CGROUPS were [unintended] requirements)
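For anyone who wants to compare, a minimal sketch of reading those options out of a kernel config (`kconfig_get` is a hypothetical helper name). It assumes an uncompressed .config file; if your kernel has CONFIG_IKCONFIG_PROC enabled, `zcat /proc/config.gz > /tmp/running.config` gives you one for the running kernel.

```shell
#!/bin/sh
# Print how a kernel option is set in a .config file:
# "y", "m", a value, or "not set" if it is absent or commented out.
# kconfig_get is a hypothetical helper name.
kconfig_get() {
    # $1: option name without the CONFIG_ prefix, e.g. NUMA
    # $2: path to an uncompressed kernel config
    # The "=" right after the name keeps NUMA from matching NUMA_BALANCING.
    val=$(sed -n "s/^CONFIG_$1=//p" "$2")
    echo "${val:-not set}"
}

# Usage, assuming the running kernel's sources are in /usr/src/linux:
#   for opt in NUMA CGROUPS; do
#       printf '%s: %s\n' "$opt" "$(kconfig_get "$opt" /usr/src/linux/.config)"
#   done
```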

----------

## arnvidr

Running normally for me, on a 4.18.7 kernel

----------

## Nikmind

I have the same problem, and reverting to the previous driver works. As far as I can see from the Xorg log files, it does not recognize that I have a monitor connected. It just says that it can't find anything connected to any of the ports on the graphics card. But I definitely have one connected  :Razz:  With the previous version of the drivers it finds the monitor just fine.

----------

## ct85711

I know I have NUMA enabled by default in all the kernels I build; cgroups are enabled but I never use them (mainly just some useless bloat that I haven't trimmed from the kernel yet). I don't use CUDA, so I never cared about that stuff.

Edit: I have seen the monitor-not-detected issue once in a while, but sadly I don't know of a sure fix for it. One cause I've seen is the monitor not being turned on before the computer boots.

----------

## j_c_p

NVIDIA driver 410.57, kernel 4.18.10 and GTX 960 without problems here (KDE desktop).

 *Quote:*   

> jcp@phoenix64 ~ $ lspci
> 
> 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD9x0/RX980 Host Bridge (rev 02)
> 
> 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory Management Unit (IOMMU)
> ...

 

----------

## runningnak3d

 *i4dnf wrote:*   

> No go on a 760 here, with a 4.18.9 kernel .
> 
> ct85711, runningnak3d do you by any chance have NUMA and/or CGROUPS enabled in your kernel config? 
> 
> (both are disabled here, but I've noticed on some CUDA related issues that NUMA/CGROUPS were [unintended] requirements)

 

I have CGROUPS (and everything else that Docker needs) enabled.

Also, forgot to mention that I was running kernel 4.18.5.

-- Brian

----------

## i4dnf

Thanks to both of you. Unfortunately neither NUMA nor CGROUPS help.

Guess I'll just wait for the next driver version.

----------

## ct85711

What you may want to do is explicitly define the screen/monitor for Xorg instead of relying on autodetection. I've seen the Xorg autodetect feature not always work too well, so setting it up yourself saves you from having to worry whether Xorg sees the monitor or not (it will already know what to set up).

https://wiki.archlinux.org/index.php/xorg#Monitor_settings
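A minimal sketch of what that could look like as an xorg.conf.d snippet, assuming a single NVIDIA card and one monitor. The file name and the Identifier strings are arbitrary placeholders; further Monitor options are covered on the Arch page linked above.

```
# /etc/X11/xorg.conf.d/20-nvidia.conf (file name is an arbitrary choice)
# Tie the nvidia device and one monitor together explicitly,
# so Xorg does not have to autodetect the Screen.
Section "Device"
    Identifier "Nvidia Card"
    Driver     "nvidia"
EndSection

Section "Monitor"
    Identifier "Monitor0"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device     "Nvidia Card"
    Monitor    "Monitor0"
EndSection
```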

----------

## i4dnf

I do have "Monitor" and "Screen" sections, so at least in my case it's not caused by failing autodetect.

----------

## krinn

that whole thread is made of:

1: it doesn't work

2: it works

Are you guys going for stats, or do you expect to get real help from just "it doesn't work"?

----------

## i4dnf

AFAICT there's not much we can do other than gather reports and try to find a common denominator, either for working systems or for non-working ones (so far GPU generation doesn't seem to matter, the kernel series doesn't seem to matter, kernel settings like CGROUPS/NUMA don't seem to make a difference, [not] relying on autodetection doesn't make a difference, etc.).

There's not much other info to provide: when/where it hangs, it simply hangs, and there's no further log output in Xorg.log, dmesg or messages. And this is on system[s] where the only change is the nvidia driver.

----------

## Josef.95

nvidia-drivers-410.57 is the current beta release: --> https://devtalk.nvidia.com/default/topic/533434/linux/current-graphics-driver-releases

I think you should report bugs upstream.

----------

## schnitz81

I just ran into this and everything turned black. For a quick fix, just mask the latest version in package.mask and re-emerge nvidia-drivers (to get the old 396.54 driver back, which seems to work fine with the latest kernel).
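For reference, a minimal sketch of that mask, assuming /etc/portage/package.mask is a directory (if it's a single file on your system, add the atom line to that file instead); the file name inside it is an arbitrary choice:

```
# /etc/portage/package.mask/nvidia-drivers
# Hide 410.57 and everything newer so Portage falls back to 396.54.
>=x11-drivers/nvidia-drivers-410.57
```

After that, `emerge --ask --oneshot x11-drivers/nvidia-drivers` should pull in the highest unmasked version, which at the time of this thread was 396.54. Remember to drop the mask once a fixed release is out.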

----------

## Cthulhu666

There's a new revision of the 410.57 driver that installs some more (new) libraries. I haven't tried it yet, but it may fix the black screen problem some, myself included, have been experiencing.

----------

## Aradayn

I am getting "No Screens Found" with a 1080 on 410.57. I am running an extremely stripped-down custom kernel.

I tried reverting to the previous driver release and I'm getting the same issues. Also, the video card isn't listed in lspci.

Please let me know if I can provide any additional information that can assist in discovering the problem.

----------

## Cthulhu666

 *Aradayn wrote:*   

> I am getting "No Screens Found" with a 1080 on 410.57. I am running an extremely stripped-down custom kernel.
> 
> I tried reverting to the previous driver release and I'm getting the same issues. Also, the video card isn't listed in lspci.
> 
> Please let me know if I can provide any additional information that can assist in discovering the problem.

 

Sounds like either a hardware problem or too stripped-down kernel. What bus is the graphics card connected to? I'm assuming it's the PCI-E bus, so you should definitely have that enabled. You could try posting your kernel log, as that might contain something useful.

I finally got around to trying the 410.57-r1 driver and had the exact same problem, so the -r1 revision didn't change anything for me. It locks up to the point where even Magic SysRq is unable to do anything except change the keyboard mode. I tried booting without starting X, which worked fine, but then it locks up hard when trying to remount filesystems read-only during shutdown. I don't have time to investigate this issue further, so I've downgraded to 396.54 for now.

----------

## Aradayn

I can't seem to post my full dmesg log. I copied everything mentioning NVIDIA:

```
[    1.986561] usb 10-1: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[    1.996156] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.1/0000:0d:00.1/sound/card1/input9
[    1.996180] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.1/0000:0d:00.1/sound/card1/input10
[    1.996199] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.1/0000:0d:00.1/sound/card1/input11
[    1.996217] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.1/0000:0d:00.1/sound/card1/input12
[    1.997367] ata5: SATA link down (SStatus 0 SControl 300)
...
[    3.551550] udevd[695]: starting eudev-3.2.6
[    3.588019] usb 1-2.3: new full-speed USB device number 7 using xhci_hcd
[    3.627589] igb 0000:04:00.0 enp4s0: renamed from eth0
[    3.629461] nvidia_drm: loading out-of-tree module taints kernel.
[    3.629465] nvidia_drm: module license 'MIT' taints kernel.
[    3.629466] Disabling lock debugging due to kernel taint
[    3.686945] nvidia-nvlink: Nvlink Core is being initialized, major device number 252
[    3.687155] nvidia 0000:0d:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    3.687249] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.54  Tue Aug 14 19:02:34 PDT 2018 (using threaded interrupts)
[    3.691509] random: crng init done
[    3.691510] random: 7 urandom warning(s) missed due to ratelimiting
[    3.749403] EXT4-fs (sda3): re-mounted. Opts: discard
[    3.800642] usb 1-2.3: not running at top speed; connect to a high speed hub
[    3.838647] usb 1-2.3: New USB device found, idVendor=0bda, idProduct=5400, bcdDevice= 0.06
[    3.838649] usb 1-2.3: New USB device strings: Mfr=17, Product=18, SerialNumber=19
[    3.838649] usb 1-2.3: Product: BillBoard Device
[    3.838650] usb 1-2.3: Manufacturer: Realtek
[    3.838651] usb 1-2.3: SerialNumber: 123456789ABCDEFGH
[    3.948047] usb 1-2.4: new high-speed USB device number 8 using xhci_hcd
[    4.084698] usb 1-2.4: New USB device found, idVendor=0bda, idProduct=5412, bcdDevice= 1.19
[    4.084700] usb 1-2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    4.084701] usb 1-2.4: Product: 4-Port USB 2.0 Hub
[    4.084702] usb 1-2.4: Manufacturer: Generic
[    4.089715] hub 1-2.4:1.0: USB hub found
[    4.093697] hub 1-2.4:1.0: 2 ports detected
[    4.177457] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
[    4.177458] caller 0xffffffffc07a4a93 mapping multiple BARs
[    5.089690] Adding 33553404k swap on /dev/sda2.  Priority:-2 extents:1 across:33553404k SSDsc
[    5.513643] IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is not ready
[    5.736219] xhci_hcd 0000:0a:00.0: remove, state 1
[    5.736225] usb usb6: USB disconnect, device number 1
```

----------

## Cthulhu666

The snippet is too limited to be really useful.

You can upload the file to pastebin and link to it.

----------

## Aradayn

Here you are:

https://pastebin.com/rNT6ugE2

Thank you for your time, I really appreciate it. This was working in the past, with an earlier version of this kernel configuration. I guess my problem has nothing to do with the latest driver anyway. I've tried running through the Nvidia driver wiki page again, but no luck, exactly the same results.

----------

## Cthulhu666

No problem, only happy to help.

I'll take a look at the log when time allows it.

If you still have the old kernel config, you could try diff'ing the old and the new and see if anything you have removed stands out (PCI, ACPI, etc...).
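The kernel source tree ships a helper for this, `scripts/diffconfig`. If you'd rather do it by hand, here is a minimal POSIX sh sketch (`config_dropped` is a hypothetical helper name) that lists options enabled in the old config but missing, disabled, or switched between built-in and module in the new one:

```shell
#!/bin/sh
# List CONFIG_ options enabled (=y or =m) in the old config that are not
# enabled with the same value in the new config.
# config_dropped is a hypothetical helper name.
config_dropped() {
    # $1: old .config, $2: new .config
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    grep -E '^CONFIG_[A-Za-z0-9_]+=[ym]$' "$1" | sort > "$old_sorted"
    grep -E '^CONFIG_[A-Za-z0-9_]+=[ym]$' "$2" | sort > "$new_sorted"
    # -23: suppress lines only in the new file and lines common to both
    comm -23 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Usage (paths are examples):
#   config_dropped /boot/config-old /usr/src/linux/.config | grep -E 'PCI|ACPI'
```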

----------

## krinn

Your log shows a new USB device being added at around second 100, which means the boot had completed (at around the 8th second) and a USB device was plugged in afterwards, a sign everything is working.

Your log also shows nvidia-drivers version 396.54 in use.

I don't think showing anyone a kernel log from a working nvidia-drivers and kernel combination would help anyone.
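To quickly see which nvidia module version a log actually shows being loaded, a minimal sketch (`nvrm_version` is a hypothetical helper name) that matches the `NVRM: loading NVIDIA UNIX ... Kernel Module` line format seen in the log excerpt above:

```shell
#!/bin/sh
# Read kernel log text on stdin and print the NVIDIA kernel module
# version from each "NVRM: loading NVIDIA UNIX <arch> Kernel Module  <ver>" line.
# nvrm_version is a hypothetical helper name.
nvrm_version() {
    sed -n 's/.*NVRM: loading NVIDIA UNIX [^ ]* Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   dmesg | nvrm_version
```

For the module that is loaded right now, `cat /proc/driver/nvidia/version` reports the same information directly.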

----------

## Aradayn

Should I move this to a different thread? Like I said earlier, my problem appears to not actually be related to the latest drivers.

And should I keep this in Kernel & Hardware? I kind of think so since my video card is not showing up when I run lspci.

----------

## Aradayn

My problem was unrelated. (I was disabling some PCI devices in the local script, and after a hardware configuration change this included my graphics card.)

Additionally, the latest drivers work fine for me without any issue.

----------

## i4dnf

Follow-up on this:

The problem appears to be the nvidia module hanging if loaded too early in the boot process.

Blacklisting the nvidia modules so that they only get loaded later, when X starts, is a usable workaround for now, at least in my case.

(Only blacklisting 'nvidia' doesn't work, the other nvidia-* modules need to be blacklisted too, as they pull 'nvidia' if they get loaded)
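A minimal sketch of such a blacklist, assuming the four modules shipped with the 410.xx drivers (nvidia, nvidia_modeset, nvidia_drm, nvidia_uvm); the file name is an arbitrary choice:

```
# /etc/modprobe.d/nvidia-blacklist.conf (file name is an arbitrary choice)
# "blacklist" only stops automatic loading through module aliases (e.g. by udev).
# An explicit "modprobe <name>" still works, and it also loads the module's
# dependencies even if those are blacklisted, which is why every nvidia
# module has to be listed here, not just "nvidia".
blacklist nvidia
blacklist nvidia_modeset
blacklist nvidia_drm
blacklist nvidia_uvm
```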

There's also what seems to be a related bug report: #667362

----------

## ct85711

Just curious, but did you try adding haveged to your startup? My thinking is that this may be similar to the slow startup caused by insufficient entropy. In that case, the system doesn't time out the module loading (hence the hanging).

----------

## i4dnf

No, I didn't, but the problem still manifested on a 4.19.0 kernel with RANDOM_TRUST_CPU=y which prevents the entropy related delay, so I don't think that has any relevance.

----------

## pickd.mask

Problem still exists with nvidia-drivers-410.73 and 410.66 with both 4.18.16 and 4.19.0 kernels.

Blacklisting nvidia modules doesn't help.

I get the same messages as mentioned here: https://bugs.gentoo.org/667362#c0

 *Quote:*   

> timeout 'nvidia-udev.sh add'
> 
> slow: 'nvidia-udev.sh add' 
> 
> timeout: killing 'nvidia-udev.sh add'
> ...

 

And sometimes udevd says that "specified group 'kvm' unknown". 

Applying the patch from https://devtalk.nvidia.com/default/topic/1043346/linux/nvidia-driver-v410-73-fails-to-build-functional-modules/ doesn't help either, so I guess I have to wait until the Nvidia devs fix it.

If they're going to fix it at all, of course.

Sometimes I think that Linus was right about Nvidia  :Very Happy: 

----------

## pickd.mask

Here's what I did to solve it. Works for me.

Also thank you all guys for posting about your issues and thoughts, it helped me to find out what was going on.

Link https://bugs.gentoo.org/670340#c8

I'll copypaste it here, just in case

 *Quote:*   

> So it appears that I was able to circumvent that issue by rethinking all comments about blacklisting modules coming from wise people.
> 
> At last I noticed that IF I blacklist all modules to prevent them from loading (by udev, from what I know), I can actually load modules manually via modprobe and somehow that works perfectly. NOTE: I couldn't load or remove nvidia modules if I hadn't blacklisted them. 
> 
> After that the solution was simple. Probably it's not the best way, maybe it's plain dumb way, but it works for me.
> ...

 

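The quoted bug comment is cut off here; the complete steps are in the linked bug. As a rough sketch of the general idea only (not a copy of that comment), assuming the nvidia modules are blacklisted in /etc/modprobe.d/ and OpenRC's `local` service is in the default runlevel: OpenRC runs the executable `*.start` files in /etc/local.d/, so a script like this loads the modules late in boot. The file name is the one pvh1987 mentions later in this thread; make it executable with `chmod +x`.

```
#!/bin/sh
# /etc/local.d/nvidia-udev-workaround.start
# Load the blacklisted NVIDIA modules late in boot instead of via udev.
# nvidia_drm depends on nvidia_modeset (which needs nvidia),
# so modprobe pulls those in automatically.
modprobe nvidia_drm
# nvidia_uvm (CUDA) is not a dependency of nvidia_drm; load it if you need it.
modprobe nvidia_uvm
```

The display manager also has to start after `local` for this to help.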
----------

## Utsuho Reiuji

How do you prevent xdm from starting before local has run? Your fix also works for me, but I have to restart xdm before I get to use my DE.

Edit: On a second look, I noticed that xdm actually doesn't start automatically using this fix. Still better than no desktop...

----------

## pickd.mask

 *Utsuho Reiuji wrote:*   

> how do you prevent xdm to start before local was invoked? Your fix also works for me, but I have to restart xdm before I get to use my DE.
> 
> Edit: On a second look, I noticed that xdm actually doesn't start automatically using this fix. Still better than no desktop...

 

On my system everything autostarts just fine. I didn't do any particular setup related to boot order or anything like that. Maybe you should try adding a line with "sleep 10;" to delay the modprobe. Just a wild guess.

Here's some info about my runlevels:

```
ivan@pc ~ $ rc-update show boot
               binfmt | boot
             bootmisc | boot
          consolefont | boot
                 fsck | boot
             hostname | boot
              hwclock | boot
              keymaps | boot
           localmount | boot
             loopback | boot
              modules | boot
                 mtab | boot
   opentmpfiles-setup | boot
               procfs | boot
                 root | boot
                 swap | boot
               sysctl | boot
            syslog-ng | boot
         termencoding | boot
              urandom | boot
ivan@pc ~ $ rc-update show default
               cronie | default
                cupsd | default
                 dbus | default
              elogind | default
             iptables | default
                local | default
             net.eth0 | default
             netmount | default
              openvpn | default
                  xdm | default
```

Also, I have #rc_parallel="NO" (commented out) in /etc/rc.conf. Nothing else related to parallel boot is set. That might be the difference.

It's also worth mentioning that I use KDE and SDDM.

Maybe some of this is helpful.

----------

## Utsuho Reiuji

Thanks for your reply, pickd.mask, this helps a bit. I need to finish this massive update first to test things, but I'll try using rc_parallel="NO" just in case.

Not sure if that was the cause, but I noticed that /etc/init.d/local had

```
depend()
{
        after *
        keyword -timeout
}
```

in it. Maybe uncommenting that "after *" bit will help. I also added "after local" to /etc/init.d/xdm in the same section.

Edit: that worked.
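A variant of the same fix that leaves /etc/init.d/xdm itself untouched, assuming an OpenRC version that supports the `rc_after` dependency override listed among the service settings in /etc/rc.conf (per-service overrides go in /etc/conf.d/<service>):

```
# /etc/conf.d/xdm (append to the existing settings)
# Order xdm after the local service, same effect as "after local" in depend().
rc_after="local"
```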

----------

## pickd.mask

 *Utsuho Reiuji wrote:*   

> 
> 
> Edit: that worked.

 

That's great  :Smile: 

BTW, I checked my /etc/init.d/local and that line was uncommented by default, I never touched it.

----------

## buratino2015

I have the same issue with kernel 4.14.83 and nvidia-drivers 410.78. Rolled back to 396.54. Video card: GTX 1070.

----------

## Maf

Same here with nvidia-drivers-415.18 with Linux 4.14.83, GTX 970. Had to roll back to 396.54.

----------

## Muso

 *Maf wrote:*   

> Same here with nvidia-drivers-415.18 with Linux 4.14.83, GTX 970. Had to roll back to 396.54.

 

With gentoo-sources 4.19.6 and the 415.18 nvidia-drivers, I have no issues at all.   :Confused: 

My video card is a GTX 950

----------

## pvh1987

Thanks a lot! This workaround solved a big problem I had today, with a totally broken system after updating all packages with emerge. Turned out the culprit was the buggy nvidia-drivers.

If you are using CUDA, you might want to add

```
blacklist nvidia_uvm
```

to /etc/modprobe.d/blacklist.conf as well. I do not know if this is necessary, but I would not take any chances. However, it does not seem to load when nvidia_drm is loaded, so I added it to /etc/local.d/nvidia-udev-workaround.start as well:

```
modprobe nvidia_uvm;
```

Finally, after a whole day of troubleshooting, my workstation works again. I don't know how buggy the driver is when it comes to gaming... maybe a performance boost would be in order after all this work, but probably too much to ask for   :Smile: 

----------

## mrbassie

 *pvh1987 wrote:*   

> I don't know how buggy the driver is when it comes to gaming...

 

I use bumblebee, so I didn't have any of the problems in this thread; however, I masked >396.54 because the 410 driver rendered some textures incorrectly in games.

----------

## Kaorukun

 *pickd.mask wrote:*   

> Here's what I did to solve it. Works for me.
> 
> Also thank you all guys for posting about your issues and thoughts, it helped me to find out what was going on.
> 
> Link https://bugs.gentoo.org/670340#c8
> ...

 

That worked for me too.

Now using x11-drivers/nvidia-drivers-415.18 and sys-kernel/gentoo-sources-4.19.6

----------

## uraes

Uh, got hit by this... Definitely the craziest bug in many years  :Sad:  Three days without a working PC... and counting. I even tried stable/unstable/older versions of udev, nvidia-drivers, xorg... recompiled all packages (1116 items)...

So far: blacklisting the modules helps. No more high load, and sddm starts... sort of (meaning I see a mouse cursor frame on a black screen; at the moment I'm not even sure whether this is the nvidia bug or whether I have messed up my configs). Going on with experiments...

kernel:  4.19.12

nvidia-drivers: 415.18

Edit: As complaining is the biggest helper in the IT industry... after my post I found another thread suggesting a mess with symlinks. Probably, since I had tried switching to nouveau in the meantime, I ended up with broken symlinks under /usr/lib64/, so I did:

```
$ cd /usr/lib64/
$ ls -l libGLESv*
$ rm libGLESv*
$ emerge nvidia-drivers
```
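Rather than removing everything matching a pattern, a more targeted first step is to list only the links that are actually broken. A minimal sketch using GNU find's `-xtype l` test (`broken_links` is a hypothetical helper name):

```shell
#!/bin/sh
# List dangling symlinks (links whose target no longer exists) matching a
# name pattern in one directory, without descending into subdirectories.
# Requires GNU find for -xtype; broken_links is a hypothetical helper name.
broken_links() {
    # $1: directory, $2: name pattern, e.g. 'libGL*'
    find "$1" -maxdepth 1 -name "$2" -xtype l
}

# Usage: review the list before deleting anything, then re-emerge.
#   broken_links /usr/lib64 'libGL*'
```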

----------

## Stolz

One year later and I'm also having the same issue with current drivers

sys-kernel/gentoo-sources-5.4.2

x11-drivers/nvidia-drivers-440.36

sys-fs/eudev-3.2.9

The workaround to blacklist the modules works. Thanks!

----------

