# [WORKAROUND] Linux-5.10 + nvidia-drivers causes blackscreen

## von_kossa

Hi!

I just tried to migrate from gentoo-sources 4.9 to 5.10.

Because the listnewconfig output was very long, I chose to run make olddefconfig and cross my fingers.  :Smile: 

That was hoping for too much: I get some weird problems booting into the new kernel. I will try to explain this in a linear way:

At first I thought everything was OK; I only got a warning when starting the "local" service.

Recompiled nvidia-drivers

rebooted computer

logged in, started X. Noticed that it took a longer time to initialize screen but worked.

Started to investigate the "local" service warning and found out that it was because I had it echo noop to the scheduler for my SSD. Apparently some major changes have been made to the I/O schedulers.

And from this point it started going downhill

When I changed the baselayout1 script to echo none instead and started X again, the screen went black.

rebooted computer

removed the echo of none to the scheduler

rebooted computer, reboot hangs for some weird reason

force reboot computer

try to start X again, black screen again.

reboot computer again.

X works, but feels sluggish

From this point I tried to find out what was connected to what.

I think that the scheduler setting has nothing to do with the black screen; apparently nvidia-drivers works only intermittently for me in 5.10.

Regarding the reboot issue, it seems the reboot process hangs if I log in as root and try to reboot.

If I log in as an ordinary user and try to reboot, it seems to work.

Something tells me that if I solve the weird X behaviour, it will also solve the reboot issue.

Does anyone have any idea why this happens? Or should I just start over with a fresh 5.10 config and try to replicate the settings from my old kernel?

Thanks in advance

Last edited by von_kossa on Thu Oct 06, 2022 6:24 pm; edited 2 times in total

----------

## von_kossa

listnewconfig:

Last edited by von_kossa on Sun Oct 02, 2022 10:24 am; edited 1 time in total

----------

## von_kossa

dmesg5.10.144:

Last edited by von_kossa on Sun Oct 02, 2022 10:25 am; edited 1 time in total

----------

## Goverp

von_kossa,

You won't get many stars for posting all that into a forum entry - the preferred way is to include a pointer to something like a pastebin file.

At a guess your X issues are to do with losing your framebuffer settings.  Check around the forums for entries about "My screen goes blank after GRUB" or words to that effect, or just read the wiki entry about framebuffers.

As to schedulers, more recent kernels use "schedutil" as the CPU frequency governor, and one of mq-deadline or bfq as the I/O scheduler (or none for SSDs and NVMe devices).
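For anyone hitting the same "local" service warning: the legacy single-queue I/O schedulers (noop, deadline, cfq) were removed in Linux 5.0, so a script that echoes noop fails on 5.x. Below is a minimal sketch, assuming a plain POSIX shell, that picks `none` when the kernel offers it and falls back to `noop` on old kernels. The helper names `pick_io_scheduler` and `set_io_scheduler` are made up for this example.

```shell
#!/bin/sh
# Sketch only: pick_io_scheduler and set_io_scheduler are hypothetical helpers.

# $1: contents of /sys/block/<dev>/queue/scheduler,
#     e.g. "[mq-deadline] kyber bfq none" (the active entry is in brackets)
pick_io_scheduler() {
    available=$(printf '%s\n' "$1" | tr -d '[]')
    # Prefer "none" (multi-queue kernels), then "noop" (legacy kernels).
    for wanted in none noop; do
        for sched in $available; do
            if [ "$sched" = "$wanted" ]; then
                printf '%s\n' "$wanted"
                return 0
            fi
        done
    done
    return 1
}

# $1: block device name such as sda (an example name); needs root to write.
set_io_scheduler() {
    file="/sys/block/$1/queue/scheduler"
    [ -r "$file" ] || return 1
    choice=$(pick_io_scheduler "$(cat "$file")") || return 1
    printf '%s\n' "$choice" > "$file"
}
```

Calling `set_io_scheduler sda` from the local.d script would then work on both 4.9 and 5.10 without editing the script between kernels.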

----------

## von_kossa

Sorry about that, will think of pastebin in the future.

 *Quote:*   

> At a guess your X issues are to do with losing your framebuffer settings. Check around the forums for entries about "My screen goes blank after GRUB" or words to that effect, or just read the wiki entry about framebuffers.  

 

I don't follow you here; I am booting into the console directly and start the X server manually afterwards. So my framebuffer definitely works.

I read the wiki entry and things didn't become clearer...

----------

## pietinger

 *von_kossa wrote:*   

> [...] I chose to do a make olddefconfig and cross my fingers.

 

Maybe this is the reason. I never recommend "make olddefconfig"; instead do a "make oldconfig" (and answer all questions).

If you want to start again, go into your /usr/src/linux-5.10.x and do

```
# make distclean

# cp /usr/src/linux-4.9.x/.config .

# make oldconfig

# then do the rest: make -jX, make modules_install, make install, and update GRUB if you use it
```

Test this new kernel.

----------

## von_kossa

I cleaned out all the old framebuffer settings and followed the recommended settings.

I can now reboot the computer without hangups it seems.

So next step is to recompile nvidia-drivers and test X.  :Smile: 

----------

## figueroa

When I upgraded from 4.9 to 5.10, I did it in small bites, going from 4.9 to 4.14, then 4.19, then 5.4, and finally 5.10 over the course of a couple of weeks. This way I'm more confident of bringing forward my settings from the former kernel, and I'm not overwhelmed with decisions when running make oldconfig for the new kernel.

The logic of my steps was to follow the chain of LTS kernels.

----------

## von_kossa

I did finally recompile the nvidia-drivers, and then I got this:

*   CONFIG_DRM_KMS_HELPER: is not set but needed for Xorg auto-detection

 * 	of drivers (no custom config), and for wayland / nvidia-drm.modeset=1.

 * 	Cannot be directly selected in the kernel's menuconfig, and may need

 * 	selection of a DRM device even if unused, e.g. CONFIG_DRM_AMDGPU=m or

 * 	DRM_I915=y, DRM_NOUVEAU=m also acceptable if a module and not built-in.

 * Please check to make sure these options are set correctly.

 * Failure to do so may cause unexpected problems.

After that, the reboot once again hung in 5.10.

Funny thing is that the above message contradicts the instructions in:

https://wiki.gentoo.org/wiki/Framebuffer

https://wiki.gentoo.org/wiki/NVIDIA/nvidia-drivers

----------

## NeddySeagoon

von_kossa,

I think you have two separate problems.

One is your console driver and separately, the nvidia-drivers binary blob.

For your console driver, choose

```
 <*>   VGA 16-color graphics support

 [*]   VESA VGA graphics support

 [*]   EFI-based Framebuffer Support

 <*>   Simple framebuffer support
```

in the kernel.

The kernel will choose the best one. It may even change if a better one starts later than the first one it chose. That's fine.

nvidia-drivers plays no part in this; it does not provide a text console.

However you start your display manager, disable it while we debug.

With the display manager disabled and those four kernel options, your system should boot to a text console and allow you to login.

Xorg, with or without nvidia-drivers, is a layer on top of the above, so get the bottom layer working first, then we can build on what we know works.

----------

## von_kossa

Hi again!

I went the Figueroa route:

 *Quote:*   

> When I upgraded from 4.9 to 5.10, I did it in small bites, going from 4.9 to 4.14, then 4.19, then 5.4, and finally 5.10 over the course of a couple of weeks. This way I'm more confident of bringing forward my settings from the former kernel, and I'm not overwhelmed with decisions when running make oldconfig for the new kernel. 

 

So 4.14 and 4.19 worked flawlessly.

Interesting to note is that when nvidia-drivers is compiled under 4.9, 4.14 or 4.19, lsmod looks like this:

Module                  Size  Used by

nvidia_drm             53248  3

nvidia_modeset       1150976  5 nvidia_drm

nvidia              34758656  177 nvidia_modeset

5.4 worked, just like 5.10, until nvidia-drivers was compiled. Then lsmod looks like this:

Module                  Size  Used by

nvidia              34758656  1

I use nvidia-drivers-470 in all versions. Looks like linux-5 doesn't like nvidia-drivers in my setup.  :Sad: 

Looks like something is preventing the nvidia modules from working as they should.

When I moved between Linux versions I ran make oldconfig, and I cannot understand what could possibly cause this issue between 4.19 and 5.4.
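The difference between the two lsmod listings can be checked mechanically. This is a rough sketch, assuming lsmod-style output on stdin; `missing_nvidia_modules` is a name invented for the example and is not part of nvidia-drivers.

```shell
#!/bin/sh
# Sketch only: missing_nvidia_modules is a hypothetical helper.
# Reads lsmod-style output on stdin and prints the NVIDIA modules that are
# absent. Returns 0 when all three are loaded, 1 otherwise.
missing_nvidia_modules() {
    loaded=$(awk '{ print $1 }')      # first column holds the module name
    missing=""
    for mod in nvidia nvidia_modeset nvidia_drm; do
        if ! printf '%s\n' "$loaded" | grep -qx "$mod"; then
            missing="${missing:+$missing }$mod"
        fi
    done
    if [ -z "$missing" ]; then
        return 0
    fi
    printf '%s\n' "$missing"
    return 1
}
```

Run as `lsmod | missing_nvidia_modules` right after boot; on the broken 5.4 boot shown above it would print `nvidia_modeset nvidia_drm`.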

----------

## von_kossa

 *NeddySeagoon wrote:*   

> von_kossa,
> 
> However you start your display manager, disable it while we debug.
> 
> 

 

Never had it enabled; I have always started X manually, and the console has always been working. Still... the problem with nvidia-drivers not working might be related to some framebuffer setting...

----------

## pietinger

von_kossa,

I am not an NVIDIA man, but maybe I can help. AFAIR NVIDIA needs DRM_KMS_HELPER in the kernel ... but ... you won't find it, even if you press "z" in "make menuconfig".

But you will find it if you search with / (in "make menuconfig").

Check if it is enabled:

```
DRM_KMS_HELPER [=y]
```

If yes: Good. Have you also enabled all the other kernel options listed here: https://wiki.gentoo.org/wiki/NVIDIA/nvidia-drivers

If not:

1. You will see this: Depends on: HAS_IOMEM [=y] && DRM [=y]

Have you enabled DRM? =>

```
Device Drivers --->

    Graphics support --->

        [*] Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)  --->
```

If yes and still DRM_KMS_HELPER=N then enable also CONFIG_DRM_SIMPLEDRM: =>

```
Device Drivers --->

    Graphics support --->

        [*] Simple framebuffer driver
```

because this does:

 *Quote:*   

> Selects: DRM_GEM_SHMEM_HELPER [=y] && DRM_KMS_HELPER [=y]

 

I have an Intel GPU and when I search for DRM_KMS_HELPER I get it enabled by:

 *Quote:*   

>  Selected by [y]:   
> 
> - DRM_FBDEV_EMULATION [=y] && HAS_IOMEM [=y] && DRM [=y] && FB [=y] 
> 
> - DRM_I915 [=y] && HAS_IOMEM [=y] && DRM [=y] && X86 [=y] && PCI [=y]
> ...

 

----------

## von_kossa

Gathering of thoughts:

1. The problem is that the nvidia modules don't load as they should. The problem is intermittent, so sometimes it works.

2. When the modules don't load correctly, lsmod shows only nvidia as loaded, not nvidia_drm and nvidia_modeset.

3. Trying to modprobe nvidia_drm or nvidia_modeset results in nothing happening, probably because the nvidia module is in a waiting state or something.

4. When not all modules are loaded, trying to start X results in a black screen, and the filesystem cannot be remounted read-only when rebooting, which makes the reboot hang.

5. My earlier statement that 4.19 was working was incorrect. The problem is present in 4.19, and maybe 4.14. All I know is that it is not present in 4.9.

Probable causes?

Totally unknown. Can somebody help me with how to troubleshoot modules? For some reason I cannot rmmod the nvidia module. It says that it is in use, even when forced. And the logs say nothing at all.

----------

## NeddySeagoon

von_kossa,

I misunderstood the problem.

Are there any hints in dmesg?

If you 

```
modprobe -r nvidia-drivers

modprobe nvidia-drivers
```

does that work?

Did you preserve the list order, i.e.

```
Module Size Used by

nvidia_drm 53248 3

nvidia_modeset 1150976 5 nvidia_drm

nvidia 34758656 177 nvidia_modeset 
```

In lsmod, modules are listed with the most recently loaded at the top.

When kernel modules are built a file is created 

```
/lib/modules/<ver>-gentoo/modules.order
```

that together with the modules.dep file, modprobe uses to load the required modules in the right order.

They are text files, do the nvidia related parts look correct?
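To make "look correct" a bit more concrete, here is a small sketch that prints what modules.dep records for one module. It assumes the usual `path/to/module.ko: dep1 dep2` line layout; `module_deps` is a hypothetical helper, and the path in the usage line is only an example.

```shell
#!/bin/sh
# Sketch only: module_deps is a hypothetical helper.
# $1: path to modules.dep
# $2: module file name without ".ko", for example nvidia-drm
module_deps() {
    awk -v target="$2" '
        {
            entry = $1
            sub(/:$/, "", entry)                  # strip the trailing colon
            n = split(entry, parts, "/")          # keep only the file name
            name = parts[n]
            sub(/\.ko(\.[a-z0-9]+)?$/, "", name)  # drop .ko and any compression suffix
            if (name == target) {
                $1 = ""                           # remove the module itself
                sub(/^ +/, "")                    # and the leftover separator
                print                             # what remains is the dependency list
            }
        }
    ' "$1"
}
```

For example `module_deps "/lib/modules/$(uname -r)/modules.dep" nvidia-drm` should list nvidia-modeset.ko and nvidia.ko if the dependency chain is intact. `modprobe --show-depends nvidia_drm` reports the load order modprobe itself would use.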

----------

## von_kossa

 *Quote:*   

> If you
> 
> Code:	
> 
> modprobe -r nvidia-drivers
> ...

 

I will test that tonight.  :Smile: 

 *Quote:*   

> Did you preserve the list order, i.e. 

 

Yes I did; that seems to be the right order, it's always listed like that, even under 4.9.

 *Quote:*   

> that together with the modules.dep file, modprobe uses to load the required modules in the right order.
> 
> They are text files, do the nvidia related parts look correct?

 

I guess so. Put differently, the same nvidia-drivers version (470) is used on all the kernels involved, so nothing was changed by me, but I will have a look anyway.

----------

## von_kossa

I found this:

https://wiki.archlinux.org/title/NVIDIA/Troubleshooting, part 6 modprobe error

There is a mention about:

"This problem is caused by bad commits pertaining to PCIe power management in the Linux Kernel (as documented in this NVIDIA DevTalk thread).

The workaround is to add pcie_port_pm=off to your kernel parameters. Note that this disables PCIe power management for all devices. "

I wonder if there could be some sort of relation.

----------

## NeddySeagoon

von_kossa,

Try it and tell us :)

At the grub menu, highlight the kernel you want to boot then press 'e'.

Follow the on screen instructions to add 

```
pcie_port_pm=off
```

to the kernel line.

This is a one time thing. Only the grub in memory is being changed.
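Since the edit only lasts for one boot, it is worth confirming after booting that the parameter really reached the kernel. A minimal sketch follows; `has_kernel_param` is a made-up helper name, and the optional second argument exists only so the check can be tried without rebooting.

```shell
#!/bin/sh
# Sketch only: has_kernel_param is a hypothetical helper.
# $1: parameter to look for, e.g. pcie_port_pm=off
# $2: optional command line text; defaults to the contents of /proc/cmdline
has_kernel_param() {
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}
```

Usage after the test boot: `has_kernel_param pcie_port_pm=off && echo "parameter is active"`.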

----------

## von_kossa

Checklist of possible reasons for myself:

1. pcie_port_pm=off

2. See results from 'modprobe -r nvidia-drivers', 'modprobe nvidia-drivers'

3. Check module loading order: /lib/modules/<ver>-gentoo/modules.order, modules.dep

4. IPMI (https://forums.developer.nvidia.com/t/black-screen-with-mac-version-of-gtx-680/66030?page=2)

5. check nvidia-drivers useflags, uvm, wayland (https://forums.gentoo.org/viewtopic-t-1087484-start-0.html)

6. framebuffer rabbithole

I remember the good old times around 2006, this problem is nothing compared to that time with Gentoo.   :Smile: 

I was actually reading the 100 page manual for gentoo while my wife was delivering our first child next to me.

----------

## NeddySeagoon

von_kossa,

If you think it's a kernel bug, try a newer kernel.

5.15.x is stable.

5.19.x still gets bugfixes

6.0 was out today - nvidia-drivers may not build with that yet. 

----------

## von_kossa

Well, my long-term goal is getting 5.10 running because it is currently the one supported the longest. So I can live in peace and harmony until the next time I need to change something.

But if nothing else works, I will eventually have to test newer kernels...

----------

## NeddySeagoon

von_kossa,

The idea of testing with a newer kernel was to see if the 5.10 kernel is implicated or not.

----------

## pietinger

 *von_kossa wrote:*   

> Well my longterm goal is getting 5.10 [...]

 

Next LTS kernel will be 6.1 (if you can wait).

----------

## von_kossa

Maybe so.

6.1 might be a solution.

But I think the problem will remain; it is present in 4.19, 5.4 and 5.10.

Speculating, I think some minor change in the kernel has triggered larger problems with nvidia-drivers... And with it being a binary blob from nvidia... well, things are not that easy to solve.

Googling around, many people have problems with nvidia and "black screens", but few understand that it is related to the module not being loaded properly at boot time.

I think this post should have a better title so more people can find it in the future, well, at least if we solve the problem.  :Smile: 

----------

## NeddySeagoon

von_kossa,

Edit your original post in the topic and change the title if you wish.

----------

## Ionen

Generally I suggest the following:

FB_EFI=y (or FB_VESA=y if not EFI)

FB_SIMPLE disabled (can be quirky, and likely won't work without SYSFB_SIMPLEFB, or the old option name (from <=5.4 or so) that used to be X86_SYSFB which came as unexpected for users going from 5.4 to 5.10 or around there, forgot when this got renamed)

SYSFB_SIMPLEFB disabled (this broke with >=5.18.13 + nvidia, making FB_SIMPLE kinda useless, FB_EFI will do the job but needs this to be disabled to work like the description says)

DRM_SIMPLEDRM disabled (nowadays this let the console display at first with nvidia, but everything will go to hell when X or wayland starts, usually black screen and can't do anything anymore)

DRM_KMS_HELPER enabled in some way, module is fine too (nouveau module works to enable it, or about anything like intel drm will work if you use that -- doesn't really matter which as long as it's not DRM_SIMPLEDRM, Edit: well in theory it /should/ be fine too if simpledrm is a module and not builtin)

The nvidia ebuild was recently updated to try to give a few more hints about these, not that it's fully exhaustive (lot of ways to break this).

Edit: This matches what stable gentoo-kernel-bin-5.15.x is using as configuration (meaning that kernel should typically "just work"), ~testing gentoo-kernel-bin-5.19.x unfortunately has DRM_SIMPLEDRM set (this uses fedora's kernel config, and they've been trying to push that and disabling all CONFIG_FB_* and this been causing a lot of nvidia issues over there, they do have some partial workarounds but well), but that will be reverted in >=gentoo-kernel-bin-5.19.13 which should be fine again with nvidia

Last edited by Ionen on Mon Oct 03, 2022 8:40 pm; edited 2 times in total
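The suggestions in this post can be compared against an existing `.config` with a few lines of shell. This is only a sketch of that comparison, not something the ebuild does; `config_state` and `audit_nvidia_config` are hypothetical helper names, and the rules simply mirror the list above.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers.

# Print y, m, or n for a symbol. $1: path to .config, $2: symbol without CONFIG_
config_state() {
    value=$(sed -n "s/^CONFIG_$2=//p" "$1")
    printf '%s\n' "${value:-n}"      # "is not set" and absent both count as n
}

# Return 0 when the .config matches the suggestions, 1 otherwise.
audit_nvidia_config() {
    cfg=$1
    status=0
    for sym in FB_SIMPLE SYSFB_SIMPLEFB X86_SYSFB; do
        if [ "$(config_state "$cfg" "$sym")" != n ]; then
            echo "$sym is set; the suggestion is to disable it"
            status=1
        fi
    done
    if [ "$(config_state "$cfg" DRM_SIMPLEDRM)" = y ]; then
        echo "DRM_SIMPLEDRM is built in; the suggestion is to disable it"
        status=1
    fi
    if [ "$(config_state "$cfg" DRM_KMS_HELPER)" = n ]; then
        echo "DRM_KMS_HELPER is not set; enable some DRM driver, e.g. DRM_NOUVEAU=m"
        status=1
    fi
    if [ "$(config_state "$cfg" FB_EFI)" = n ] && [ "$(config_state "$cfg" FB_VESA)" = n ]; then
        echo "neither FB_EFI nor FB_VESA is set"
        status=1
    fi
    return $status
}
```

Usage: `audit_nvidia_config /usr/src/linux/.config`. Symbols that do not exist in a given kernel version (for example SYSFB_SIMPLEFB on 5.10) simply report as n.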

----------

## von_kossa

Thank you Ionen. This covers point 6 in my list, the framebuffer rabbithole. And I think you spared me a lot of time here. Looks good.  :Smile: 

1. pcie_port_pm=off

2. See results from 'modprobe -r nvidia-drivers', 'modprobe nvidia-drivers'

3. Check module loading order: /lib/modules/<ver>-gentoo/modules.order, modules.dep

4. IPMI (https://forums.developer.nvidia.com/t/black-screen-with-mac-version-of-gtx-680/66030?page=2)

5. check nvidia-drivers useflags, uvm, wayland (https://forums.gentoo.org/viewtopic-t-1087484-start-0.html)

6. framebuffer rabbithole, see Ionen's post.

----------

## von_kossa

@Ionen

 *Quote:*   

> Edit: This matches what stable gentoo-kernel-bin-5.15.x is using as configuration (meaning that kernel should typically "just work"), ~testing gentoo-kernel-bin-5.19.x unfortunately has DRM_SIMPLEDRM set (this uses fedora's kernel config, and they've been trying to push that and disabling all CONFIG_FB_* and this been causing a lot of nvidia issues over there, they do have some partial workarounds but well), but that will be reverted in >=gentoo-kernel-bin-5.19.13 which should be fine again with nvidia

 

Do you know how they configured DRM_KMS_HELPER in gentoo-kernel-bin-5.15.x? Or how did they enable it so to speak...

----------

## Ionen

 *von_kossa wrote:*   

> Do you know how they configured DRM_KMS_HELPER in gentoo-kernel-bin-5.15.x? Or how did they enable it so to speak...

 This is a generic kernel, so it enables a whole bunch of DRM_* (intel, amd, nouveau, etc...) which will enable DRM_KMS_HELPER at same time. Aka same way that the ebuild and my earlier post suggest, enable something from there (like DRM_NOUVEAU=m, nouveau is not harmful if built as a module given nvidia-drivers ebuild prevents it from being loaded by default).

----------

## von_kossa

 *Ionen wrote:*   

>  *von_kossa wrote:*   Do you know how they configured DRM_KMS_HELPER in gentoo-kernel-bin-5.15.x? Or how did they enable it so to speak... This is a generic kernel, so it enables a whole bunch of DRM_* (intel, amd, nouveau, etc...) which will enable DRM_KMS_HELPER at same time. Aka same way that the ebuild and my earlier post suggest, enable something from there (like DRM_NOUVEAU=m, nouveau is not harmful if built as a module given nvidia-drivers ebuild prevents it from being loaded by default).

 

Stupid question about this: should I change to DRM_NOUVEAU=m in .config and then run make menuconfig, or what is the preferred approach? More exactly, how do I enable DRM_KMS_HELPER without getting a lot of other stuff along with it?

----------

## Ionen

Ideally use `make menuconfig` or something, the .config will auto-update based on dependencies and you may lose what you're setting.

In menuconfig you can search to jump to options, e.g. hit /, type DRM_NOUVEAU, then if it's a selectable option it should display a number (either to this option or a prerequisite), aka like the (1) here:

```
  │   Location:   

  │     Main menu

  │       -> Device Drivers

  │ (1)     -> Graphics support    
```

Hit 1, to jump to it. If that's not nouveau then may need to enable a pre-requisite option around there like CONFIG_DRM (hit 'm' to set as module), then search for DRM_NOUVEAU again, hit the number then 'm' to enable it as module.
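For those who prefer a non-interactive route, the kernel tree ships a `scripts/config` helper that edits `.config`, and `make olddefconfig` then resolves dependencies. The sketch below assumes that helper is present and executable in the source tree; `set_symbol_module` is a name invented here. Note that `scripts/config` does not check dependencies itself, so if a prerequisite such as CONFIG_DRM is off, the symbol is dropped again by `make olddefconfig`.

```shell
#!/bin/sh
# Sketch only: set_symbol_module is a hypothetical helper.
# $1: kernel source directory, e.g. /usr/src/linux
# $2: symbol without the CONFIG_ prefix, e.g. DRM_NOUVEAU
set_symbol_module() {
    ksrc=$1
    sym=$2
    if [ ! -x "$ksrc/scripts/config" ] || [ ! -f "$ksrc/.config" ]; then
        echo "no scripts/config or .config in $ksrc" >&2
        return 1
    fi
    (
        cd "$ksrc" || exit 1
        # Request the symbol as a module, then let kbuild settle dependencies.
        ./scripts/config --file .config --module "$sym" || exit 1
        make olddefconfig || exit 1
        # Print what survived dependency resolution (y, m, n or undef).
        ./scripts/config --file .config --state "$sym"
    )
}
```

Usage: `set_symbol_module /usr/src/linux DRM_NOUVEAU`. If the printed state is not `m`, a prerequisite is missing and menuconfig's search, as described above, shows which one.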

----------

## von_kossa

 *Ionen wrote:*   

> Ideally use `make menuconfig` or something, the .config will auto-update based on dependencies and you may lose what you're setting.
> 
> In menuconfig you can search to jump to options, e.g. hit /, type DRM_NOUVEAU, then if it's a selectable option it should display a number (either to this option or a prerequisite), aka like the (1) here:
> 
> ```
> ...

 

Thanks, can you configure CONFIG_DRM=y instead of a module without anything else breaking? I usually want to keep modules at a minimum.

----------

## Ionen

 *von_kossa wrote:*   

> Thanks, can you configure CONFIG_DRM=y instead of a module without anything else breaking? I usually want to keep modules at a minimum.

 Yeah that's fine.

----------

## von_kossa

 *Ionen wrote:*   

>  *von_kossa wrote:*   Thanks, can you configure CONFIG_DRM=y instead of a module without anything else breaking? I usually want to keep modules at a minimum. Yeah that's fine.

 

Final questions, hopefully. And thank you for your help, by the way.

When enabling CONFIG_DRM, these two are enabled by default:

CONFIG_DRM_FBDEV_EMULATION

CONFIG_DRM_FBDEV_OVERALLOC

should they be enabled?

And also:

Under Frame buffer Devices

CONFIG_FB_MODE_HELPERS

CONFIG_FB_TILEBLITTING

are also enabled by default, should they be enabled?

----------

## pietinger

 *von_kossa wrote:*   

> When enabling CONFIG_DRM,  these two are enabled by default:
> 
> CONFIG_DRM_FBDEV_EMULATION
> 
> CONFIG_DRM_FBDEV_OVERALLOC
> ...

 

Yes.

 *von_kossa wrote:*   

> And also:
> 
> Under Frame buffer Devices
> 
> CONFIG_FB_MODE_HELPERS
> ...

 

It is not necessary, but it does not hurt. Working with <Help> inside "make menuconfig" gives you:

 *Quote:*   

> CONFIG_FB_MODE_HELPERS:
> 
> A few drivers rely on this feature such as the radeonfb, rivafb, and the i810fb. If
> ...

 

----------

## Ionen

I'd advise not trying too hard to be minimal, at least until you get this working properly; removing superfluous things is easier when you have a working baseline to compare with.

----------

## von_kossa

Hi!

OK, just arrived home and looked at how my settings are configured now, and that raises some questions (again).

This is what I have enabled:

```

CONFIG_DRM=y

# CONFIG_DRM_DP_AUX_CHARDEV is not set

CONFIG_DRM_KMS_HELPER=y

CONFIG_DRM_KMS_FB_HELPER=y

CONFIG_DRM_FBDEV_EMULATION=y

# Frame buffer Devices

#

CONFIG_FB=y

CONFIG_FB_CMDLINE=y

CONFIG_FB_NOTIFY=y

CONFIG_FB_CFB_FILLRECT=y

CONFIG_FB_CFB_COPYAREA=y

CONFIG_FB_CFB_IMAGEBLIT=y

CONFIG_FB_SYS_FILLRECT=y

CONFIG_FB_SYS_COPYAREA=y

CONFIG_FB_SYS_IMAGEBLIT=y

CONFIG_FB_SYS_FOPS=y

CONFIG_FB_DEFERRED_IO=y

CONFIG_FB_MODE_HELPERS=y

CONFIG_FB_TILEBLITTING=y

```

From this we can gather:

1. Apparently you do not need to enable DRM_NOUVEAU or anything else; it is enough to enable CONFIG_DRM, and CONFIG_DRM_KMS_HELPER happily arrives in .config.

2. Why do I have a working console (yes, in low resolution)? I have no drivers selected, or does CONFIG_DRM_FBDEV_EMULATION count as a driver?

3. Why would enabling FB_VESA solve my problem? My current solution, although confusing, is more minimal.

----------

## pietinger

Look into another part of your .config. Do you have here SYSFB_SIMPLEFB enabled (like I have) ?

```
#

# ARM System Control and Management Interface Protocol

#

# end of ARM System Control and Management Interface Protocol

# CONFIG_EDD is not set

# CONFIG_FIRMWARE_MEMMAP is not set

# CONFIG_DMIID is not set

# CONFIG_DMI_SYSFS is not set

CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y

# CONFIG_FW_CFG_SYSFS is not set

CONFIG_SYSFB=y

CONFIG_SYSFB_SIMPLEFB=y

# CONFIG_GOOGLE_FIRMWARE is not set
```

If yes, please see post from @Ionen above.

----------

## von_kossa

 *pietinger wrote:*   

> Look into another part of your .config. Do you have here SYSFB_SIMPLEFB enabled (like I have) ?
> 
> ```
> #
> 
> ...

 

No, I don't; I checked every entry @Ionen gave.

EDIT-----

I don't even have CONFIG_SYSFB in my .config.

----------

## pietinger

 *von_kossa wrote:*   

> I don't even have CONFIG_SYSFB in my .config.

 

Okay - this means you are on 5.10 (and not 5.15). What happens if you try now to install NVIDIA ?

----------

## von_kossa

 *pietinger wrote:*   

>  *von_kossa wrote:*   I don't even have CONFIG_SYSFB in my .config. 
> 
> Okay - this means you are on 5.10 (and not 5.15). What happens if you try now to install NVIDIA ?

 

That is what I have been testing all the time; I have never been on 5.15.

If I install nvidia-drivers, they compile as they should,

but intermittently only the nvidia module loads and not nvidia_drm or nvidia_modeset, resulting in some kind of misbehaving module.

If I try startx: black screen.

If I reboot the computer, it is unable to remount the filesystem read-only and hangs.

----------

## Ionen

Generally I'd advise to disable SYSFB_SIMPLEFB, the old X86_SYSFB, and FB_SIMPLE because this all broke in >=5.18.13 with nvidia and there's little reason to use that with old kernels either anyway.

FB_EFI=y should work without any of those, SYSFB_SIMPLEFB conflicts with it.

Are you missing FB_EFI? If not an EFI system, then FB_VESA would be what you want (having both is harmless, albeit they both need SYSFB_SIMPLEFB disabled to work properly unlike FB_SIMPLE which instead requires it).

Edit: wrt 5.15.x, may want to try stable gentoo-kernel-bin to rule out your configuration being the issue -- prebuilt so it's quick to try
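Whether FB_EFI or FB_VESA is the relevant one depends on how the machine booted. A quick way to tell is the `/sys/firmware/efi` directory, which is present only when the running kernel was booted via EFI (and has EFI support built in). A small sketch, with `suggest_fb_option` as a made-up helper name:

```shell
#!/bin/sh
# Sketch only: suggest_fb_option is a hypothetical helper.
# $1: optional replacement for /sys, so the check can be tried on a test directory
suggest_fb_option() {
    sysfs=${1:-/sys}
    if [ -d "$sysfs/firmware/efi" ]; then
        echo FB_EFI
    else
        echo FB_VESA
    fi
}
```

Running `suggest_fb_option` on the affected machine prints which of the two options to look for in menuconfig.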

----------

## von_kossa

```
1. pcie_port_pm=off
```

no difference

```
2. See results from 'modprobe -r nvidia-drivers', 'modprobe nvidia-drivers'
```

no module present nvidia-drivers

if modprobe -r nvidia

FAILURE: module is in use

lsof tells me that the process systemd-udevd is trying to load nvidia-module

```
3. Check module loading order: /lib/modules/<ver>-gentoo/modules.order, modules.dep
```

4.9.321

modules order: empty

modules dep:

video/nvidia-modeset.ko: video/nvidia.ko

video/nvidia-peermem.ko:

video/nvidia-drm.ko: video/nvidia-modeset.ko video/nvidia.ko

video/nvidia-uvm.ko: video/nvidia.ko

video/nvidia.ko:

5.4.214

modules order: empty

modules dep:

video/nvidia-modeset.ko: video/nvidia.ko

video/nvidia-peermem.ko:

video/nvidia-drm.ko: video/nvidia-modeset.ko video/nvidia.ko

video/nvidia-uvm.ko: video/nvidia.ko

video/nvidia.ko:

```
4. IPMI (https://forums.developer.nvidia.com/t/black-screen-with-mac-version-of-gtx-680/66030?page=2)
```

no difference with IPMI compiled

```
5. check nvidia-drivers useflags, uvm, wayland (https://forums.gentoo.org/viewtopic-t-1087484-start-0.html)
```

uvm seems to have no relevance

```
6. framebuffer rabbithole, se Ionen post.
```

haven't tested this yet.

----

moved the nvidia modules from /lib/modules/5.4.214-gentoo/video

rebooted system

moved the modules back

tested modprobe manually, works every time (tested around 20 times)

So the problem seems to be related to the loading of the nvidia modules during the boot process.

And "systemd-udevd" process is apparently involved, and is impossible to kill.

----------

## von_kossa

Found this:

https://bugs.gentoo.org/670340#c8

https://bugs.gentoo.org/667362

Yes, apparently i can load the modules manually and make it work. But really....   :Evil or Very Mad: 

EDIT:

And those bugs ended with:

"The likely removal removal of nvidia-udev.sh will hopefully solve those."

@Ionen, you should remember this, it was your comment above. This problem arose for me when moving beyond the 4.9 kernel, and the removal of the nvidia-udev.sh script apparently didn't solve it for me.

----------

## Ionen

Well, udev issues is still something I have no idea about. Could never reproduce and nvidia-drivers don't use udev for anything at the moment (which made me not consider this nvidia-drivers' bug anymore, makes it look like udev is just acting up on its own).

Also doesn't seem widespread, rather rare I hear about it. Maybe some weird hardware specific conflict (sorry, I don't know really, wish could help with that one).

----------

## von_kossa

Yeah, I will have to look into my kernel settings for udev. I bet something's wrong there.

Anyway, worst case scenario i can always load the modules after boot but it feels like cheating.

----------

## OldTango

 *von_kossa wrote:*   

> Maybe so.
> 
> 6.1 might be a solution.
> 
> But I think the problem will remain; it is present in 4.19, 5.4 and 5.10.
> ...

 Not sure I can add any additional help here but FYI: 

I am currently running a system with

```
Linux SuperTux 5.4.66-gentoo #6 SMP PREEMPT Tue Feb 16 13:30:56 MST 2021 x86_64 AMD Ryzen 9 5950X 16-Core Processor AuthenticAMD GNU/Linux
```

and a system with

```
Linux MasterTux 4.19.86-gentoo #4 SMP Tue Jan 19 13:50:02 MST 2021 x86_64 AMD Ryzen 7 2700X Eight-Core Processor AuthenticAMD GNU/Linux
```

Both are using

```
x11-drivers/nvidia-drivers-515.65.01:0/515::gentoo  USE="X driver static-libs tools -dist-kernel -kernel-open -persistenced -wayland" ABI_X86="32 (64)" 0 KiB
```

No work-arounds are necessary

Xorg states that an xorg.conf file is no longer necessary and things should just work, but in the past I have had problems with nvidia-drivers in xorg and they were solved by creating a simple xorg.conf file in /etc/X11. Here's mine:

```
Section "Monitor"

        Identifier "Monitor0"

        #VendorName ""

        #ModelName ""

EndSection

Section "Device"

        Identifier "Device0"

        Driver "nvidia"

        #VendorName "NVIDIA Corporation"

        BoardName "GeForce GTX 1650ti"

EndSection

Section "Screen"

        Identifier "Screen0"

        Device "Device0"

        Monitor "Monitor0"

        DefaultDepth 24

EndSection
```

I am not sure this file is even necessary anymore, but it doesn't cause any issues either.

I only upgrade my kernels about once a year, unless I have some very compelling reason to do it sooner. While I'm ready now, I am waiting for 5.15.71 LTS to go stable before I build and test one.

Best Tango.....  :Smile: 

----------

## von_kossa

 *Ionen wrote:*   

> Well, udev issues is still something I have no idea about. Could never reproduce and nvidia-drivers don't use udev for anything at the moment (which made me not consider this nvidia-drivers' bug anymore, makes it look like udev is just acting up on its own).
> 
> Also doesn't seem widespread, rather rare I hear about it. Maybe some weird hardware specific conflict (sorry, I don't know really, wish could help with that one).

 

The statement "nvidia-drivers don't use udev" makes me wonder what actually pulls in the modules. I guess it is module autoloading from /lib/modules/*kernel*/....?

----------

## von_kossa

I think these are my last alternatives:

1

check some kernel parameters like:

CONFIG_UEVENT

CONFIG_AMD_MEM_ENCRYPT

2

start from scratch with default config 5.10

3

WORKAROUND

1. Added in /etc/modprobe.d/blacklist.conf the following lines:

blacklist nvidia

blacklist nvidia_drm

blacklist nvidia_modeset

2. Created file /etc/local.d/nvidia-udev-workaround.start

Added the following lines in it:

#!/bin/sh

echo "NVIDIA WORKAROUND IN PROGRESS";

modprobe nvidia_drm;

3. Made that script executable by:

chmod +x /etc/local.d/nvidia-udev-workaround.start

4. Made sure that local service appears in default runlevel:

rc-update show default
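Steps 1 to 3 above can also be applied with a small script. This is a sketch under the same assumptions as the workaround (OpenRC with the "local" service); `install_nvidia_workaround` is a hypothetical helper, and the root argument is there so the result can be previewed in a scratch directory before touching the live system.

```shell
#!/bin/sh
# Sketch only: install_nvidia_workaround is a hypothetical helper.
# $1: root directory; "/" for the live system (as root) or a scratch directory
install_nvidia_workaround() {
    root=${1%/}
    conf="$root/etc/modprobe.d/blacklist.conf"
    script="$root/etc/local.d/nvidia-udev-workaround.start"
    mkdir -p "$root/etc/modprobe.d" "$root/etc/local.d" || return 1

    # Step 1: stop automatic loading during boot (lines already present are kept once).
    touch "$conf" || return 1
    for mod in nvidia nvidia_drm nvidia_modeset; do
        grep -qx "blacklist $mod" "$conf" || echo "blacklist $mod" >> "$conf"
    done

    # Step 2: load the modules late, from the OpenRC "local" service.
    cat > "$script" <<'EOF'
#!/bin/sh
echo "NVIDIA WORKAROUND IN PROGRESS"
modprobe nvidia_drm
EOF

    # Step 3: the script must be executable to be run by the local service.
    chmod +x "$script"
}
```

Usage: `install_nvidia_workaround /tmp/preview` to inspect the files, then `install_nvidia_workaround /` as root. Step 4 stays manual: check `rc-update show default`, and if local is not listed, `rc-update add local default` adds it.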

----------

## von_kossa

Final assessment.

Implemented the above workaround, which for me simply means that the modules load at a later stage.

The reasons for the modules not loading during boot are unknown and can only be speculated about.

I suspect OpenRC, systemd-utils (udev), a kernel bug, or some combination of these in conjunction with my old Sandy Bridge based hardware with a GTX 660.

If someone finds this thread and actually knows what's causing this, please share the information.

----------

