# Failed to initialize the NVIDIA kernel module

## Indi008

I am getting a 'failed to initialize the NVIDIA kernel module' error when running startx. Log file output: http://dpaste.com/2EYHFK1

It was all working fine until yesterday when I updated with

```
sudo emerge --update --deep --newuse @world
```

When I ran that I got an error, something about my kernel not being configured and xorg missing a config file. I've never had this error before but I am pretty new to gentoo.

I had possibly changed some kernel settings earlier and not rebuilt the kernel (I am not sure), so I figured I'd just rebuild the kernel. I did that and it seemed to work fine, with no problems running anything afterwards. So then I continued the update, which seemed to work.

I did forget that after updating the kernel the modules need to be rebuilt with

```
emerge @module-rebuild
```

 so I didn't do this step until after everything went wrong. Maybe this is what has caused my problems?

The update was either finished or almost finished when I opened a file with VLC media player (not sure if this was related, but it happened right when I opened the file) and the whole system froze. Nothing was responding. Eventually I turned everything off with the power button. The system rebooted fine, but now startx fails with the above error.

At this point I rebuilt the modules and also tried re-emerging the nvidia drivers package but no luck.

I tried 

```
 lsmod | grep nvidia 
```

 and got the following output

```
nvidia_drm             49152  0
nvidia_modeset       1081344  1 nvidia_drm
nvidia              19976192  1 nvidia_modeset
drm_kms_helper        208896  1 nvidia_drm
drm                   540672  3 drm_kms_helper,nvidia_drm
i2c_core               86016  5 videodev,drm_kms_helper,nvidia,i2c_piix4,drm
```

I can remove the nvidia_drm, nvidia_modeset, and nvidia modules, but it still gives the same error. If I remove and then reload the nvidia module and run lsmod again, I get:

```
nvidia              19976192  0
i2c_core               86016  5 videodev,drm_kms_helper,nvidia,i2c_piix4,drm
```

but still the same error with startx.

My xorg.conf file is here: http://dpaste.com/290H7GG

But it hasn't changed from what it was when it was working. Same with my .xinitrc file.

I have triple checked my kernel config and it matches up with the NVIDIA drivers page: https://wiki.gentoo.org/wiki/NVIDIA/nvidia-drivers

If I run 

```
glxinfo | grep direct
```

 then I get 

```
Error: unable to open display
```

What else should I try? Any ideas on what might be wrong? Should I try rebuilding my kernel again?

----------

## NeddySeagoon

Indi008,

Put your /var/log/Xorg.0.log onto a pastebin site please.

In your xorg.conf the Section "InputDevice" (both of them) will be ignored.

They have been for a very long time now.

Xorg only uses Driver "mouse" or Driver "kbd" if you force it to. You don't.

That's just an interesting aside; it's not your problem, so leave it for now.

----------

## Banana

Hello Indi008.

What are your kernel and nvidia driver versions?

Which GPU card do you have?

Basically you need the correct kernel config. Build and load it. After that you install the correct drivers and its dependencies, reboot, and everything should work.

EDIT: was too slow. NeddySeagoon was faster.

----------

## Indi008

Here is /var/log/Xorg.0.log: http://dpaste.com/106MZBC

nvidia driver package I have is x11-drivers/nvidia-drivers-440.64

my kernel is linux-5.4.28-gentoo

my GPU is NVIDIA GeForce GTX 1070

 *Quote:*   

> Basically you need the correct kernel config. Build and load it. After that you install the correct drivers and its dependencies, reboot, and everything should work.

 

Since my kernel rebuild seemed to work initially (until the update), does that mean the kernel config is likely fine, or could it still be something I have set wrong in the kernel? The kernel has a lot of stuff in it, though, and I am not sure exactly what I changed between the first build and the rebuild. Apart from the stuff in the NVIDIA guide, I am not sure what to check. I think I saved an old kernel config at some point, but I have also changed things I want to keep since then, so I might leave that option for now. Maybe I will try rebuilding with the current kernel config, though, and just step through it all one more time.

----------

## Indi008

I messed up.

I ran 

```
grub-mkconfig -o /boot/grub/grub.cfg
```

but I don't think I have GRUB. I think I was just using UEFI. Now I can't boot. It says unable to mount root fs on unknown.

How do I fix this?

----------

## Indi008

Update, managed to boot an old version of the kernel. Still seem to have the nvidia error so it probably wasn't the kernel config causing the issue? (assuming this is using different config, this one says 4.19.97-gentoo). I think I will go to bed and then rebuild a new kernel tomorrow.

----------

## fedeliallalinea

```
[   707.208] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[   707.208] (EE) NVIDIA:     system's kernel log for additional error messages and
[   707.208] (EE) NVIDIA:     consult the NVIDIA README for details.
```

What do you see in dmesg? I saw a similar error and disabling CONFIG_DRM was the solution; see this howto section

----------

## NeddySeagoon

Indi008,

That Xorg.0.log came from the 4.19.97 kernel.

```
[  2228.103] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module.
```

Isn't very useful on its own.

As the rest of the message says, look in dmesg for more detail.

This usually happens when there is something in the kernel that the nvidia kernel module needs to be off, or a kernel module that needs to be unloaded.
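As a rough sketch of "look in dmesg", the kernel log can be filtered down to the lines the NVIDIA driver wrote. The function name `nvidia_kernel_messages` is made up for illustration, and the match pattern is an assumption based on the `NVRM:` and `nvidia-` prefixes seen in the logs in this thread.

```shell
#!/bin/sh
# Keep only kernel log lines that mention the NVIDIA driver.
# Reads log text on stdin, so it works with live dmesg output
# or with a saved copy of the log.
nvidia_kernel_messages() {
    grep -iE 'nvrm|nvidia'
}

# Typical use on the affected machine (may need root):
#   dmesg | nvidia_kernel_messages
```

Anything printed right after a failed startx is the part worth pasting.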

 *Quote:*   

> When I ran that I got an error, something about my kernel not being configured ...

 

What happens is that you get a new kernel, then run --depclean.

--depclean removes all the old kernel source files in the old kernel tree, including the Makefile.

The .config is still there.

The nvidia kernel module follows the /usr/src/linux symlink to find a kernel to build against.

I suspect that you did not run 

```
eselect kernel
```

, so 

```
emerge @module-rebuild
```

 rebuilt your out of tree kernel modules against your old kernel.

Which kernel does 

```
eselect kernel list
```

show is active?
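A minimal sketch of that check, comparing the running kernel with the tree the /usr/src/linux symlink points at. The helper names `src_release` and `check_kernel_match` are hypothetical, and the comparison assumes the source directory is named `linux-<release>` with no extra local version suffix in the running kernel's release string.

```shell
#!/bin/sh
# Strip the directory part and the "linux-" prefix from a source tree path,
# e.g. /usr/src/linux-5.4.28-gentoo -> 5.4.28-gentoo
src_release() {
    base=${1##*/}
    printf '%s\n' "${base#linux-}"
}

# Print a verdict for a running release ($1) and a source tree path ($2).
check_kernel_match() {
    if [ "$1" = "$(src_release "$2")" ]; then
        echo "match"
    else
        echo "mismatch: running $1, sources $(src_release "$2")"
    fi
}

# On the affected machine (assumes readlink -f is available):
#   check_kernel_match "$(uname -r)" "$(readlink -f /usr/src/linux)"
```

A mismatch here would mean out-of-tree modules such as nvidia get built for a kernel other than the one that is running.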

----------

## icaruslnx

This is very similar to the issue I'm dealing with. I ran a sync and update, nvidia-drivers was updated from 440.64 to 440.82 and when trying to load the new module the display would freeze and I needed to ssh in to reboot.

The weirdest part (and only clear hint) is what dmesg says (I'm posting from my phone so please excuse typos)

NVRM: API mismatch: the client has the version 440.82, but this kernel module has the version for 440.64. Please make sure that this kernel module and all Nvidia driver components have the same version

What's the difference between the client and kernel version? Why is the kernel still holding onto the old version?

I can verify my kernel is configured properly, uname output matches eselect kernel list.

5.4.14-gentoo

Seeing that Indi008 is on a similar kernel I thought this could be relevant

Edit: also 440.64 is no longer available in the tree

----------

## NeddySeagoon

icaruslnx,

The nvidia-drivers is in two parts.

The kernel part, which is loaded at boot and the Xorg part, which is loaded when Xorg starts.

If you restart Xorg after an nvidia-drivers update you still have the old kernel part but the new Xorg part.

nvidia-drivers checks versions and won't start unless the versions are identical.

You can achieve the same effect by not building nvidia-drivers against the kernel you thought you were, or not running the kernel you thought you were.

Check 

```
uname -a
```

Is that the right kernel version and build date?

It's your running kernel.

Check 

```
eselect kernel list
```

The active kernel is the kernel all out of tree kernel modules will build against.

If it all looks correct, reboot to load the new nvidia kernel module.
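The two-part version check can be approximated by hand. This is only a sketch: `compare_nvidia_versions` is a made-up helper, and it assumes the loaded module exposes its version under /sys/module/nvidia/version and that `modinfo -F version nvidia` reports the module installed for the running kernel.

```shell
#!/bin/sh
# Compare the version of the loaded nvidia module ($1) with the version
# of the module installed on disk ($2) and say what that implies.
compare_nvidia_versions() {
    loaded=$1
    installed=$2
    if [ -z "$loaded" ]; then
        echo "nvidia module not loaded"
    elif [ "$loaded" = "$installed" ]; then
        echo "versions match: $loaded"
    else
        echo "stale module loaded: $loaded (installed: $installed)"
    fi
}

# On the affected machine:
#   loaded=$(cat /sys/module/nvidia/version 2>/dev/null)
#   installed=$(modinfo -F version nvidia 2>/dev/null)
#   compare_nvidia_versions "$loaded" "$installed"
```

A "stale module" result corresponds to the API mismatch message: the old kernel part is still in memory while the newer userspace part is on disk.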

----------

## Jaglover

 *Quote:*   

> ... reboot to load the new nvidia kernel module

 

Sic transit gloria mundi. First coronavirus, then worldwide economic downturn and now we suggest on Gentoo forums to reboot to reload some trivial kernel modules.   :Shocked: 

----------

## icaruslnx

LOL @Jaglover   :Laughing: 

Back on track, if I rmmod nvidia and 'systemctl restart gdm' it loads the proper module and works. Rebooting it comes up with the API mismatch, rmmod and restart gdm it loads with a sanity check and works as expected.

```
[   10.544809] [drm:nv_drm_init [nvidia_drm]] *ERROR* [nvidia-drm] Version mismatch: nvidia-modeset.ko(440.64) nvidia-drm.ko(440.82)
[   10.615121] NVRM: API mismatch: the client has the version 440.82, but
               NVRM: this kernel module has the version 440.64.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.
...
[   33.554278] NVRM: API mismatch: the client has the version 440.82, but
               NVRM: this kernel module has the version 440.64.  Please
               NVRM: make sure that this kernel module and all NVIDIA driver
               NVRM: components have the same version.
[   94.966299] nvidia-modeset: Unloading
[   96.860204] nvidia-nvlink: Unregistered the Nvlink Core, major device number 252
[  121.489887] nvidia-nvlink: Nvlink Core is being initialized, major device number 252
[  121.490372] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[  121.536599] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.82  Wed Apr  1 20:04:33 UTC 2020
[  121.743376] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000dffff window]
[  121.743597] caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
[  122.865344] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  440.82  Wed Apr  1 19:41:29 UTC 2020
```

Before I tested this I did update Gnome (had to since they seem to update Gnome once every 5 years  :Razz:  )

So the driver is fine; it's failing in the boot process by pulling in the wrong module for some reason. Something I'll figure out eventually when I have time, but for now I have a workaround, as weird as it is. Working from home I seem to have even less free time than before the world melted down.

Does any of this help Indi008 at all?

```
lsmod | grep nvidia
nvidia_modeset       1073152  11
nvidia              19947520  545 nvidia_modeset
i2c_core               53248  4 drm_kms_helper,nvidia,i2c_piix4,drm
```

----------

## NeddySeagoon

Jaglover,

:) :)

I want to know what kernel is in use too. I still suspect it's not the kernel that the OP thinks it is.

icaruslnx,

When it works, what does

```
uname -a
```

 show?

----------

## icaruslnx

Kernel is the same as I've been using all month without issues

```
icaruslnx@Daedalus ~ $ uname -a
Linux Daedalus 5.4.14-gentoo #6 SMP PREEMPT Wed Apr 1 09:58:30 CDT 2020 x86_64 AMD FX(tm)-6300 Six-Core Processor AuthenticAMD GNU/Linux
icaruslnx@Daedalus ~ $ sudo eselect kernel list
Available kernel symlink targets:
  [1]   linux-4.19.97-gentoo
  [2]   linux-5.2.17-gentoo
  [3]   linux-5.3.18-gentoo
  [4]   linux-5.4.14-gentoo *
  [5]   linux-5.4.28-gentoo
  [6]   linux-5.5.2-gentoo
```

I do need to clean up my old kernels but I don't have a problem with a random kernel loading at boot

----------

## Indi008

Firstly thanks all for the help so far, I know you don't have to spend time helping random strangers so all help is very much appreciated.

It turns out I did originally set grub up, it was just late and I was panicking because it's the first time I've had a system that wouldn't boot.

Unfortunately it still won't boot. Although I can get on to an old kernel so that's something.

NeddySeagoon you are right, I did not run eselect so that is probably the initial cause of the problem. Given what version the log file was it is probably a good assumption that I was not on the kernel I thought I was.

What happened was I found this page: https://wiki.gentoo.org/wiki/Kernel/Rebuild before I found this page: https://wiki.gentoo.org/wiki/Kernel/Upgrade. I think the former is meant to be more a quick reference check but I followed that one when I should have followed the more detailed one.

I have now booted onto the old kernel (which is probably what I was on initially without realizing) and I have followed the instructions on the kernel upgrade page including updating the bootloader. Then I reboot but I am getting a kernel panic error when trying to boot the new kernel.

I am not sure how to copy that error over to share it? But the last line says:

```
---[ end kernel panic  - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
```

I can still access the old kernel though (just not run startx). I will look into how to share the error.

Here is dmesg: http://dpaste.com/27K9HRA

It does seem that the NVIDIA module is the new version which explains why I can't run startx. Not sure why new kernel won't boot though. Where do I look to find more detailed boot error? Is there a way to get a paste of the boot error?

----------

## Indi008

Here is some more info

fstab: http://dpaste.com/0RCD42P

lspci -k: http://dpaste.com/2QQXD88

grub.cfg: http://dpaste.com/0RVMPK5

blkid: http://dpaste.com/08KG9N2

So I see in the grub.cfg that when it sets the root location for the old kernel version it uses the UUID, but for the new version it uses /dev/nvme0n1p4. That should be fine, but the fstab says that UUID is more reliable, so I might try changing the fstab to use the UUID.

I also see that the old kernels have an initrd line, so maybe I need to generate that for the new one somehow. I am not sure how to do that, but I think I should be able to find out with a little bit of searching.

I will give these a go and then report back.
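The "old kernels have an initrd line, the new one does not" observation can be checked mechanically. A rough sketch: `grub_initrd_report` is a hypothetical helper, and it assumes a grub.cfg where each `menuentry` block holds a `linux` line, optionally followed by an `initrd` line, as in the entries described above.

```shell
#!/bin/sh
# For every "linux" line in a grub.cfg-style file ($1), print the kernel
# image and whether the same menuentry also has an "initrd" line.
grub_initrd_report() {
    awk '
        # Emit the pending kernel, if any, then reset the state.
        function flush() {
            if (kernel != "") print kernel, (has_initrd ? "initrd" : "NO-initrd")
            kernel = ""; has_initrd = 0
        }
        $1 == "menuentry" { flush() }
        $1 == "linux"     { flush(); kernel = $2 }
        $1 == "initrd"    { has_initrd = 1 }
        END               { flush() }
    ' "$1"
}

# On the affected machine:
#   grub_initrd_report /boot/grub/grub.cfg
```

An entry reported as `NO-initrd` is not necessarily broken; it only matters when the kernel needs an initramfs to find and mount the root filesystem.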

----------

## Indi008

Yay, boots fine and startx works again  :Smile: . Turns out I just needed to regenerate the initramfs to fix boot problem. And make sure to do eselect to fix startx problem. Thanks all very much for the help  :Smile: .

Next upgrade should go much better and I am slowly learning where to find info about errors which is very useful.
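For next time, the steps that came up in this thread can be written down as a dry-run checklist. This is a sketch, not the exact commands that were run: `kernel_upgrade_plan` is a made-up function that only prints commands, the `make` line follows the usual manual build, and `dracut` is an assumption, since the thread does not say which tool regenerated the initramfs (genkernel is another option).

```shell
#!/bin/sh
# Print (do not run) the upgrade steps for a given eselect kernel target.
# $1: source directory name, e.g. linux-5.4.28-gentoo
kernel_upgrade_plan() {
    target=$1
    release=${target#linux-}   # assumes no extra local version suffix
    cat <<EOF
eselect kernel set $target
cd /usr/src/linux && make && make modules_install && make install
emerge @module-rebuild
dracut --force --kver $release
grub-mkconfig -o /boot/grub/grub.cfg
EOF
}

kernel_upgrade_plan linux-5.4.28-gentoo
```

The order matters: the symlink is set before anything is built, and the bootloader config is regenerated last so it picks up the new initramfs.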

----------

