# [SOLVED - sort of] nvidia module causing problems with udev?

## Napalm Llama

Hello! I'm returning to Gentoo after a long absence.  New hardware, fresh install.

I'm getting a weird, intermittent problem, which appears to be related to the nvidia module somehow.  It has three main symptoms:

X doesn't start

When restarting the system, "Stopping udev" takes an unusually long time

The system hangs at "remounting filesystems readonly" and never actually reboots

Either all three of these things happen, or none of them happen and the system functions as expected.  Sometimes I get an additional error at shutdown that the root filesystem can't be remounted readonly because it's in use.

There's nothing different in /var/log/Xorg.0.log when the problem occurs. The reason I think this is related to nvidia is that blacklisting the nvidia module from the kernel cmdline in grub seems to fix the problem (except still no X, of course).  I think there must be some kind of race condition at boot, because it seems arbitrary whether the problem happens or not.  I'm hoping someone can recognise the symptoms and point me in a useful direction...

----------

## alamahant

Maybe post the output of

```
emerge --info
```

Also, there may be kernel-space problems.

----------

## Napalm Llama

I do apologise, like I said it's been a while!

```
Portage 3.0.20 (python 3.9.5-final-0, default/linux/amd64/17.1/desktop/plasma, gcc-10.3.0, glibc-2.33, 5.11.1-gentoo-splig-1 x86_64)
=================================================================
System uname: Linux-5.11.1-gentoo-splig-1-x86_64-AMD_Ryzen_5_5600X_6-Core_Processor-with-glibc2.33
KiB Mem:    16348428 total,  12364364 free
KiB Swap:   33554428 total,  33554428 free
Timestamp of repository gentoo: Sun, 20 Jun 2021 13:30:01 +0000
Head commit of repository gentoo: de45c854109a4e020052dd9bc89990462e11f4e6
sh bash 5.1_p8
ld GNU ld (Gentoo 2.35.2 p1) 2.35.2
app-shells/bash:          5.1_p8::gentoo
dev-lang/perl:            5.32.1::gentoo
dev-lang/python:          2.7.18_p10::gentoo, 3.9.5_p2::gentoo
dev-lang/rust:            1.52.1::gentoo
dev-util/cmake:           3.18.5::gentoo
sys-apps/baselayout:      2.7::gentoo
sys-apps/openrc:          0.42.1-r1::gentoo
sys-apps/sandbox:         2.24::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r5::gentoo
sys-devel/automake:       1.16.3-r1::gentoo
sys-devel/binutils:       2.35.2::gentoo
sys-devel/gcc:            10.3.0::gentoo
sys-devel/gcc-config:     2.4::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.3::gentoo
sys-kernel/linux-headers: 5.11::gentoo (virtual/os-headers)
sys-libs/glibc:           2.33::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24
    sync-rsync-verify-jobs: 1
    sync-rsync-extra-opts:

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=znver3 -mtune=znver3 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=znver3 -mtune=znver3 -O2 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-march=znver3 -mtune=znver3 -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fail-clean fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=znver3 -mtune=znver3 -O2 -pipe"
GENTOO_MIRRORS="https://mirror.bytemark.co.uk/gentoo/ http://mirror.bytemark.co.uk/gentoo/ rsync://mirror.bytemark.co.uk/gentoo/ https://www.mirrorservice.org/sites/distfiles.gentoo.org/ http://www.mirrorservice.org/sites/distfiles.gentoo.org/ rsync://rsync.mirrorservice.org/distfiles.gentoo.org/"
LANG="en_GB.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en en_GB"
MAKEOPTS="-j12"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi activities alsa amd64 berkdb bluetooth branding bzip2 cairo cdda cdr cli crypt cups dbus declarative dri dts dvd dvdr elogind emboss encode exif flac fortran gdbm gif gles2 gpm gtk gui iconv icu ipv6 jpeg kde kipi kwallet lcms libass libglvnd libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp pam pango pcre pdf phonon plasma png policykit ppds pulseaudio qml qt5 readline sdl seccomp semantic-desktop spell split-usr ssl startup-notification svg tcpd tiff truetype udev udisks unicode upower usb vorbis wayland widgets wxwidgets x264 xattr xcb xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64 pc" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en en-GB" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_9" PYTHON_TARGETS="python3_9" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="nvidia nouveau" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"

Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RUSTFLAGS
```

----------

## alamahant

Do you have this kernel parameter set?

```
nvidia-drm.modeset=1
```

Have you already emerged xorg-server and KDE?

How is your kernel?

What does dmesg say?

```
dmesg | grep -i nvidia
```

Also, you should blacklist nouveau, NOT nvidia.

Or remove nouveau from

```
VIDEO_CARDS="nvidia nouveau"
```

and rebuild @world.
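A minimal sketch of the second option, assuming the default make.conf location (`/etc/portage/make.conf`), a single-line `VIDEO_CARDS="..."` assignment and GNU sed for `-i`. The function name is made up for illustration; editing the line by hand does the same job.

```shell
# Hypothetical helper: remove "nouveau" from the VIDEO_CARDS line of make.conf.
# Assumes GNU sed (for -i) and a single-line VIDEO_CARDS="..." assignment.
drop_nouveau_from_video_cards() {
  make_conf=${1:-/etc/portage/make.conf}
  # Only touch the VIDEO_CARDS line: delete the word, squeeze repeated
  # spaces, then trim any space left just inside the quotes.
  sed -i '/^VIDEO_CARDS=/{s/nouveau//g;s/  */ /g;s/=" /="/;s/ "$/"/;}' "$make_conf"
}
```

Run it as root, then `emerge --ask --update --deep --newuse @world` so packages pick up the changed VIDEO_CARDS flags. Blacklisting instead means putting the line `blacklist nouveau` in any file under `/etc/modprobe.d/`.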

----------

## Napalm Llama

I don't have that kernel parameter.  I'll try it and see if it helps - thanks.

I've emerged everything, and have a fully functional system when it works.  The problem started appearing after I set everything up.

My kernel is very well, thank you. How's yours?   :Very Happy:    Not sure what you're asking me there!

The one thing that changed along with the problem appearing is I moved the system partition from my HDD to my NVMe drive.  I updated fstab of course, and I can't think how it could be relevant - but as the problem appeared almost straight after doing it I thought I should mention it.

----------

## alamahant

I meant: is your kernel properly configured?

If you moved your partition, you should update /etc/default/grub, rebuild your initrd (if you use one) and regenerate the GRUB config (grub-mkconfig -o /boot/grub/grub.cfg), in addition to fstab, which you already did.

Also, does your kernel support NVMe?

Does it support all the filesystems you use?

This is what I meant by "how is your kernel".

 :Smile: 

For starters, please use the above kernel parameter and blacklist nouveau.

I think the nvidia issue is different from your other issues.

----------

## Napalm Llama

Hmm, the nvidia-drm.modeset=1 parameter doesn't seem to help. As far as I know the kernel is correctly configured, although it's obviously in question. It has booted just fine before, though - like I said the problem is intermittent. I just configured and booted the latest 5.12 kernel, but it didn't help.

Regarding NVMe support, GRUB, filesystems etc. - yes, all present and correct.

The only interesting thing from dmesg is this:

```
nvidia-gpu 0000:07:00.3: i2c timeout error e0000000
```

I tried "time udevadm settle" and it took almost exactly 2 minutes. Is that normal?
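For context: `udevadm settle` waits for the udev event queue to empty and gives up after a default timeout of 120 seconds, so a run lasting almost exactly two minutes suggests the queue never drained, rather than settling simply being slow. A small sketch that checks this with a shorter limit; the function name and the 10-second default are arbitrary choices, not anything udev prescribes.

```shell
# Hypothetical helper: does the udev event queue drain within a short window?
# `udevadm settle` exits non-zero if the queue is still busy when the timeout expires.
check_udev_queue() {
  timeout_s=${1:-10}   # arbitrary limit; an idle system normally settles at once
  if udevadm settle --timeout="$timeout_s"; then
    echo "udev queue empty"
  else
    echo "udev queue still busy after ${timeout_s}s"
    return 1
  fi
}
```

Running `check_udev_queue 5` as root shortly after a good boot and after a bad one should make the difference visible in seconds instead of minutes.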

----------

## alamahant

```
lsmod | egrep "nvidia|nouveau"
```

What does it say?

*Quote:*

> I tried "time udevadm settle" and it took almost exactly 2 minutes. Is that normal?

It seems abnormal, no?

How did you move the partition from hdd to nvme?

Was it the / partition?

----------

## Napalm Llama

```
nvidia
i2c_nvidia_gpu
```

It does seem abnormal, so I think the problem is likely related to udev somehow. "Stopping udev" at shutdown hangs for 30 seconds. I gave up waiting for "remounting read only" after 7 minutes...

I have everything except EFI on one partition, and I moved it using rsync -aHAXv

----------

## alamahant

*Quote:*

> and I moved it using rsync -aHAXv

Perfect. So this is NOT an ACL, permission or xattr issue.

Also, your lsmod did not indicate the presence of nouveau.

This is good because nouveau conflicts with the nvidia-drivers and should be blacklisted.

BUT

my lsmod

```
lsmod | egrep "nvidia|nouveau"
nvidia_drm             57344  2
nvidia_modeset       1142784  2 nvidia_drm
nvidia              34459648  72 nvidia_modeset
drm_kms_helper        286720  2 nvidia_drm,i915
drm                   577536  13 drm_kms_helper,nvidia_drm,i915
```

seems more detailed than yours...
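The comparison can be scripted. A sketch that reads `lsmod`-style output on stdin and prints which of `nvidia`, `nvidia_modeset` and `nvidia_drm` are absent; that three-module set is taken from the working system's output above, and the function name is made up.

```shell
# Hypothetical helper: list NVIDIA modules missing from lsmod-style input.
# Expected set mirrors the working system above: nvidia_drm depends on
# nvidia_modeset, which depends on nvidia.
missing_nvidia_modules() {
  # Collect module names (first column), skipping lsmod's "Module" header.
  loaded=$(awk '$1 != "Module" { print $1 }')
  for m in nvidia nvidia_modeset nvidia_drm; do
    printf '%s\n' "$loaded" | grep -qx "$m" || echo "$m"
  done
}
```

Usage: `lsmod | missing_nvidia_modules`. Fed the two-line output from the previous post, it reports `nvidia_modeset` and `nvidia_drm`.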

Also, maybe rebuild sys-fs/eudev?

There is a lot of NVIDIA content in linux-firmware. Do you have it installed?

I still worry that your kernel .config is missing something. Perhaps needlessly...

What does 

```
lspci -v | grep -i 3d -A30
```

say?
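Worth noting: a card that drives a display usually shows up in `lspci` as a "VGA compatible controller" rather than a "3D controller", so grepping for `3d` can come up empty. Filtering by NVIDIA's PCI vendor ID (`-d 10de:`) and adding `-k` shows which kernel driver each NVIDIA function is bound to. The helper below is a hypothetical sketch that condenses that output.

```shell
# Hypothetical helper: condense `lspci -k` output to "<slot> <driver in use>".
# Device header lines start with the bus address; detail lines are indented.
nvidia_drivers_in_use() {
  awk '/^[0-9a-f]/ { slot = $1; next }
       /Kernel driver in use:/ { print slot, $NF }'
}
```

Usage: `lspci -k -d 10de: | nvidia_drivers_in_use`. A function that is missing from the result has no driver bound to it.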

----------

## Napalm Llama

Rebuilding eudev appeared to help ("udevadm settle" settled immediately), but only because I restarted the udev service after rebuilding. After a reboot it was back to how it was.

linux-firmware is installed and up to date. I don't need to reinstall it for each new kernel, do I?

lspci | grep -i 3d finds nothing.

modprobe nvidia_drm just hangs...

[edit]

Kernel config - https://pastebin.com/sH1ycuwr

lspci -v - https://pastebin.com/1YeRvxSH

```
# lsmod
Module                  Size  Used by
rpcsec_gss_krb5        32768  0
btusb                  61440  0
btrtl                  28672  1 btusb
btbcm                  20480  1 btusb
btintel                28672  1 btusb
iwlmvm                356352  0
ucsi_ccg               24576  0
nvidia              34344960  1
typec_ucsi             45056  1 ucsi_ccg
mac80211             1064960  1 iwlmvm
intel_rapl_common      28672  0
iosf_mbi               20480  1 intel_rapl_common
crct10dif_pclmul       16384  1
crc32_pclmul           16384  0
iwlwifi               249856  1 iwlmvm
crc32c_intel           24576  0
ghash_clmulni_intel    16384  0
drm_kms_helper        225280  0
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
fb_sys_fops            16384  1 drm_kms_helper
wmi_bmof               16384  0
cec                    49152  1 drm_kms_helper
cfg80211              864256  3 iwlmvm,iwlwifi,mac80211
drm                   446464  1 drm_kms_helper
drm_panel_orientation_quirks    20480  1 drm
backlight              20480  1 drm
i2c_nvidia_gpu         16384  0
pinctrl_amd            32768  0
```

----------

## Napalm Llama

Hmm, the recent closeout comments on this long-running bug suggest the nvidia driver no longer has anything to do with udev.

https://bugs.gentoo.org/454740

Curious then that when udev hangs, the nvidia-drm module won't load, but when udev works correctly nvidia-drm does load... I don't know udev well enough to investigate the connection, but maybe there's some leftover bad config sending udev astray?

Maybe one time in five I boot, and everything just works. Seems to be usually after I've changed something. But then when I reboot, I'm always back where I was...

[edit]

Just noticed one of my udevd threads is pegging one of my CPU cores at 100%!  lsof says it has several kernel modules open, including nvidia.ko.  Connection confirmed?   :Shocked: 

[more edit]

The CPU-eating udev process is unkillable, even with -9.  This would explain:

Why udev-related commands hang

Why I can't rmmod nvidia (the module is in use by the udev process)

Why the system can't remount filesystems readonly on shutdown (the unkillable udev process is still running)

If I boot with the nvidia module blacklisted in the kernel command line, I don't get this behaviour, but of course I can't have X then either so it's not really a solution.
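One way to pin down the stuck worker, as a sketch: filter `ps` output for udev processes above a CPU threshold (the function name and the 90% default are arbitrary). A process that ignores even SIGKILL is usually executing kernel code that never returns to check for pending signals, and since this one holds `nvidia.ko` open it may well be stuck inside the module load itself. As root, `cat /proc/<pid>/stack` normally shows that process's kernel call stack, which is the kind of detail worth attaching to a bug report.

```shell
# Hypothetical helper: print PIDs of udev processes using at least N% CPU.
# Expects `ps -eo pid,stat,pcpu,comm` output on stdin (header line included).
busy_udev_workers() {
  threshold=${1:-90}   # arbitrary cut-off for "pegging a core"
  awk -v t="$threshold" 'NR > 1 && $4 ~ /udev/ && $3 + 0 >= t { print $1 }'
}
```

Usage: `ps -eo pid,stat,pcpu,comm | busy_udev_workers`. Note that `ps` reports CPU use averaged over the process lifetime, so a worker spinning since boot shows close to 100%.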

[even more edit]

This person seems to have a related problem, which is also intermittent - ie. usually present, but sometimes not.

This thread might be a dupe of this one.

----------

## hotstoast

I posted in the other thread last week. Wondering if anyone has found a solution for this yet.

I have all the same symptoms. The system will only shutdown if the uptime is ~ a day.

If you get to the bottom of any of this I'd greatly appreciate the solution you found  :Smile: 

----------

## iandoug

Also struggling to set up new box.

Where is this setting?

nvidia-drm.modeset=1 

nvidia-drivers compiled OK until I set things as per https://wiki.gentoo.org/wiki/NVIDIA/nvidia-drivers

and now it complains

```
>>> Emerging (1 of 1) x11-drivers/nvidia-drivers-460.91.03::gentoo
 * NVIDIA-Linux-x86_64-460.91.03.run BLAKE2B SHA512 size ;-) ...        [ ok ]
 * nvidia-installer-460.91.03.tar.bz2 BLAKE2B SHA512 size ;-) ...       [ ok ]
 * nvidia-modprobe-460.91.03.tar.bz2 BLAKE2B SHA512 size ;-) ...        [ ok ]
 * nvidia-persistenced-460.91.03.tar.bz2 BLAKE2B SHA512 size ;-) ...    [ ok ]
 * nvidia-settings-460.91.03.tar.bz2 BLAKE2B SHA512 size ;-) ...        [ ok ]
 * nvidia-xconfig-460.91.03.tar.bz2 BLAKE2B SHA512 size ;-) ...         [ ok ]
 * Determining the location of the kernel source code
 * Found kernel source directory:
 *     /usr/src/linux
 * Found sources for kernel version:
 *     5.10.27-gentoo
 * Checking for suitable kernel configuration options...
 *   CONFIG_DRM_KMS_HELPER: is not set but needed for Xorg auto-detection
 *      of drivers (no custom config), and optional nvidia-drm.modeset=1.
 *      Cannot be directly selected in the kernel's menuconfig, so enable
 *      options such as CONFIG_DRM_FBDEV_EMULATION instead.
 * Please check to make sure these options are set correctly.
 * Failure to do so may cause unexpected problems.
>>> Unpacking source...
>>> Unpacking NVIDIA-Linux-x86_64-460.91.03.run to /var/tmp/portage/x11-drivers/nvidia-drivers-460.91.03/work
```

It would help if it said WHERE to set "CONFIG_DRM_FBDEV_EMULATION" ....

Thanks, Ian

----------

## Ionen

^ This is kind of off topic, this thread is about a udev+nvidia issue. It's better to make a new thread than hijack an old one, but I'll answer anyway.

*iandoug wrote:*

> It would help if it said WHERE to set "CONFIG_DRM_FBDEV_EMULATION" ....

Well, if every single ebuild had to explain this, I think it'd be a bit much. This is general knowledge that can be found elsewhere. It's also only needed by someone configuring their own minimalist kernel, since it's on by default otherwise, so typically only advanced users will see this message.

One way: in `make menuconfig` you can press `/` to search, type CONFIG_DRM_FBDEV.... and there'll be an (n) number next to a location such as:

```
 │ (1)     -> Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) (DRM [=y])
```

Pressing that number will either take you there or to another required option that's needed first.
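Before or after editing in menuconfig, the resulting `.config` can be checked directly. A minimal sketch, assuming the sources live at `/usr/src/linux` (as in the ebuild output above); the function name is made up. `CONFIG_DRM_KMS_HELPER` is the option the ebuild checks, and `CONFIG_DRM_FBDEV_EMULATION` is the user-visible option its message suggests enabling to get it.

```shell
# Hypothetical helper: report whether the DRM options named in the
# nvidia-drivers warning are enabled (y or m) in a kernel .config.
check_nvidia_kconfig() {
  config=${1:-/usr/src/linux/.config}
  status=0
  for opt in CONFIG_DRM_KMS_HELPER CONFIG_DRM_FBDEV_EMULATION; do
    if grep -q "^${opt}=[ym]$" "$config"; then
      echo "$opt: set"
    else
      echo "$opt: not set"
      status=1
    fi
  done
  return "$status"
}
```

After enabling the option, rebuild and install the kernel, then rebuild out-of-tree modules against it (e.g. `emerge --ask @module-rebuild`).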

*iandoug wrote:*

> Where is this setting?
>
> nvidia-drm.modeset=1

You don't need to worry about this if you don't use it. It's a module option for nvidia-drm that you'd have to set manually at the moment, so you'd know if you had set it. In that form, it's something you'd pass on the kernel command line / in the GRUB options.

nvidia-drivers-470.xx will set this by default with USE=wayland, through /etc/modprobe.d/nvidia.conf (the option used not to be in that file, but was added a few days ago); 460.xx doesn't have this USE flag and keeps it disabled for now.
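For anyone who wants it now: the same option can go either on the kernel command line as `nvidia-drm.modeset=1` (e.g. appended to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `grub-mkconfig -o /boot/grub/grub.cfg`) or in a modprobe.d file as `options nvidia-drm modeset=1`. A sketch of the latter; the file name is an arbitrary choice, and the directory is a parameter so it can be tried on a scratch directory first.

```shell
# Hypothetical helper: write "options nvidia-drm modeset=1" to a modprobe.d file.
# Appends only if the exact line is missing, so running it twice is harmless.
enable_nvidia_drm_modeset() {
  conf_dir=${1:-/etc/modprobe.d}
  conf="$conf_dir/nvidia-drm-modeset.conf"   # arbitrary file name
  line='options nvidia-drm modeset=1'
  # -s hides the "no such file" error on the first run.
  grep -qsxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}
```

The setting applies the next time the module is loaded, i.e. after a reboot; if nvidia-drm is included in an initramfs, that needs rebuilding too.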

Either way, I do recommend setting the requested kernel option: the drivers aren't tested much without it and won't work out of the box (it's not impossible to use them that way, so it isn't hard-required).

----------

## iandoug

*Ionen wrote:*

> ^ This is kind of off topic, this thread is about a udev+nvidia issue. It's better to make a new thread than hijack an old one, but I'll answer anyway.

Thanks. I searched before posting and this seemed the most relevant ... people having issues with Nvidia on new installs.

*Ionen wrote:*

> *iandoug wrote:*
>
> > It would help if it said WHERE to set "CONFIG_DRM_FBDEV_EMULATION" ....
>
> Well, if every single ebuild had to explain this I think it'd be a bit much. This is general knowledge that can be gotten elsewhere. This is also only needed for someone configuring their own minimalist kernel given it's default otherwise, typically expect only advanced users will see this message.
>
> For one way, in `make menuconfig` you can press `/` (to search), type CONFIG_DRM_FBDEV.... and there'll be (n) number in:
> ...

https://wiki.gentoo.org/wiki/NVIDIA/nvidia-drivers says to disable that. So I did.

Once upon a time someone said I was having problems because I configured my own kernel. So when I built this box, I thought I would use genkernel. But the Gentoo Handbook said no.

"We explain the manual configuration as the default choice here as it is the best way to optimize an environment.".

*Ionen wrote:*

> Either way I do recommend to set the requested option, drivers aren't tested much without it and also won't work out of the box (not that it's impossible to use that way, so it's not hard-required).

So the Nvidia instruction page is wrong then?

Curious.  :Smile: 

Thanks, Ian

----------

## Ionen

*iandoug wrote:*

> So the Nvidia instruction page is wrong then?
>
> Curious.

Ah, I didn't see it at first (it's only in the troubleshooting section). Yes, it's partially wrong.

That page is user-edited, so it's not really curious. I may reconsider having the nvidia ebuild point to that page unless I bother editing it. A lot of this is outdated / unnecessary.

For the most part, nvidia-drivers work out of the box (no need to even make a configuration file like it tells you, unless you're using a dual-GPU / Optimus setup or similar).

Edit: I was kind of hoping not to have to (I'm not the best at writing docs / editing wikis, and I have other things to do), but I've put revamping that page on my TODO (or maybe just removing a few wrong bits at first). The more I look at it, the worse it looks. Much of this seems to date back more than 10 years, and nobody has dared to clean it up. Disabling DRM was fine with old drivers, for what it's worth, but not now that nvidia has an nvidia-drm module.

----------

## Napalm Llama

For the sake of completeness - I updated a few things, and (fingers crossed) since then the problem has vanished.

Things I updated that might be relevant (giving the new versions that are now working):

x11-drivers/nvidia-drivers-470.63.01

gentoo-sources-5.13.13

sys-fs/eudev-3.2.10-r1

Still no idea what caused the problem, or why it went away unfortunately.  I'll mark this thread as solved though.

----------

