# BUG message during boot - Kernel or Nvidia?

## Holysword

Hello there,

Recently I've been experiencing weird issues during the boot process. OpenRC proceeds normally until it reaches the "initializing uevents" step, and then it glitches horribly: the screen goes black, then comes back without colors and in the same format as dmesg, rather than the standard OpenRC startup output. After a few seconds OpenRC starts again and finishes initializing everything.

Then it gives me the username/password prompt (I don't have a login manager), but while I am trying to type the username and password, it keeps overwriting the screen with information from the iwlwifi module! After a few seconds it stops, and then I can log in. So I log in and try to start X - only twm, nothing else. It shows a black screen with the cursor in the top left corner (just the "_"), frozen, and that's it; it crashes, I cannot switch TTY, and I cannot do anything other than hard-reboot. When I check Xorg.0.log, it is empty - completely blank. Now, the worst part - all of this is random. It happens sometimes, and then the only way to actually get a working computer is to keep restarting until it does not happen. It only works when the first glitch (after the "initializing uevents" part) does not happen. I found this in my dmesg log; I don't know if it gives any clues:

```
[    9.377864] systemd-udevd[1118]: renamed network interface wlan0 to wlo1

[    9.377866] BUG: unable to handle kernel NULL pointer dereference at           (null)

[    9.377870] IP: [<ffffffff8152aa19>] __down+0x3c/0x93

[    9.377871] PGD 446694067 PUD 446770067 PMD 0 

[    9.377873] Oops: 0002 [#1] PREEMPT SMP 

[    9.377886] Modules linked in: nvidia(PO+) iwldvm mac80211 i915(+) uvcvideo btusb hp_accel videobuf2_vmalloc lis3lv02d videobuf2_memops intel_agp hid_ortek videobuf2_core videodev media input_polldev video snd_hda_codec_idt snd_hda_intel bluetooth iwlwifi psmouse r8169 mii intel_gtt snd_hda_codec thermal ac fan cfg80211 rfkill x86_pkg_temp_thermal battery coretemp processor snd_hwdep snd_pcm wmi snd_page_alloc snd_timer snd soundcore i2c_i801 drm_kms_helper button lpc_ich mfd_core thermal_sys hwmon efivarfs

[    9.377888] CPU: 0 PID: 1169 Comm: nvidia-smi Tainted: P          IO 3.13.6-gentoo #1

[    9.377888] Hardware name: Hewlett-Packard HP ENVY TS m7 Notebook PC/1966, BIOS F.1C 06/07/2013

[    9.377889] task: ffff88044a5140d0 ti: ffff8804474ce000 task.ti: ffff8804474ce000

[    9.377891] RIP: 0010:[<ffffffff8152aa19>]  [<ffffffff8152aa19>] __down+0x3c/0x93

[    9.377892] RSP: 0018:ffff8804474cfbd0  EFLAGS: 00010082

[    9.377893] RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: 0000000000000000

[    9.377893] RDX: ffffffffa0fab6a0 RSI: ffffffffa0d35cb5 RDI: ffffffffa0fab698

[    9.377894] RBP: ffff8804474cfc10 R08: 0000000000000000 R09: 0000000000000000

[    9.377894] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffa0fab698

[    9.377894] R13: ffff88044a5140d0 R14: ffff880445fab2f0 R15: 00000000000000ff

[    9.377895] FS:  00007f76e2b63700(0000) GS:ffff88045f200000(0000) knlGS:0000000000000000

[    9.377896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[    9.377897] CR2: 0000000000000000 CR3: 0000000445fad000 CR4: 00000000001407f0

[    9.377897] Stack:

[    9.377898]  ffffffffa0fab6a0 0000000000000000 00000000000000d0 0000000000000246

[    9.377900]  ffff880449dc6a80 ffffffffa0fab698 ffff88044640c088 ffff880449fef480

[    9.377901]  ffff880447469980 ffffffff81073ad6 0000000000000282 ffff880447469980

[    9.377901] Call Trace:

[    9.377904]  [<ffffffff81073ad6>] ? down+0x36/0x40

[    9.377957]  [<ffffffffa0b70c53>] ? nvidia_open+0x453/0x920 [nvidia]

[    9.377959]  [<ffffffff8139ed8a>] ? kobj_lookup+0x10a/0x170

[    9.378004]  [<ffffffffa0b7ac4f>] ? nvidia_frontend_open+0x3f/0x90 [nvidia]

[    9.378006]  [<ffffffff8110f196>] ? chrdev_open+0x96/0x1c0

[    9.378008]  [<ffffffff8110f100>] ? cdev_put+0x30/0x30

[    9.378010]  [<ffffffff81109092>] ? do_dentry_open+0x1a2/0x2a0

[    9.378011]  [<ffffffff811095e8>] ? finish_open+0x28/0x40

[    9.378013]  [<ffffffff81117f33>] ? do_last.isra.52+0x4a3/0xc70

[    9.378014]  [<ffffffff811187b2>] ? path_openat+0xb2/0x660

[    9.378016]  [<ffffffff810cb8bf>] ? shmem_xattr_validate+0x8f/0xd0

[    9.378017]  [<ffffffff81119d15>] ? do_filp_open+0x35/0x80

[    9.378019]  [<ffffffff81109553>] ? chown_common.isra.15+0x83/0xf0

[    9.378021]  [<ffffffff8152b9be>] ? _raw_spin_lock+0xe/0x40

[    9.378022]  [<ffffffff8152ba8e>] ? _raw_spin_unlock+0xe/0x30

[    9.378024]  [<ffffffff81125847>] ? __alloc_fd+0x97/0x120

[    9.378026]  [<ffffffff8110a606>] ? do_sys_open+0x126/0x210

[    9.378027]  [<ffffffff8152c926>] ? system_call_fastpath+0x1a/0x1f

[    9.378038] Code: bb ff ff ff ff ff ff ff 7f 48 83 e4 f0 48 83 ec 20 48 8b 47 10 48 89 14 24 65 4c 8b 2c 25 80 b8 00 00 48 89 67 10 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 c6 44 24 18 00 4c 89 e7 49 c7 45 00 02 

[    9.378039] RIP  [<ffffffff8152aa19>] __down+0x3c/0x93

[    9.378040]  RSP <ffff8804474cfbd0>

[    9.378040] CR2: 0000000000000000

[    9.378041] ---[ end trace 8422aa58f6fd8a32 ]---

[    9.378043] note: nvidia-smi[1169] exited with preempt_count 1

[    9.767193] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off

[    9.793133] Console: switching to colour frame buffer device 240x67

```

What could that be? It smells like a problem with my nvidia driver or perhaps some options in the kernel. I have been messing around with some options to try to get my bluetooth headset to work properly, so I might have screwed things up there...?

My full dmesg log

My .config file

Thank you in advance

----------

## Hu

Based on the callstack, that looks like a bug in the nVidia driver.  Can you reproduce the problem on an untainted kernel?

----------

## Holysword

 *Hu wrote:*   

> Based on the callstack, that looks like a bug in the nVidia driver.  Can you reproduce the problem on an untainted kernel?

 

What do you mean by an untainted kernel? I can try it if you give me a link with instructions.

Thanks.

----------

## Hu

A kernel becomes tainted when it loads any out-of-tree or proprietary modules, such as nvidia.ko.  To use an untainted kernel, reboot and do not load any modules which would taint it.
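
To see which taint flags a running kernel carries, the bitmask in `/proc/sys/kernel/tainted` can be decoded. The following is a minimal sketch, not an official tool: the helper names `decode_taint` and `read_taint` are made up for illustration, and the table covers only a subset of the bits described in the kernel's tainted-kernels documentation.

```python
from pathlib import Path

# Subset of the taint bits from the kernel's tainted-kernels documentation.
# The letters are the ones printed after "Tainted:" in an oops report.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    11: ("I", "workaround for a platform firmware bug was applied"),
    12: ("O", "externally built (out-of-tree) module was loaded"),
}


def decode_taint(mask):
    """Return (letter, reason) pairs for each known bit set in mask."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if mask & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask; 0 means the kernel is untainted."""
    return int(Path(path).read_text().strip())


# "Tainted: P          IO" in the first oops corresponds to bits 0, 11 and 12.
example_mask = (1 << 0) | (1 << 11) | (1 << 12)
for letter, reason in decode_taint(example_mask):
    print(f"{letter}: {reason}")
```

On a live system, `decode_taint(read_taint())` returning an empty list would indicate an untainted kernel, at least as far as the bits listed here are concerned.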

----------

## mir3x

You wrote OpenRC N times ... and then systemd shows up in the logs?? Maybe that's a problem

([    9.377864] systemd-udevd[1118]: renamed network interface wlan0 to wlo1)

----------

## Ant P.

 *mir3x wrote:*   

> You wrote OpenRC N times ... and then systemd shows up in the logs?? Maybe that's a problem
> 
> ([    9.377864] systemd-udevd[1118]: renamed network interface wlan0 to wlo1)

 

Your read buffer seems to have been truncated 6 chars early... try again.

----------

## Hu

 *mir3x wrote:*   

> You wrote OpenRC N times ... and then systemd shows up in the logs?? Maybe that's a problem

 Although systemd has and causes many problems, it is irrelevant here.  OP's bug stack clearly shows that an nVidia related configuration utility ran, attempted to access a character device managed by the nVidia proprietary driver, and the handler for that device then triggered a BUG event.  The only way I can see to blame this on systemd is if the nVidia driver reacts badly to acquiring resources while systemd is busy renaming network interfaces.  This seems unlikely.
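
The reading of the call trace above can be reproduced mechanically: frames that belong to a loadable module carry a `[module]` suffix, while frames from the core kernel do not. The sketch below is only an illustration under that assumption; `FRAME_RE` and `module_frames` are hypothetical names, and the pattern is tuned to the trace format shown in this thread.

```python
import re

# Matches stack frames such as:
#   [    9.377957]  [<ffffffffa0b70c53>] ? nvidia_open+0x453/0x920 [nvidia]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(log_text):
    """Return (module, symbol) for every frame that lives in a loadable module."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group("module"):
            frames.append((match.group("module"), match.group("symbol")))
    return frames
```

Fed the first trace in this thread, this would pick out `nvidia_open` and `nvidia_frontend_open`, the two frames sitting between the generic `chrdev_open` path and the failing `down` call.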

----------

## Holysword

 *Hu wrote:*   

> A kernel becomes tainted when it loads any out-of-tree or proprietary modules, such as nvidia.ko.  To use an untainted kernel, reboot and do not load any modules which would taint it.

 

It is a bit hard to tell since it was random; it would happen very often, but not always. Anyway, I uninstalled nvidia-drivers-337.12 and rebooted several times, and it did not occur once. Then I installed version 334.21-r3 and rebooted a few times, and it also did not occur once. I do believe it is a problem with the 337 driver then...
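
Because the failure is intermittent, it can help to check each boot's kernel log for the oops instead of relying on what the screen did. A rough sketch, assuming the log text is passed in as a string (for example the output of `dmesg`); `BUG_MARKER`, `COMM_RE` and `oops_tasks` are illustrative names, not part of any existing tool.

```python
import re

BUG_MARKER = "BUG: unable to handle kernel NULL pointer dereference"
# The oops header names the task that triggered it, e.g. "PID: 1169 Comm: nvidia-smi".
COMM_RE = re.compile(r"PID: \d+ Comm: (?P<comm>\S+)")


def oops_tasks(log_text):
    """Return the task name reported after each NULL-dereference BUG line."""
    tasks = []
    pending = False  # True between a BUG line and its "Comm:" line
    for line in log_text.splitlines():
        if BUG_MARKER in line:
            pending = True
        elif pending:
            match = COMM_RE.search(line)
            if match:
                tasks.append(match.group("comm"))
                pending = False
    return tasks
```

An empty result for a given boot suggests the bug did not trigger that time; a result of `["nvidia-smi"]` would match both logs posted in this thread.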

----------

## wvmmhxkh

Same thing here, on both 337.12 and 337.19:

```

[    1.344957] nvidia: module license 'NVIDIA' taints kernel.

[    1.344958] Disabling lock debugging due to kernel taint

[    1.346458] hub 2-1:1.0: port 5 not reset yet, waiting 10ms

[    1.357727] BUG: unable to handle kernel NULL pointer dereference at           (null)

[    1.362264] IP: [<ffffffff814478e7>] __down_common+0x4e/0xe9

[    1.366632] PGD 2140e5067 PUD 2140e4067 PMD 0 

[    1.371013] Oops: 0002 [#1] SMP 

[    1.375360] Modules linked in: nvidia(PO+) snd_hda_intel(+) snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd

[    1.380027] CPU: 0 PID: 1096 Comm: nvidia-smi Tainted: P           O 3.12.13-gentoo #6

[    1.384733] Hardware name: System manufacturer System Product Name/P8Z77-V LX, BIOS 2303 12/05/2013

[    1.389509] task: ffff880215b42ae0 ti: ffff88021406e000 task.ti: ffff88021406e000

[    1.394282] RIP: 0010:[<ffffffff814478e7>]  [<ffffffff814478e7>] __down_common+0x4e/0xe9

[    1.399122] RSP: 0018:ffff88021406fb88  EFLAGS: 00010096

[    1.403916] RAX: 0000000000000000 RBX: ffffffffa0a54fc0 RCX: 0000000000000000

[    1.408714] RDX: ffff88021406fb88 RSI: 0000000000000002 RDI: ffffffffa0a54fc0

[    1.409526] usb 2-1.5: new low-speed USB device number 3 using ehci-pci

[    1.418360] RBP: 7fffffffffffffff R08: 0000000000018860 R09: 0000000000000000

[    1.420537] hub 2-1:1.0: port 5 not reset yet, waiting 10ms

[    1.428250] R10: 000000000000000b R11: ffffffffffffffd6 R12: ffff880215b42ae0

[    1.433248] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000

[    1.438204] FS:  00007fee5297c700(0000) GS:ffff88021ec00000(0000) knlGS:0000000000000000

[    1.443216] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[    1.448174] CR2: 0000000000000000 CR3: 000000021591c000 CR4: 00000000001407f0

[    1.453133] Stack:

[    1.457968]  ffffffffa0a54fc8 0000000000000000 ffff88021ec14560 0000000000000001

[    1.463030]  0000000000000000 ffffffffa0a54fc0 ffff8800d9423100 ffff880213908000

[    1.468099]  ffff8800d97237a0 ffff8800d9423300 00000000000000ff ffffffff81063337

[    1.473133] Call Trace:

[    1.478125]  [<ffffffff81063337>] ? down+0x37/0x40

[    1.483138]  [<ffffffffa0661b73>] ? nvidia_open+0x563/0x8e0 [nvidia]

[    1.488118]  [<ffffffff810ef1dc>] ? exact_lock+0xc/0x20

[    1.493057]  [<ffffffff81296f82>] ? kobj_lookup+0x102/0x150

[    1.496766] usb 2-1.5: skipped 1 descriptor after interface

[    1.497136] usb 2-1.5: default language 0x0409

[    1.498616] usb 2-1.5: udev 3, busnum 2, minor = 130

[    1.498617] usb 2-1.5: New USB device found, idVendor=0458, idProduct=003a

[    1.498617] usb 2-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0

[    1.498618] usb 2-1.5: Product: Optical Mouse

[    1.498618] usb 2-1.5: Manufacturer: Genius

[    1.498661] usb 2-1.5: usb_probe_device

[    1.498662] usb 2-1.5: configuration #1 chosen from 1 choice

[    1.499997] usb 2-1.5: adding 2-1.5:1.0 (config #1, interface 0)

[    1.500010] usbhid 2-1.5:1.0: usb_probe_interface

[    1.500011] usbhid 2-1.5:1.0: usb_probe_interface - got id

[    1.501958] input: Genius Optical Mouse as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5/2-1.5:1.0/input/input1

[    1.502005] hid-generic 0003:0458:003A.0001: input,hidraw0: USB HID v1.10 Mouse [Genius Optical Mouse] on usb-0000:00:1d.0-1.5/input0

[    1.502018] hub 2-1:1.0: state 7 ports 8 chg 0000 evt 0020

[    1.574407]  [<ffffffffa066bc7d>] ? nvidia_frontend_open+0x4d/0xa0 [nvidia]

[    1.579905]  [<ffffffff810efa25>] ? chrdev_open+0x95/0x1a0

[    1.585390]  [<ffffffff810ef990>] ? cdev_put+0x30/0x30

[    1.590825]  [<ffffffff810e92fe>] ? do_dentry_open.isra.16+0x1ee/0x280

[    1.596276]  [<ffffffff810e93a5>] ? finish_open+0x15/0x20

[    1.601661]  [<ffffffff810f9ae1>] ? do_last.isra.72+0x7c1/0xd40

[    1.607002]  [<ffffffff810f6878>] ? link_path_walk+0x68/0x830

[    1.612284]  [<ffffffff810fa12c>] ? path_openat+0xcc/0x5f0

[    1.617524]  [<ffffffff81104e3c>] ? inode_change_ok+0x8c/0x180

[    1.622773]  [<ffffffff810faae5>] ? do_filp_open+0x45/0xb0

[    1.627882]  [<ffffffff811060f2>] ? __alloc_fd+0x42/0x110

[    1.632845]  [<ffffffff810ea700>] ? do_sys_open+0x140/0x230

[    1.637712]  [<ffffffff8144a762>] ? system_call_fastpath+0x16/0x1b

[    1.642496] Code: fb 48 83 ec 28 48 8b 47 10 48 8d 14 24 48 89 57 10 48 8d 57 08 48 89 14 24 48 8d 14 24 65 4c 8b 24 25 40 b8 00 00 48 89 44 24 08 <48> 89 10 4c 89 64 24 10 c6 44 24 18 00 4d 85 f6 74 5e 49 8b 44 

[    1.652769] RIP  [<ffffffff814478e7>] __down_common+0x4e/0xe9

[    1.657732]  RSP <ffff88021406fb88>

[    1.662633] CR2: 0000000000000000

[    1.667443] ---[ end trace a5b398f35f50896a ]---

[    1.916166] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none

[    2.143544] Switched to clocksource tsc

[   30.794085] EXT4-fs (sda2): re-mounted. Opts: discard

[   30.893166] EXT4-fs (sda1): mounted filesystem without journal. Opts: discard

[   30.931112] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)

[   30.960667] EXT4-fs (sdc1): mounted filesystem with ordered data mode. Opts: (null)

```

----------

## glv

I have the same problem with version 337.12 and 337.19 of nvidia-drivers (kernel 3.14.1).

It seems to be caused by the nvidia-smi program called by the udev rule 99-nvidia.rules.

There's a workaround for this bug (I found it here: https://bugs.gentoo.org/show_bug.cgi?id=504326).

The idea is to supersede the udev rule and not call nvidia-smi.

- Copy /lib/udev/rules.d/99-nvidia.rules to /etc/udev/rules.d/99-nvidia.rules

- Edit /etc/udev/rules.d/99-nvidia.rules and comment out the first line:

`#ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="nvidia-udev.sh $env{ACTION}"`

- Reboot and the kernel OOPS should not appear anymore
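
The steps above can be sketched as a small script. This is only an illustration of the workaround as described, not something taken from the bug report: the function names `comment_out_add_rule` and `install_override` are hypothetical, it assumes the rule line looks like the one quoted above, and writing to /etc requires root. It relies on standard udev behavior, where a file in /etc/udev/rules.d overrides a file of the same name in /lib/udev/rules.d.

```python
from pathlib import Path


def comment_out_add_rule(rules_text):
    """Comment out active ACTION=="add" rules that run nvidia-udev.sh."""
    patched = []
    for line in rules_text.splitlines():
        is_active = not line.lstrip().startswith("#")
        if is_active and 'ACTION=="add"' in line and "nvidia-udev.sh" in line:
            line = "#" + line
        patched.append(line)
    return "\n".join(patched) + "\n"


def install_override(src="/lib/udev/rules.d/99-nvidia.rules",
                     dst="/etc/udev/rules.d/99-nvidia.rules"):
    """Write a patched copy of the packaged rules file to the override location."""
    Path(dst).write_text(comment_out_add_rule(Path(src).read_text()))
```

Running `install_override()` as root and rebooting would correspond to the manual steps. Since the override shadows the packaged file, it keeps applying after driver updates until it is deleted.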

----------

## Holysword

 *glv wrote:*   

> I have the same problem with version 337.12 and 337.19 of nvidia-drivers (kernel 3.14.1).
> 
> It seems to be caused by the nvidia-smi program called by the udev rule 99-nvidia.rules.
> 
> There's a workaround for this bug (I found it here: https://bugs.gentoo.org/show_bug.cgi?id=504326).
> ...

 

Thank you for letting us know! I will test it as soon as possible, but for the moment I simply downgraded the nvidia driver. Is there anything outstanding in the newest version?

----------

## StevePER

I just started getting the same problem after upgrading to kernel 3.12.21-gentoo-r1 and nvidia-drivers 337.25. However, the workaround isn't working for me. Downgrading to 334.21-r3 fixes it.

----------

## poolshrk

 *glv wrote:*   

> I have the same problem with version 337.12 and 337.19 of nvidia-drivers (kernel 3.14.1).
> 
> It seems to be caused by the nvidia-smi program called by the udev rule 99-nvidia.rules.
> 
> There's a workaround for this bug (I found it here: https://bugs.gentoo.org/show_bug.cgi?id=504326).
> ...

 

This works for me, thanks!

kernel 3.12.22 

nvidia-drivers 340.17

----------

## hampelratte

I ran into the same problem with kernel 3.12.21 and nvidia-drivers above 334.21-r3. I'm wondering whether this is caused by a certain setup (hardware / software), since only a few users seem to be affected. I'm not willing to use the workaround from the bug report, because I tend to forget about such things, and then they may cause problems later. So, for now I downgraded to 334 and masked all newer versions of nvidia-drivers.

If any of you successfully tests a newer version, let us know, so that we can get rid of the workaround / package mask.

BR

Henrik

----------

