# No X with kernel 4.16.0: patch FIXES it.

## Fred Krogh

Any ideas on where to look?  Thanks,

Fred

Last edited by Fred Krogh on Sun Apr 08, 2018 9:49 pm; edited 1 time in total

----------

## NeddySeagoon

Fred Krogh,

Pastebin dmesg and /var/log/Xorg.0.log from your broken system.

You might need to save them both to files to do that.

----------

## josedb

There is a problem with kernel 4.16 and the nvidia module. There is no easy fix at the moment.

----------

## fedeliallalinea

 *josedb wrote:*   

> There is a problem with the 4.16 and nvidia module. There is no easy fix actually.

 

Here is the open bug report.

----------

## FR3141

If you are using the nvidia proprietary blob then you will need this patch:

https://devtalk.nvidia.com/default/topic/1030082/linux/kernel-4-16-rc1-breaks-latest-drivers-unknown-symbol-swiotlb_map_sg_attrs-

This patch works for me.

Just drop it in the /etc/portage/patches/x11-drivers/nvidia-drivers directory and then emerge nvidia-drivers.
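For anyone who has not used Portage's user-patch directory before, the step above can be sketched as a small bash function. It assumes the patch from the devtalk thread has already been saved locally; the file name kernel-4.16.patch used below is hypothetical, and the optional ROOT prefix exists only so the function can be tried somewhere harmless first. Portage picks up files in that directory through eapply_user, which EAPI 6+ ebuilds run during src_prepare.

```bash
# Sketch: stage a user patch so Portage applies it the next time
# x11-drivers/nvidia-drivers is built.
# Assumption: the patch has already been downloaded to a local file.

# stage_patch PATCH_FILE [ROOT]
#   PATCH_FILE  the saved patch (name is up to you)
#   ROOT        optional prefix; empty by default, meaning the real /etc/portage
stage_patch() {
    local patch_file=$1
    local root=${2:-}
    local dir="$root/etc/portage/patches/x11-drivers/nvidia-drivers"

    [ -f "$patch_file" ] || { echo "no such patch: $patch_file" >&2; return 1; }
    mkdir -p -- "$dir"
    cp -- "$patch_file" "$dir/"
    # Report where the patch ended up
    printf '%s\n' "$dir/$(basename -- "$patch_file")"
}
```

Then, as root: "stage_patch ~/kernel-4.16.patch && emerge --ask --oneshot x11-drivers/nvidia-drivers".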

----------

## Fred Krogh

Thanks, that patch by mlau worked for me with kernel 4.16.1.

----------

## causality

I notice that in that devtalk.nvidia.com thread it was specifically mentioned that he was not using any Spectre/Meltdown mitigations.  More on that later.

Without the patch at all, nvidia-drivers 390.48 builds successfully against kernel 4.16.1 (!), but upon reboot, starting SDDM fails spectacularly (Xorg fails because the nvidia driver fails).  And it doesn't fail gracefully: the text console flickers in a strange, funky way that actually interferes with typing.  I could not even log in because keypresses were randomly ignored, though I could see the flickering prompt.  I rebooted with "init=/bin/bash", turned on OpenRC's interactive boot (I see it exists for good reason, heh), booted again, declined to start SDDM, and went back to 4.15.14.  Not a big deal, but definitely a nuisance.

My /var/log/kern.log has this:

```
Apr  9 11:14:58 causality kernel: nvidia_drm: loading out-of-tree module taints kernel.
Apr  9 11:14:58 causality kernel: nvidia_drm: module license 'MIT' taints kernel.
Apr  9 11:14:58 causality kernel: Disabling lock debugging due to kernel taint
Apr  9 11:14:58 causality kernel: nvidia: Unknown symbol swiotlb_map_sg_attrs (err 0)
```

The strange thing is, that symbol DOES appear in /boot/System.map-4.16.1-gentoo:

```
causality /boot # grep swiotlb_map_sg_attrs System.map-4.16.1-gentoo
ffffffff813da960 T swiotlb_map_sg_attrs
causality /boot #
```
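A possible explanation, offered as an assumption rather than something this thread establishes: System.map lists every global symbol in the kernel image, but an out-of-tree module such as nvidia can only link against symbols the kernel explicitly exports (EXPORT_SYMBOL / EXPORT_SYMBOL_GPL). Those exports are recorded in Module.symvers in the kernel build tree, so a symbol can show up as "T" in System.map and still be "unknown" to a module if it is no longer exported. A minimal bash check, assuming the tab-separated Module.symvers layout of 4.16-era kernels (CRC, symbol, module, export type):

```bash
# is_exported SYMBOL SYMVERS_FILE
#   Exit status 0 if SYMBOL appears in the symbol column (2nd field) of a
#   Module.symvers file, i.e. if modules are allowed to link against it.
#   Assumption: 4.16-era layout "CRC<TAB>symbol<TAB>module<TAB>export type".
is_exported() {
    local sym=$1 symvers=$2
    awk -v s="$sym" '$2 == s { found = 1; exit } END { exit !found }' "$symvers"
}
```

For example: is_exported swiotlb_map_sg_attrs /usr/src/linux-4.16.1-gentoo/Module.symvers && echo exported || echo "not exported" (the path assumes the usual gentoo-sources layout).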

Back to the Spectre/Meltdown mitigation... I do have both.  That is, CONFIG_RETPOLINE (with GCC 7.3.0) and CONFIG_PAGE_TABLE_ISOLATION are both enabled.  Additionally, I have CONFIG_HARDENED_USERCOPY set (in the 4.16.1 kernel, I had also enabled the whitelisting-with-fallback feature).

Could these be causing the nvidia module to fail despite the symbol appearing in System.map (which suggests it is available)?  Should I read that devtalk.nvidia.com page as implying that those with Spectre/Meltdown mitigations should wait for a proper new driver release?
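To compare notes on these options across machines, a small bash helper can print how each one is set in a kernel .config. This is only a sketch: it reads the two forms .config uses ("CONFIG_FOO=y" and "# CONFIG_FOO is not set"), and the path in the usage line assumes the usual /usr/src/linux symlink.

```bash
# kconfig_state CONFIG_FILE OPTION...
#   Prints the line that sets each OPTION, or "OPTION not found" when the
#   option does not appear at all (e.g. it does not exist in that kernel).
#   Assumption: only the "=value" and "is not set" forms need handling.
kconfig_state() {
    local config=$1 opt line
    shift
    for opt in "$@"; do
        line=$(awk -v o="$opt" '
            index($0, o "=") == 1 || $0 == ("# " o " is not set") { print; exit }
        ' "$config")
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        else
            printf '%s not found\n' "$opt"
        fi
    done
}
```

For example: kconfig_state /usr/src/linux/.config CONFIG_RETPOLINE CONFIG_PAGE_TABLE_ISOLATION CONFIG_HARDENED_USERCOPY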

----------

## Fred Krogh

I can confirm that without the patch X would not start for me with kernel 4.16.1.  It just came up with a plain black screen and nothing else.  I too have tried to protect my system against Spectre and Meltdown.  My guess is that the next version of the nvidia driver will fix these things.

----------

## causality

Thanks for the confirmation.  What I'm really wondering is whether I should bother trying my current setup WITH the patch.

I suspect my security mitigations (one of them, or the combination) caused it to fail so spectacularly when done without the patch.  I don't want a situation where the driver loads because it stops complaining about missing symbols (which is all that patch does), then fails for other mysterious reasons (and possibly not immediately).  On a lesser note, it's annoying to have to reboot multiple times considering this machine generally only ever reboots for two reasons:  kernel updates and (rare) power failures.

I note that in the past, when newer kernels broke compatibility, nvidia-drivers simply wouldn't build at all.  Strange that this one does, then fails later, for reasons similar to why past ones didn't build (missing symbols).  That's why I wonder about the security mitigations and what other impact they may have.

----------

## josedb

The patch works for me: I now have the nvidia module loaded, but I am no longer able to control the LCD brightness (Alienware 15 R3). I have tried several configurations, with no luck.

----------

## causality

josedb, I assume you're running kernel 4.16.1?  What version of nvidia-drivers?  I'm using unstable 390.48 (latest available in standard Portage tree).  Can you advise whether you are using any of the following kernel settings:  CONFIG_RETPOLINE (with GCC 7.3.0), CONFIG_PAGE_TABLE_ISOLATION, CONFIG_HARDENED_USERCOPY?

Do you see any errors in your logs (anything relevant under /var/log including Xorg.0.log and kern.log if you have those) or in the output of dmesg?  At bootup?  After trying to adjust the LCD brightness?
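For the log-checking part, a rough filter that pulls nvidia-related lines (including "Unknown symbol" failures like the one in the kern.log excerpt earlier) out of dmesg or kern.log output could look like this. The match pattern is an assumption based on the messages quoted in this thread, not an exhaustive list.

```bash
# nvidia_lines: read kernel log text on stdin and print only the lines that
# mention the nvidia modules, the NVRM driver core, or unresolved symbols.
# Assumption: these three patterns cover the messages of interest here.
nvidia_lines() {
    grep -E 'nvidia|NVRM|Unknown symbol' || true   # no match is not an error here
}
```

For example: "dmesg | nvidia_lines" or "nvidia_lines < /var/log/kern.log".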

More information is needed before there is any hope of sorting this out.

----------

## josedb

 *causality wrote:*   

> josedb, I assume you're running kernel 4.16.1?  What version of nvidia-drivers?  I'm using unstable 390.48 (latest available in standard Portage tree).  Can you advise whether you are using any of the following kernel settings:  CONFIG_RETPOLINE (with GCC 7.3.0), CONFIG_PAGE_TABLE_ISOLATION, CONFIG_HARDENED_USERCOPY?
> 
> Do you see any errors in your logs (anything relevant under /var/log including Xorg.0.log and kern.log if you have those) or in the output of dmesg?  At bootup?  After trying to adjust the LCD brightness?
> 
> More information is needed before there is any hope of sorting this out.

 

nvidia-drivers-390.48

gentoo-sources-4.16.1

dmesg : https://pastebin.com/JeZVPmuS

weird stuff:

```
[    0.009000] [Firmware Bug]: TSC ADJUST: CPU0: -1418891798 force to 0
[    0.001000] [Firmware Bug]: TSC ADJUST differs within socket(s), fixing all errors
[    0.016000]  #2 #3 #4 #5 #6 #7
[    0.396839] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
[    0.399102] i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_04.bin failed with error -2
[    0.399142] i915 0000:00:02.0: Failed to load DMC firmware i915/kbl_dmc_ver1_04.bin. Disabling runtime power management.
[    0.399187] i915 0000:00:02.0: DMC firmware homepage: https://01.org/linuxgraphics/downloads/firmware
[    0.763326] __power_supply_register: Expected proper parent device for 'test_ac'
[    0.764626] __power_supply_register: Expected proper parent device for 'test_battery'
[    0.765920] __power_supply_register: Expected proper parent device for 'test_usb'
[    0.767256] kworker/u16:8 (1520) used greatest stack depth: 14224 bytes left
[    0.788448] applesmc: supported laptop not found!
[    0.789676] applesmc: driver init failed (ret=-19)!
[    0.839294] pc87360: PC8736x not detected, module not inserted
[    0.912553] resource sanity check: requesting [mem 0xfdffe800-0xfe0007ff], which spans more than pnp 00:07 [mem 0xfdb00000-0xfdffffff]
[    0.913790] caller pmc_core_probe+0x7d/0x1f5 mapping multiple BARs
[    0.944230] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
[    1.001218] ata4.00: supports DRM functions and may not be fully accessible
[    1.006747] ata4.00: NCQ Send/Recv Log not supported
[    1.023537] ata4.00: supports DRM functions and may not be fully accessible
[    1.026342] ata4.00: NCQ Send/Recv Log not supported
[    1.571939] systemd[1]: File /lib/systemd/system/systemd-journald.service:35 configures an IP firewall (IPAddressDeny=any), but the local system does not support BPF/cgroup based firewalling.
[    1.573550] systemd[1]: Proceeding WITHOUT firewalling in effect! (This warning is only shown for the first loaded unit using IP firewalling.)
[    1.582754] systemd[1]: Configuration file /usr/lib/systemd/system/alienware-kbl.service is marked executable. Please remove executable permission bits. Proceeding anyway.
[    1.743685] nvidia: loading out-of-tree module taints kernel.
[    1.745070] nvidia: module license 'NVIDIA' taints kernel.
[    1.746443] Disabling lock debugging due to kernel taint
[    1.760008] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  390.48  Thu Mar 22 00:42:57 PDT 2018 (using threaded interrupts)
[    1.998779] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20180105/nsarguments-100)
[    2.025333] usb: port power management may be unreliable
[    2.232739] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3e:00.0.bin failed with error -2
[    2.232750] ath10k_pci 0000:3e:00.0: Direct firmware load for ath10k/cal-pci-0000:3e:00.0.bin failed with error -2
[    2.787679] usb 1-4: config 1 interface 0 altsetting 0 has 2 endpoint descriptors, different from the interface descriptor's value: 1
[    2.890360] ath10k_pci 0000:3e:00.0: Unknown eventid: 118809
[    2.893290] ath10k_pci 0000:3e:00.0: Unknown eventid: 90118
[    3.363840] NVRM: Your system is not currently configured to drive a VGA console
               on the primary VGA device. The NVIDIA Linux graphics driver
               requires the use of a text-mode VGA console. Use of other console
               drivers including, but not limited to, vesafb, may result in
               corruption and stability problems, and is not supported.
[    3.556976] snd_hda_intel 0000:00:1f.3: Too many HDMI devices
[    3.556977] snd_hda_intel 0000:00:1f.3: Consider building the kernel with CONFIG_SND_DYNAMIC_MINORS=y
[    3.556978] snd_hda_intel 0000:00:1f.3: Too many HDMI devices
[    3.556978] snd_hda_intel 0000:00:1f.3: Consider building the kernel with CONFIG_SND_DYNAMIC_MINORS=y
[    3.556978] snd_hda_intel 0000:00:1f.3: Too many HDMI devices
[    3.556979] snd_hda_intel 0000:00:1f.3: Consider building the kernel with CONFIG_SND_DYNAMIC_MINORS=y
```

modules:

```
Module                  Size  Used by
snd_hda_codec_hdmi     57344  1
nvidia_drm             40960  4
snd_hda_codec_realtek    81920  1
snd_hda_codec_generic    77824  1 snd_hda_codec_realtek
snd_hda_intel          32768  8
snd_hda_codec         118784  4 snd_hda_intel,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
snd_hwdep              16384  1 snd_hda_codec
snd_hda_core           73728  5 snd_hda_intel,snd_hda_codec,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek
snd_pcm                98304  5 snd_hda_intel,snd_hda_codec,snd_hda_core,snd_hda_codec_hdmi
nvidia_modeset       1073152  8 nvidia_drm
snd_timer              32768  1 snd_pcm
ath10k_pci             61440  0
nvidiafb               49152  0
snd                    77824  23 snd_hda_intel,snd_hwdep,snd_hda_codec,snd_timer,snd_hda_codec_hdmi,snd_hda_codec_generic,snd_hda_codec_realtek,snd_pcm
ath10k_core           397312  1 ath10k_pci
soundcore              16384  1 snd
vgastate               20480  1 nvidiafb
xhci_pci               16384  0
x86_pkg_temp_thermal    16384  0
fb_ddc                 16384  1 nvidiafb
xhci_hcd              192512  1 xhci_pci
nvidia              13836288  552 nvidia_modeset
efivarfs               16384  1
```

```
CONFIG_RETPOLINE=y
CONFIG_PAGE_TABLE_ISOLATION=y
CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR=y
# CONFIG_HARDENED_USERCOPY is not set
```

Xorg logs: https://pastebin.com/E3eBP6q5

----------

## causality

For my own system, I decided to fall back to kernel 4.15.14-gentoo until nvidia releases an updated driver designed to work with kernel 4.16.x.  josedb, this is the kind of mysterious failure after a successful load that I was concerned about on my own system, and it is why I have not tried the new patch.

It appears you have both Intel i915 video hardware and a discrete nvidia card on that system.  I assume this is an nvidia Optimus setup?  Those are known to be problematic at times even without hackish patches that force a module load.  Of particular interest is this line in dmesg:

```
[    0.667601] [drm] failed to retrieve link info, disabling eDP
```

From Wikipedia's DisplayPort page (https://en.wikipedia.org/wiki/DisplayPort#eDP):

 *Quote:*   

> Version 1.4 was released in February 2013; it reduces power consumption with partial-frame updates in PSR mode, regional backlight control, lower interface voltage, and additional link rates; the auxiliary channel supports multi-touch panel data to accommodate different form factors.

 

That may or may not explain the failure to control the backlight.  I honestly don't know enough about how that works on a low level.  But it may be a good place to start looking.

Also there was this line:

```
[    0.394704] fb0: EFI VGA frame buffer device
```

And this later one:

```
[    0.397944] [drm] Replacing VGA console driver
```

The nvidia proprietary driver really doesn't like any kind of VGA console or framebuffer driver.  It wants a plain old text console.  On my own (single video card) desktop system, in the kernel config, the entire sections "Direct Rendering Manager" and "Frame buffer devices" are completely disabled.  The proprietary nvidia driver has its own way of implementing DRM.
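As a quick check along those lines, lsmod output can be scanned for open-source framebuffer or DRM drivers that compete with the blob for the same card. The module list below (nvidiafb, rivafb, nouveau) is an illustrative assumption, not a complete list; note that nvidiafb does appear in josedb's lsmod output above.

```bash
# conflicting_modules: read `lsmod` output on stdin and print any loaded
# module from a deliberately short list of drivers that compete with the
# proprietary nvidia driver. Assumption: the list is illustrative only.
conflicting_modules() {
    awk 'NR > 1 && ($1 == "nvidiafb" || $1 == "rivafb" || $1 == "nouveau") { print $1 }'
}
```

For example: "lsmod | conflicting_modules". Anything it prints is a candidate to disable in the kernel config (see the Gentoo wiki page below).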

See this page for more information, under the "Kernel compatibility" section:  https://wiki.gentoo.org/wiki/NVidia/nvidia-drivers

Perhaps you're using a kernel configured with the "genkernel" tool?  I configure mine by hand, moving the .config file to the new kernel directory and running "make oldconfig".  On a given system, this means you only have to get it right once, and only small tweaks are needed from there.  The advantage is that you know exactly what you're getting, because you're forced to think about special cases (like the nvidia driver's needs) from the start.  Just my opinion.  There are lots of valid ways to do things.
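The carry-forward routine described above might be sketched like this. It copies rather than moves the .config, so the old tree keeps its own; the directory names in the usage line are the two kernels from this thread, and the MAKE override is only there so the function can be tried without a kernel tree.

```bash
# carry_config OLD_SRC NEW_SRC
#   Copy a known-good .config from OLD_SRC into NEW_SRC, then run
#   "make oldconfig" there so only options new to NEW_SRC are prompted for.
#   Assumption: MAKE may be overridden (e.g. MAKE=true) for a dry run.
carry_config() {
    local old_src=$1 new_src=$2
    [ -f "$old_src/.config" ] || { echo "no .config in $old_src" >&2; return 1; }
    [ -d "$new_src" ] || { echo "no such directory: $new_src" >&2; return 1; }
    cp -- "$old_src/.config" "$new_src/.config"
    "${MAKE:-make}" -C "$new_src" oldconfig
}
```

For example: carry_config /usr/src/linux-4.15.14-gentoo /usr/src/linux-4.16.1-gentoo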

----------

## josedb

 *causality wrote:*   

> For my own system, I decided to fall back to kernel 4.15.14-gentoo until nvidia releases an updated driver designed to work with kernel 4.16.x.  josedb, this is the kind of mysterious failures after successful load that I was concerned about on my own system which is why I have not tried using the new patch.
> 
> It appears you have both Intel i915 video hardware and a discrete nvidia card on that system.  I assume this is an nvidia optimus setup?  These are sometimes known to be problematic on their own without using hackish patches to force a module load.  Of particular interest is this line in dmesg:
> 
> ```
> ...

 

I tried disabling the fb drivers, with the same results. The information I've posted corresponds to my last working config for kernel 4.15.

----------

## twalter

Beta driver 396.18 works without further patching required.

----------

## josedb

 *twalter wrote:*   

> Beta driver 396.18 works without further patching required.

 

Superb

----------

## saboya

 *twalter wrote:*   

> Beta driver 396.18 works without further patching required.

 

This driver is beta quality: it has a new SPIR-V compiler and some Vulkan changes, and it is currently unstable, with lots of Vulkan regressions. So beware.

----------

## twalter

 *saboya wrote:*   

>  *twalter wrote:*   Beta driver 396.18 works without further patching required. 
> 
> This driver is beta quality and has a new SPIR-V compiler and some Vulkan changes, currently unstable with lots of regressions in Vulkan. So beware.

 

Given that the OP initially had no X to work with, that still qualifies as an improvement.   :Wink: 

Running way ahead of stable is the only way to fully implement the Spectre/Meltdown mitigations at the moment.

----------

## papas

It did not work for my GT 610, so I am back to the patch file and the 390.42 driver.

I am getting this message:

```

***** WARNING *****

You are currently installing a version of nvidia-drivers that is
known not to work with a video card you have installed on your
system. If this is intentional, please ignore this. If it is not
please perform the following steps:

Add the following mask entry to /etc/portage/package.mask by
echo ">=x11-drivers/nvidia-drivers-391.0.0" > /etc/portage/package.mask/nvidia-drivers

Failure to perform the steps above could result in a non-working
X setup.
```

Strange...

This page mentions my card: http://www.nvidia.com/download/driverResults.aspx/133571/en-us

----------

## depontius

 *papas wrote:*   

> did not worked for my GT 610, so i am back to patch file and 390.42 driver.
> 
> 

 

Crud.  I've got a GT 610 in my dedicated Myth frontend system.  I saw on Phoronix that nVidia is dropping support; I thought it was at the end of this year.  I thought I had time to plan.

In the meantime, of course, AMD has made UVD usable from the open-source driver.  I bought the GT 610 specifically for VDPAU, because it was a low-end, passively cooled card that would be good for showing TV.  I was also hoping there would be a "legacy drivers" line to keep it running longer.  So far I haven't found a fanless Radeon card, though I haven't looked too hard.

----------

## saboya

Well, nvidia does maintain legacy drivers. It still maintains the 340 series for old cards, and it will probably maintain 390 for a while as well.

----------

## twalter

 *papas wrote:*   

> did not worked for my GT 610, so i am back to patch file and 390.42 driver.
> 
> i am taking this message:
> 
> ```
> ...

 

Pretty sure this is a bug.  I've got a GTX 780 that is definitely supported and it printed out the same message.

----------

## twalter

Just had a thought.  Do you have a mobo with an nForce chipset?  It could be complaining about the onboard video (a GeForce 8200 coprocessor in my case).

----------

## depontius

 *twalter wrote:*   

>  *papas wrote:*   did not worked for my GT 610, so i am back to patch file and 390.42 driver.
> 
> i am taking this message:
> 
> ```
> ...

 

----------

## papas

 *twalter wrote:*   

> Just had a thought.  Do you have a mobo with a nForce chipset?  It could be complaining about the onboard video (GeForce 8200 coprocessor in my case).

 

My motherboard is an ASUS PRIME Z370-P; I think the onboard GPU is Intel.

 *depontius wrote:*   

> Pretty sure this is a bug. I've got a GTX 780 that is definitely supported and it printed out the same message.

 

Thank you, depontius, I was about to buy a new GT 1030...

I think nvidia is confused, because https://us.download.nvidia.com/XFree86/Linux-x86_64/396.18/README/supportedchips.html says the GT 610 is not supported...

I will wait a couple of days before buying a new GT 1030.

Thank you both, guys.

P.S. Does anyone think my onboard GPU performs better than the GT 610?

----------

## Cthulhu666

 *twalter wrote:*   

> 
> 
> Pretty sure this is a bug.  I've got a GTX 780 that is definitely supported and it printed out the same message.

 

It is! If you check the eclass, you can see that it contains a section for "drv_390x", but there's nothing for "drv_396x", thus all cards are warned about being incompatible.

----------

## depontius

I had my mythfrontend up this morning for servicing and checked: it's running the 390 driver.  So I presume I'm all set, at least through the end of this year, in terms of continuing support.

----------

## twalter

 *Cthulhu666 wrote:*   

>  *twalter wrote:*   
> 
> Pretty sure this is a bug.  I've got a GTX 780 that is definitely supported and it printed out the same message. 
> 
> It is! If you check the eclass, you can see that it contains a section for "drv_390x", but there's nothing for "drv_396x", thus all cards are warned about being incompatible.

 

Well, that would do it!   :Very Happy: 

I never really looked at where that message was generated even though I noticed it wasn't in the ebuild.   :Embarassed:  Never too old to learn!

----------

