# Recent Hardened-Sources Kernels and Nvidia-Drivers [FYI]

## causality

Hello,

This is one of those nagging problems I kept having until I finally found the solution myself.  It is one of those problems that seems unbelievably uncommon in the sense that it is 100% reproducible, yet no amount of Google searching or forum browsing ever provided any useful information.

Unfortunately this is something for which one must be prepared when using Hardened Gentoo on the desktop and, worse still, with nvidia-drivers.  Yet until this started happening my experience there had been relatively problem-free.  Note that on Hardened Gentoo, the nvidia-drivers are masked by default.  The implication is that users with such hardware are advised to either abandon the extra security of Hardened or to abandon using the full capabilities of their graphics cards.  Neither really appeals to me since I have enjoyed having both for years now.

At any rate, I wanted to document this for others who were dealing with the same issue.  Maybe now someone in the same position can perform a search and actually receive a useful result.

With kernels up to and including hardened-sources-2.6.37-r7 the proprietary nvidia drivers worked flawlessly for me.  When I installed the next available kernel, which was hardened-sources-2.6.38-r6, I had a strange sort of "soft" system lock-up upon rebooting.  It's odd because the system would hang and give me nothing but a totally black screen, with both the keyboard (including CTRL-ALT-DEL) and mouse totally unresponsive, yet it had not fully halted.  I could still briefly tap the ATX power button (i.e. "soft off" handled by the OS) and the system would catch this as an ACPI event and, after a minute or two of the hard drive LED flashing as it was processing init scripts, it would perform an orderly shutdown.

There was no trace of any problem in the usual log files such as /var/log/Xorg.0.log, /var/log/kern.log, /var/log/messages, et al.

Usually when there is a problem related to graphics hardware you get a total system halt.  As in, a power-cycle is your only way out.  There is not usually enough responsiveness left to perform an orderly shutdown because it's totally locked up.

I tried all sorts of settings and it got me nowhere.  Then I realized the significance of the fact that the system could still process a soft-off shutdown: I had been overlooking a simple way to diagnose the situation.  So I rebooted yet again.  My system does not use GDM/KDM/XDM.  I log into a text-mode console and type "startx" for a GUI.  Knowing that "startx" would lead to another strange lock-up, I ran this command:

```
localhost ~ # startx & sleep 5 && dmesg > /root/dmesg_xorg
```

(Normally I would never run X as root but my normal users do not have the ability to use "dmesg" so this was strictly for diagnostics)

When I next rebooted, I read the dmesg output stored in that file.  This section finally explained the problem:

```
PAX: kernel memory leak attempt detected from f5535edb (nv_stack_t) (1 bytes)
Pid: 3609, comm: X Tainted: P            3.0.3-hardened #1
Call Trace:
 [<c10c6af9>] ? pax_report_usercopy+0xc3/0xd2
 [<c10bb00b>] ? check_object_size+0x9b/0xa1
 [<f985c151>] ? os_memcpy_to_user+0x84/0xa9 [nvidia]
 [<f9278a3f>] ? _nv000496rm+0xb/0x31 [nvidia]
 [<f982c519>] ? _nv023287rm+0x11/0x15 [nvidia]
 [<f92aadfd>] ? _nv004161rm+0x9d/0xa7 [nvidia]
 [<f94b8656>] ? _nv021660rm+0x72/0x7c [nvidia]
 [<f92ca2a3>] ? _nv025716rm+0x23/0x30 [nvidia]
 [<f94c097d>] ? _nv008965rm+0xe/0x12 [nvidia]
 [<f94c5148>] ? _nv008964rm+0xba/0x134 [nvidia]
 [<f92c9bd9>] ? _nv025726rm+0xb3/0xc0 [nvidia]
 [<f92c8b93>] ? _nv003783rm+0xdc/0x103 [nvidia]
 [<f92c917f>] ? _nv003780rm+0x29b/0x2be [nvidia]
 [<f92c91c1>] ? _nv002341rm+0x1f/0x23 [nvidia]
 [<f92abe53>] ? _nv002018rm+0x2b/0x4e [nvidia]
 [<f983e900>] ? _nv002426rm+0x654/0x68d [nvidia]
 [<f9838e1f>] ? rm_ioctl+0x3e/0x125 [nvidia]
 [<c10bafd8>] ? check_object_size+0x68/0xa1
 [<f98581d2>] ? nv_kern_ioctl+0x1ca/0x548 [nvidia]
 [<f9858443>] ? nv_kern_ioctl+0x43b/0x548 [nvidia]
 [<f9858583>] ? nv_kern_unlocked_ioctl+0x18/0x1b [nvidia]
 [<c10cfce4>] ? do_vfs_ioctl+0x5e6/0x630
 [<f985856b>] ? nv_kern_compat_ioctl+0x1b/0x1b [nvidia]
 [<f9858583>] ? nv_kern_unlocked_ioctl+0x18/0x1b [nvidia]
 [<c10cfce4>] ? do_vfs_ioctl+0x5e6/0x630
 [<c10cfd62>] ? sys_ioctl+0x34/0x4c
 [<c10cfd62>] ? sys_ioctl+0x34/0x4c
 [<c10cfd62>] ? sys_ioctl+0x34/0x4c
 [<c10cfd62>] ? sys_ioctl+0x34/0x4c
 [<c13c06cc>] ? syscall_call+0x7/0xb
```
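For anyone repeating this diagnosis, the capture file can be long, and the line that matters is the one beginning with "PAX:".  Here is a minimal sketch of a helper for pulling those lines out.  The function name `pax_reports` is made up for illustration, and the default path is simply the file written by the command above:

```shell
# Print every PaX report found in a saved dmesg capture, with line numbers.
# $1: path to the capture file (defaults to the file used earlier in this post)
# pax_reports is a hypothetical helper name, not part of any package.
pax_reports() {
    grep -n 'PAX:' "${1:-/root/dmesg_xorg}"
}
```

Running `pax_reports` with no argument reads /root/dmesg_xorg.  If it prints nothing, the lock-up is probably not a PaX kill and the cause lies elsewhere.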

The trace shows a PaX feature complaining about a kernel memory leak attempt.  I assume this comes from the need for the nvidia kernel module to communicate with the user-space GLX driver that Xorg uses.  It turns out there is a relevant setting.

It is Security Options -> PaX -> Miscellaneous Hardening Features -> Harden heap object copies between kernel and userland (CONFIG_PAX_USERCOPY)

Disabling that made nvidia-drivers start working again.
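To see how a given kernel tree is currently configured without opening menuconfig, something like the following sketch should do.  It assumes the usual Gentoo layout where /usr/src/linux points at the kernel sources you build from, and the function name `usercopy_state` is made up for illustration:

```shell
# Report whether CONFIG_PAX_USERCOPY is enabled in a kernel .config file.
# $1: path to the config (assumed default: /usr/src/linux/.config)
# usercopy_state is a hypothetical helper name.
usercopy_state() {
    config="${1:-/usr/src/linux/.config}"
    # An enabled option appears as "CONFIG_PAX_USERCOPY=y"; a disabled one
    # appears as the comment "# CONFIG_PAX_USERCOPY is not set".
    if grep -q '^CONFIG_PAX_USERCOPY=y' "$config"; then
        echo enabled
    else
        echo disabled
    fi
}
```

Note that this only inspects the config file.  The change takes effect after the kernel has been rebuilt, installed and booted, and the nvidia module has been rebuilt against it.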

Now here's the part that really puzzles me.  This option was available in hardened-sources-2.6.37-r7.  Used in that kernel, it caused no problems of any kind.  Suddenly after this kernel version it becomes a problem.  I think that's why this was so hard for me to diagnose.  I was starting with what I had every reason to believe were known-good settings and it was precisely those settings that suddenly stopped working.

I also think this is why it was a strange sort of lock-up instead of returning me to a prompt: PaX was killing X sometime after it launched, but since X was killed off suddenly it did not have time to return control of the graphics card to a text console, leaving the system in something of a state of limbo.  That's speculation since I don't really know much about the internals of X, but it would explain why it was not a hard, totally unresponsive lock-up.

At any rate, I hope this will spare someone a bit of frustration.

----------

## jsevilleja

I don't know how to use nvidia drivers with hardened sources. Could you explain that to me, if it's not too much trouble? Thanks

----------

## causality

I should say up-front that it is generally recommended not to use that combination -- for that reason the nvidia-drivers are masked by default on Hardened.  

Having said that, this is really a general Portage question.  Lots of packages get masked for all sorts of reasons.  It is nothing specific to Nvidia-drivers or Hardened Gentoo.

You have a directory called /etc/portage containing various text files controlling package-specific Portage settings.  If you do not have a file called /etc/portage/package.unmask you can create it and Portage will refer to it when it has to decide whether it considers a package masked.  To unmask x11-drivers/nvidia-drivers simply add this line (i.e. on a line by itself) to /etc/portage/package.unmask:

```
x11-drivers/nvidia-drivers
```

That line in package.unmask tells Portage to consider nvidia-drivers unmasked.  Because this is a custom setting, it overrides the default of having it masked.  The package called nvidia-settings goes with nvidia-drivers, so you'll need to unmask that as well, on its own separate line in the same package.unmask file.
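If you would rather not edit the file by hand, here is a small sketch that appends an entry only when it is not already present.  The helper name `add_atom` is made up, it assumes package.unmask is a plain file rather than a directory, and I am assuming nvidia-settings lives in the media-video category (check with `emerge --search nvidia-settings` on your system):

```shell
# Append a package atom to a Portage config file unless that exact line
# is already there.  add_atom is a hypothetical helper name.
# $1: package atom, $2: target file
add_atom() {
    touch "$2"                                  # create the file if missing
    grep -qxF "$1" "$2" || echo "$1" >> "$2"    # -x whole line, -F literal
}

# Example use, as root (the media-video category is an assumption):
#   add_atom x11-drivers/nvidia-drivers /etc/portage/package.unmask
#   add_atom media-video/nvidia-settings /etc/portage/package.unmask
```

Because existing lines are left alone, running it twice does not create duplicate entries.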

The very latest nvidia-drivers package is also ~keyworded (that is, marked as "unstable" or "too new to be thoroughly tested").  This is a separate issue from whether it is masked.  I use nvidia-drivers-275.21 so I have to also add an entry to my /etc/portage/package.keywords file:

```
x11-drivers/nvidia-drivers ~*
```

There are related packages that Portage will want to install along with nvidia-drivers.  So I also have entries in package.keywords for eselect and nvidia-settings.

For more information please refer to the command "man portage" and the excellent Gentoo documentation on Portage, some of which can be found at http://www.gentoo.org/doc/en/?catid=gentoo.

----------

## jsevilleja

Thanks. And then I should do a "pax -m" on every file that is linked to libGL.so, shouldn't I?

----------

