# kernel greater than 5.17.11 makes virtualbox unstable

## jagdpanther

Within the sys-kernel/gentoo-sources-5.17.x kernel branch, when I upgrade my kernel to any version greater than gentoo-sources-5.17.11, VirtualBox guests (both Linux and Windows) become unstable and unusable. I am running virtualbox-6.1.34. (I tested and found this guest failure with gentoo-sources-5.17.12, 5.17.13 and 5.17.14. I have not tried 5.18.x, as there is a known issue with running virtualbox-6.1.34 on that kernel series.) I also have this problem on a second Gentoo system.

I describe the VirtualBox guest issue below.

When the issue occurs, no additional lines are added to the host's ~/.config/VirtualBox/VBoxSVC.log, and there are no additional entries in the host's /var/log/messages. (On the guest side, I did see some errors in the RedHat VM while doing a tail -f of /var/log/messages via ssh.)

I ran a diff between the kernel config file that works with VirtualBox and the one that does not, and the only difference is the version comment in the header:

```
/boot]$ diff config-5.17.11-gentoo-2 config-5.17.12-gentoo-3 
3c3
< # Linux/x86 5.17.11-gentoo-2 Kernel Configuration
---
> # Linux/x86 5.17.12-gentoo-3 Kernel Configuration
```
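
Since the version banner is expected to differ between any two kernel builds, the comparison can be told to ignore it, so that an empty result means the configs really match. A minimal sketch, assuming GNU diff (its `-I` option drops hunks whose changed lines all match the given regex); the helper name `config_diff` is made up for illustration:

```shell
#!/bin/sh
# Compare two kernel config files while ignoring the version banner line,
# e.g. "# Linux/x86 5.17.11-gentoo-2 Kernel Configuration".
# Assumes GNU diff; config_diff is a hypothetical helper name.
config_diff() {
    diff -I '^# Linux/.* Kernel Configuration' "$1" "$2"
}
```

Example use: `config_diff /boot/config-5.17.11-gentoo-2 /boot/config-5.17.12-gentoo-3 && echo "configs match"`.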

Linux guest issue (no VBox guest additions):

After booting the Oracle-8 (RHEL-8 clone) guest VM and logging into the guest, I set the guest screen resolution, which works. Soon after that, usually after moving a terminal window within the guest VM, the VM window appears to crash to a Linux console for a few seconds, then the Oracle-8 login screen reappears. (I assume something is happening to the VM's X server or Wayland.) I can subsequently log in again to the guest VM and repeat the problem.

Windows 10 guest issue (with VBox guest additions):

I can log in at the normal Win10 logon screen, which leads to a much larger VM screen (because I previously installed the VBox guest additions and dragged the screen larger). Soon after I launch any Win10 app and move it (or not), Win10 exits back to the Win10 login screen. Then I can log back into the VM and repeat ...

Any ideas other than remaining at Linux kernel-5.17.11?

(I'll probably post this on virtualbox.org also.)

----------

## Hu

You could use git bisect to find the specific patch that causes the problem, then report the regression.  You could switch to qemu-kvm, which uses a mainline driver and seems to be less susceptible to this kind of breakage.

----------

## jagdpanther

After upgrading to virtualbox-6.1.34-r1 (from virtualbox-6.1.34) I can run VirtualBox successfully under sys-kernel/gentoo-sources-5.17.15. Tomorrow I'll check my second Gentoo system, which was having the same issue.

Edit: Never mind. Although some guest actions are more stable, virtualbox-6.1.34-r1 is still very stable under gentoo-sources-5.17.11 and still has some of the same issues when run under gentoo-sources-5.17.15.

----------

## fudge

I've tried using a few different kernel versions on the host, and 5.17.11-gentoo is the most recent version that works properly with a Gentoo guest installed. Versions 5.17.12-gentoo and newer cause problems.

----------

## Hu

fudge: can you git bisect to find which change between 5.17.11 and 5.17.12 is responsible?  Identifying that change is the first step to reporting the problem to someone who can fix it.

----------

## fudge

Never done anything like this before.  I'll use https://wiki.gentoo.org/wiki/Kernel_git-bisect as a guide and see what I come up with.

----------

## fudge

That took a while and a personal event intervened.  Here's the result and how I went about it.

I used https://wiki.gentoo.org/wiki/Kernel_git-bisect as a guide.  In the Gentoo guest, my test was to build llvm.  It went bad, bad, bad, bad, good.

I checked out the kernel as follows:

```
# git clone --shallow-exclude v5.17 --branch v5.17.12 git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
```

Here's the bisect log:

```
Bisecting: 55 revisions left to test after this (roughly 6 steps)
[d82e9eac3aae49e6a34e2d4ccaf39c259b2fe3be] random: skip fast_init if hwrng provides large chunk of entropy
Bisecting: 27 revisions left to test after this (roughly 5 steps)
[fc8ce099962615ddba4642e89b84fcf3c0564871] random: introduce drain_entropy() helper to declutter crng_reseed()
Bisecting: 13 revisions left to test after this (roughly 4 steps)
[6057a5d6a3b71451518022285bf8f82ddeb75990] random: ensure early RDSEED goes through mixer on init
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[4fa0d8ed5c4584a66198152a8b78f4d24e7f4df1] random: make credit_entropy_bits() always safe
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[efba5eb2281ec51a295a9571b0ff73466272430d] random: use computational hash for entropy extraction
Bisecting: 0 revisions left to test after this (roughly 1 step)
[19a66796d1f0dd4ce4b05f76d53ce1d0a7dc817d] KVM: x86/mmu: fix NULL pointer dereference on guest INVPCID
efba5eb2281ec51a295a9571b0ff73466272430d is the first bad commit
commit efba5eb2281ec51a295a9571b0ff73466272430d
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Sun Jan 16 14:23:10 2022 +0100

    random: use computational hash for entropy extraction

    commit 6e8ec2552c7d13991148e551e3325a624d73fac6 upstream.

    The current 4096-bit LFSR used for entropy collection had a few
    desirable attributes for the context in which it was created. For
    example, the state was huge, which meant that /dev/random would be able
    to output quite a bit of accumulated entropy before blocking. It was
    also, in its time, quite fast at accumulating entropy byte-by-byte,
    which matters given the varying contexts in which mix_pool_bytes() is
    called. And its diffusion was relatively high, which meant that changes
    would ripple across several words of state rather quickly.

    However, it also suffers from a few security vulnerabilities. In
    particular, inputs learned by an attacker can be undone, but moreover,
    if the state of the pool leaks, its contents can be controlled and
    entirely zeroed out. I've demonstrated this attack with this SMT2
    script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
    a matter of seconds on a single core of my laptop, resulting in little
    proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.

    For basically all recent formal models of RNGs, these attacks represent
    a significant cryptographic flaw. But how does this manifest
    practically? If an attacker has access to the system to such a degree
    that he can learn the internal state of the RNG, arguably there are
    other lower hanging vulnerabilities -- side-channel, infoleak, or
    otherwise -- that might have higher priority. On the other hand, seed
    files are frequently used on systems that have a hard time generating
    much entropy on their own, and these seed files, being files, often leak
    or are duplicated and distributed accidentally, or are even seeded over
    the Internet intentionally, where their contents might be recorded or
    tampered with. Seen this way, an otherwise quasi-implausible
    vulnerability is a bit more practical than initially thought.

    Another aspect of the current mix_pool_bytes() function is that, while
    its performance was arguably competitive for the time in which it was
    created, it's no longer considered so. This patch improves performance
    significantly: on a high-end CPU, an i7-11850H, it improves performance
    of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
    improves performance by 103%.

    This commit replaces the LFSR of mix_pool_bytes() with a straight-
    forward cryptographic hash function, BLAKE2s, which is already in use
    for pool extraction. Universal hashing with a secret seed was considered
    too, something along the lines of <https://eprint.iacr.org/2013/338>,
    but the requirement for a secret seed makes for a chicken & egg problem.
    Instead we go with a formally proven scheme using a computational hash
    function, described in sections 5.1, 6.4, and B.1.8 of
    <https://eprint.iacr.org/2019/198>.

    BLAKE2s outputs 256 bits, which should give us an appropriate amount of
    min-entropy accumulation, and a wide enough margin of collision
    resistance against active attacks. mix_pool_bytes() becomes a simple
    call to blake2s_update(), for accumulation, while the extraction step
    becomes a blake2s_final() to generate a seed, with which we can then do
    a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
    back as an init key for subsequent blake2s_update()s, and the rest we
    produce to the caller. This then is provided to our CRNG like usual. In
    that expansion step, we make opportunistic use of 32 bytes of RDRAND
    output, just as before. We also always reseed the crng with 32 bytes,
    unconditionally, or not at all, rather than sometimes with 16 as before,
    as we don't win anything by limiting beyond the 16 byte threshold.

    Going for a hash function as an entropy collector is a conservative,
    proven approach. The result of all this is a much simpler and much less
    bespoke construction than what's there now, which not only plugs a
    vulnerability but also improves performance considerably.

    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Dominik Brodowski <linux@dominikbrodowski.net>
    Reviewed-by: Eric Biggers <ebiggers@google.com>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/char/random.c | 304 +++++++++-----------------------------------------
 1 file changed, 55 insertions(+), 249 deletions(-)

```
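
For anyone wanting to repeat this, the commands behind a log like the one above can be sketched roughly as follows. This is not a transcript of what was actually typed; it assumes the commands are run inside a clone of linux-stable in which both the v5.17.11 and v5.17.12 tags are present, and `start_bisect` is a made-up helper name.

```shell
#!/bin/sh
# Sketch of a manual bisect between two stable tags.
# Assumption: the current directory is a kernel git tree containing both
# tags. start_bisect is a hypothetical helper name.
start_bisect() {
    good_tag=$1   # last kernel where the guests were stable
    bad_tag=$2    # first kernel where the guests misbehaved
    git bisect start &&
    git bisect bad "$bad_tag" &&
    git bisect good "$good_tag"
}

# After start_bisect, git checks out a commit to test. For each step:
#   1. build and install that kernel, then boot into it
#   2. rebuild the VirtualBox modules against it and run the guest test
#   3. report the outcome with `git bisect good` or `git bisect bad`
# Once git prints "<sha> is the first bad commit":
#   git bisect log     # every step taken, handy for a bug report
#   git bisect reset   # put the tree back on the original checkout
```

Example use: `start_bisect v5.17.11 v5.17.12`, then repeat the three numbered steps until git names the first bad commit.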

----------

## mortonP

As another data point, Arch forum also reports instabilities on latest kernels: https://bbs.archlinux.org/viewtopic.php?id=276883

Myself, I booted Win10 today (on VB 6.1.34-r1) and it is also unstable, after some time it crashes with:

kernel: SUPR0GipMap: fGetGipCpu=0x1b

kernel: vboxdrv: 00000000a96976a3 VMMR0.r0

kernel: vboxdrv: 000000009759b5ca VBoxDDR0.r0

kernel: VMMR0InitVM: eflags=246 fKernelFeatures=0x0 (SUPKERNELFEATURES_SMAP=0)

I'm on 5.15.49  and only recently upgraded from an earlier 5.15.x - need to figure out which was previous kernel...

----------

## mortonP

update: after several tries/crashes Win10 now seems to run stable.

Maybe the initial "check for updates" and whatever background things run after booting Win10 after a long time triggered it - but now that these processes are done it runs stable again? Hmmmm....

----------

## mortonP

Today upgraded virtualbox-6.1.34-r1 -> virtualbox-6.1.34-r4

Now at every start of Win10 VM the VM immediately crashes.

Downgrade to virtualbox-6.1.34-r1.

Works again.

.... weird?

----------

## jagdpanther

I am still using gentoo-sources-5.17.11, as kernels greater than this have VirtualBox issues. (This is a known issue according to posts on the virtualbox forums and should be resolved with the next virtualbox release.)

Using kernel  gentoo-sources-5.17.11 I tried upgrading from virtualbox-6.1.34-r1 to 6.1.34-r5.  Both my Linux and Windows10 VMs  failed to start and gave errors in a pop-up window similar to:

```
The configuration constructor in main failed due to a COM error. Check the release log of the VM for further details. (VERR_MAIN_CONFIG_CONSTRUCTOR_COM_ERROR).

Result Code: 
NS_ERROR_FAILURE (0x80004005)
Component: 
ConsoleWrap
Interface: 
IConsole {872da645-4a9b-1727-bee2-...
```

Reverting to virtualbox-6.1.34-r1 solved the issue and the VM guests are working again.

----------

## jagdpanther

Virtualbox-6.1.34-r6 DOES work with gentoo-sources-5.17.11.  (tested win10 VM)

Virtualbox-modules-6.1.34 will NOT compile if you try to use gentoo-sources-5.18.x (still waiting on a new production, not testing, release of VirtualBox to fix this, probably Virtualbox-6.1.36).

I did NOT try Virtualbox-6.1.34-r6 with gentoo-sources-5.17.15  (the final 5.17.x)

----------

## fudge

It seems like that patch in the kernel has caused a problem with Virtualbox and that it's known.

https://www.virtualbox.org/ticket/20914

----------

## devsk

Anyone has any idea on when 6.1.36 comes out? I don't see a tracker for it in bugs.gentoo.org

----------

## jagdpanther

 *Quote:*   

> Anyone has any idea on when 6.1.36 comes out? 

 

Virtualbox v6.1.36 was released on 19 July 2022.  

https://www.virtualbox.org/

The Gentoo Portage version virtualbox-6.1.36 does not seem to be available yet.

----------

## devsk

yeah, that's what I meant by no tracker on bugs.gentoo.org

----------

## devsk

Virtualbox v6.1.36 is now available in Portage

----------

## jagdpanther

virtualbox-6.1.36 solves the issue with Linux kernels greater than 5.17.11.

I am currently running gentoo-sources-5.18.14 and both Linux and Windows VMs running in virtualbox-6.1.36 work without issue.
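
To check a host against the combination reported in this thread, something like the sketch below could be used. It encodes only what was observed here (a kernel newer than 5.17.11 paired with a VirtualBox older than 6.1.36), so backports to other stable branches, such as the 5.15.49 report above, may not be captured. It assumes `sort -V` from GNU coreutils; the function names are made up for illustration.

```shell
#!/bin/sh
# base_version STR: keep only the leading digits-and-dots part,
# e.g. "5.17.12-gentoo-3" -> "5.17.12", "6.1.34r150636" -> "6.1.34".
base_version() {
    printf '%s\n' "${1%%[!0-9.]*}"
}

# version_gt A B: succeeds when A sorts strictly after B.
# Relies on `sort -V` (GNU coreutils version sort).
version_gt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# needs_vbox_upgrade KERNEL VBOX: succeeds for the pairing reported as
# broken in this thread (kernel > 5.17.11 with VirtualBox < 6.1.36).
# Hypothetical helper; not an official compatibility check.
needs_vbox_upgrade() {
    kernel=$(base_version "$1")
    vbox=$(base_version "$2")
    version_gt "$kernel" 5.17.11 && version_gt 6.1.36 "$vbox"
}
```

Example use: `needs_vbox_upgrade "$(uname -r)" "$(VBoxManage --version)" && echo "upgrade VirtualBox to 6.1.36 or later"`.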

----------

