# Help understanding performance drop with kernel upgrade [s]

## Letharion

I'm running 5.7.19-gentoo right now, and everything works well.

When I try upgrading to either 5.8.12 or 5.9.1, performance becomes really terrible across the whole system.

Mouse pointer feels janky, applications start slowly, rendering in games is slow, I see a lot of "wait" in top. Pretty much everything turns to molasses.

I haven't experienced this before, where everything seems to be impacted, so I'm not sure where to even begin investigating.

I moved back to 5.7.19 again and I'm now compiling 5.8.16 to try the latest 5.8, but since the problem is also present in 5.9 I don't think it's going to help much.

Maybe I can bisect the kernel, but from where would I clone the source?

Perhaps the problem isn't really in the kernel, but the upgrade triggers a problem elsewhere?

*Last edited by Letharion on Thu Oct 22, 2020 3:17 pm; edited 1 time in total*

----------

## Ionen

Don't know if it's the same here, but I heard a similar story recently... turns out they had forgotten to run `emerge @module-rebuild` after updating their kernel and were missing the nvidia modules for the newer kernels.

----------

## Letharion

 *Ionen wrote:*   

> Don't know if the same here but heard a similar story recently... turns out they had forgotten to run emerge @module-rebuild after updating their kernel and were missing nvidia modules for the newer kernels.

 

I suppose that could be the issue, but I would expect to be dumped out of X if that were the case, as X would fail to start. As it is now, I can start a game and get 20 FPS where I'd normally expect 100+.

Having said that, here's how I normally do a kernel upgrade, in case it sheds any light on the situation:

```
mount /boot/
cd /usr/src/linux
cp -i /usr/src/linux-$(awk '{ print $3 }' /proc/version)/.config .
make oldconfig
make && make install && make modules_install && grub-mkconfig -o /boot/grub/grub.cfg
emerge -q @module-rebuild
```

----------

## Ionen

Nowadays you have libglvnd, which will let things still work but give you software rendering instead, which is unsurprisingly slow.

You can check whether your modules are loaded anyway, in case something went wrong. Also post the output of `glxinfo | head`, which should normally mention NVIDIA if that's what you use.
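For instance, something like this (assuming the proprietary nvidia driver):

```shell
# Quick sanity check: are the nvidia kernel modules loaded at all?
lsmod | grep -i nvidia

# And is GL actually using the NVIDIA driver, or did it silently
# fall back to software rendering (llvmpipe)?
glxinfo | head
```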

----------

## Letharion

I'm now using 5.8.16 and the problem is still there.

 *Quote:*   

> Nowadays you have libglvnd that will let things still work, but give you software rendering instead that's unsurprisingly slow. 

 

I saw libglvnd show up as a flag recently, but didn't realize this would happen. Doesn't look like that's the case here though.

```
$ lsmod
Module                  Size  Used by
nvidia_drm             49152  17
nvidia_modeset       1167360  35 nvidia_drm
vboxnetadp             28672  0
vboxnetflt             32768  0
vboxdrv               421888  2 vboxnetadp,vboxnetflt
x86_pkg_temp_thermal    20480  0
snd_usb_audio         253952  4
snd_usbmidi_lib        28672  1 snd_usb_audio
snd_rawmidi            32768  1 snd_usbmidi_lib
nvidia              27209728  1933 nvidia_modeset
efivarfs               16384  1
```

The Xorg.0.log part where nvidia gets loaded:

```
[    12.528] (II) Applying OutputClass "nvidia" to /dev/dri/card0
[    12.528]    loading driver: nvidia
[    12.529] (==) Matched nvidia as autoconfigured driver 0
[    12.529] (II) LoadModule: "nvidia"
[    12.529] (II) Loading /usr/lib64/xorg/modules/drivers/nvidia_drv.so
[    12.536] (II) Module nvidia: vendor="NVIDIA Corporation"
[    12.541] (II) NVIDIA dlloader X Driver  455.28  Wed Sep 30 01:04:06 UTC 2020
[    12.541] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    12.554] (II) NVIDIA(0): Creating default Display subsection in Screen section
[    12.554] (==) NVIDIA(0): Depth 24, (==) framebuffer bpp 32
[    12.554] (==) NVIDIA(0): RGB weight 888
[    12.554] (==) NVIDIA(0): Default visual is TrueColor
[    12.554] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    12.555] (**) NVIDIA(0): Enabling 2D acceleration
[    12.555] (II) Loading sub module "glxserver_nvidia"
[    12.555] (II) LoadModule: "glxserver_nvidia"
[    12.555] (II) Loading /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
[    12.633] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[    12.633] (II) NVIDIA GLX Module  455.28  Wed Sep 30 01:01:28 UTC 2020
[    12.635] (II) NVIDIA: The X server supports PRIME Render Offload.
[    13.098] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
[    13.098] (--) NVIDIA(0):     CRT-0
[    13.098] (--) NVIDIA(0):     DFP-0 (boot)
[    13.098] (--) NVIDIA(0):     DFP-1
[    13.098] (--) NVIDIA(0):     DFP-2
[    13.098] (--) NVIDIA(0):     DFP-3
[    13.098] (--) NVIDIA(0):     DFP-4
[    13.100] (II) NVIDIA(0): NVIDIA GPU GeForce GTX 770 (GK104) at PCI:1:0:0 (GPU-0)
```

```
$ glxinfo | head
name of display: :0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
    GLX_ARB_context_flush_control, GLX_ARB_create_context,
    GLX_ARB_create_context_no_error, GLX_ARB_create_context_profile,
    GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float,
    GLX_ARB_multisample, GLX_EXT_buffer_age,
```

----------

## Ionen

Well I don't see anything wrong in those outputs at least, not sure then  :Neutral: 

----------

## Letharion

Another point of comparison: I opened a web app I frequently use in Firefox. On 5.8.16 it hovered around 150% CPU usage.

Rebooting back to 5.7.19 this drops down to low double-digits.

----------

## mike155

In order to avoid mistakes building the kernel, I run

```
dmesg -t >dmesg-$(uname -r)
```

whenever I compile and boot a new kernel. After that, I compare the newly created dmesg file with the dmesg file of the previous kernel (diff). The diff quickly shows missing drivers, different settings, or other issues. If I had modules enabled, I would probably also run `lsmod > modules-$(uname -r)` whenever I compile and boot a new kernel, and compare the modules files.
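As a rough sketch of the whole routine (the file names are just examples using the kernel versions from this thread):

```shell
# Snapshot boot messages and loaded modules for the running kernel
dmesg -t > dmesg-$(uname -r)
lsmod | sort > modules-$(uname -r)

# After booting the next kernel and taking the same snapshots,
# diff them to spot missing drivers or changed settings
diff dmesg-5.7.19-gentoo dmesg-5.8.16-gentoo
diff modules-5.7.19-gentoo modules-5.8.16-gentoo
```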

----------

## Letharion

 *mike155 wrote:*   

> In order to avoid mistakes building the kernel, I run
> 
> ```
> dmesg -t >dmesg-$(uname -r)
> ```
> ...

 

Clever, I'll definitely incorporate that into my update workflow.

Having tested now, I can't spot much of interest, unfortunately. The order of many things differs somewhat, but after sorting the files, the main differences are minor things like the actual kernel version, some process IDs that naturally differ, and a few other details.

The one little thing that stands out is:

```
i2c i2c-0: 2/4 memory slots populated (from DMI)
i2c i2c-0: Successfully instantiated SPD at 0x51
i2c i2c-0: Successfully instantiated SPD at 0x53
```

It certainly doesn't look like an error, but I also don't know what it means.

----------

## Anon-E-moose

What does `cat /proc/sys/kernel/random/entropy_avail` return?

If you run it during the slowdowns, what does it show?

Also what CPU are you on and which governor are you running? And are you by any chance running conky?

----------

## Letharion

```
$ grep name /proc/cpuinfo
model name   : Intel(R) Core(TM) i5-5675C CPU @ 3.10GHz
```

For reference, here's what the entropy and governor say when the system is normal:

```
$ cat /proc/sys/kernel/random/entropy_avail
3523
# cat /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
```

Gonna reboot and check the slow kernel in a minute.

Edit:

```
# cat /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/scaling_governor
userspace
userspace
userspace
userspace
$ cat /proc/sys/kernel/random/entropy_avail
1761
```

Ah, that scaling governor change looks very suspicious.

In fact, I'm not even allowed to change it back:

```
# echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
bash: echo: write error: Invalid argument
```

That's probably also why it now defaults to a different value.
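To see why the write fails, I can ask the driver what it actually offers (my guess is the list of available governors changed with the new kernel):

```shell
# Which cpufreq driver is in use, and which governors does it accept?
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
```

Writing any governor that isn't in `scaling_available_governors` gives exactly that "Invalid argument" error.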

----------

## Anon-E-moose

Why are you using powersave?

```
      Use the CPUFreq governor 'powersave' as default. This sets
      the frequency statically to the lowest frequency supported by
      the CPU.
```

----------

## Letharion

```
echo performance | tee /sys/devices/system/cpu/cpu{0,1,2,3}/cpufreq/scaling_governor
```

This makes the system much more responsive right away, yey!

----------

## Anon-E-moose

 *Letharion wrote:*   

> 
> 
> Edit:
> 
> ```
> ...

 

userspace is wrong   :Laughing: 

But that amount of entropy is awfully small and will cause the system to pause a fair amount.

----------

## Letharion

 *Anon-E-moose wrote:*   

> Why are you using powersave?
> 
> ```
>       Use the CPUFreq governor 'powersave' as default. This sets
> 
> ...

 

I don't actually know. This is a desktop, so there's no strong reason that I can think of.

I may have had a reason at some point and just forgotten about it.

----------

## Anon-E-moose

You might check the "default governor" in the kernel config, sounds like userspace got set by default.

```
$ grep FREQ_DEFAULT_GOV /usr/src/linux/.config
```

----------

## Letharion

 *Anon-E-moose wrote:*   

> But that amount of entropy is awfully small and will cause the system to pause a fair amount.

 

I'll keep an eye on that now. Does it maybe take the system some time to accumulate entropy events? It's back up now:

```
$ cat /proc/sys/kernel/random/entropy_avail
4001
```

If things slow down again later, I'll know to look at that. I/O is a source of entropy, is that right?

So ensuring I have my backup RAID disks mounted should be helpful then?

----------

## Letharion

 *Anon-E-moose wrote:*   

> You might check the "default governor" in the kernel config, sounds like userspace got set by default.
> 
> ```
> $ grep FREQ_DEFAULT_GOV /usr/src/linux/.config
> ```
> ...

 

Well, yes, but the value doesn't appear to have changed?

```
$ grep FREQ_DEFAULT_GOV /usr/src/linux-5.7.19-gentoo/.config
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
$ grep FREQ_DEFAULT_GOV /usr/src/linux-5.8.16-gentoo/.config
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
```

I _did_ try to compare the configs before I posted my question here, and I couldn't spot anything interesting then.

----------

## Anon-E-moose

You shouldn't use the userspace governor UNLESS you're going to run some program to handle the frequency changes, which is what it's designed for.
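Roughly, the userspace governor expects some program to do this for it (paths and the frequency value here are just illustrative); with nothing writing to `scaling_setspeed`, the CPU just sits wherever it was left:

```shell
# With the 'userspace' governor, a userspace program is supposed to
# pin the frequency itself by writing a value in kHz to scaling_setspeed
echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 3100000  > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
```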

As for entropy, it's generated by a lot of things: keyboard input, mouse movement, and so on.

I use rng-tools to add entropy; put it in the boot runlevel.

```
sys-apps/rng-tools-6.10::gentoo  USE="jitterentropy nistbeacon pkcs11 (-selinux)"
 * Found these USE flags for sys-apps/rng-tools-6.10:
 U I
 + + jitterentropy : Enable Jitter RNG entropy support
 + + nistbeacon    : Enable NIST beacon entropy support
 + + pkcs11        : Enable PKCS11 entropy support
```
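On OpenRC that's roughly the following (assuming the service is named rngd):

```shell
# Install rng-tools and start its daemon at boot (OpenRC)
emerge --ask sys-apps/rng-tools
rc-update add rngd boot
rc-service rngd start

# The entropy pool should then stay comfortably high
cat /proc/sys/kernel/random/entropy_avail
```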

----------

