# [SOLVED] gentoo-sources 4.11.0 su does not work anymore

## costel78

Today I updated my system to sys-kernel/gentoo-sources-4.11.0, and su stopped working under Xorg.

Of course I am in the wheel group and the password is correct. In fact, on the console, su - root works flawlessly.

I re-emerged shadow and pam.

```
ls -als /bin/su
36 -rws--x--x 1 root root 36152 mai  2 23:12 /bin/su
```
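Not from the original post: a minimal POSIX shell sketch that checks the setuid bit with the `test -u` operator instead of reading `ls` permission strings. The helper name `has_setuid` is hypothetical, and the two paths are simply the ones that appear in this thread.

```shell
#!/bin/sh
# has_setuid FILE (hypothetical helper): succeed if FILE exists and has the
# set-user-ID bit set. Relies only on the POSIX test(1) -u operator.
has_setuid() {
    [ -u "$1" ]
}

# The two setuid binaries that matter for su + pam_unix in this thread.
for f in /bin/su /sbin/unix_chkpwd; do
    if has_setuid "$f"; then
        echo "$f: setuid bit set"
    else
        echo "$f: setuid bit NOT set (or file missing)"
    fi
done
```

A set bit only shows that the file is marked setuid; whether the kernel actually grants the privileges at exec time is a separate question, and that is what turned out to be in doubt here.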

```
cat /etc/pam.d/su
auth       sufficient   pam_rootok.so
auth       required     pam_wheel.so use_uid
auth       include              system-auth
account    include              system-auth
password   include              system-auth
session    include              system-auth
session    required     pam_env.so
session    optional             pam_xauth.so
```

I took a look at what was merged today: rubygems, php, iproute2 and gentoo-sources.

I downgraded gentoo-sources and the problem disappeared.

I need to start investigating the problem. It is also reproducible on a server with a very different kernel config.

Oh, and in journalctl the error is "check pass; user unknown".

----------

## Zucca

Hm...

Interesting.

I'll hold off on my upgrades, then.

Have you tried

```
su root
```

...?

----------

## costel78

Yes. And plain su, too.

I do not understand. Why does it fail only in an Enlightenment or plain X11 session, while it works on the console?

----------

## mega_flow

No su problem on my system. Sounds like an xattr problem; I have seen this with kde-plasma.

Are you sure you have POSIX Access Control Lists enabled for your filesystem?

I also have user_xattr enabled in fstab.

If you're not using xattr, try disabling the filecaps USE flag for sys-libs/pam.
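A sketch for checking which options the root filesystem is actually mounted with, by reading `/proc/self/mounts` (the helper names `mount_opts` and `has_mount_opt` are hypothetical). One hedged caveat: on ext4, `acl` and `user_xattr` are normally default mount options and may be active without being listed, so a missing entry does not prove xattrs are off.

```shell
#!/bin/sh
# mount_opts MOUNTPOINT (hypothetical helper): print the comma-separated
# options of the last (topmost) entry for MOUNTPOINT in /proc/self/mounts.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { print opts }' /proc/self/mounts
}

# has_mount_opt MOUNTPOINT OPTION (hypothetical helper): succeed if OPTION
# is explicitly listed for MOUNTPOINT.
has_mount_opt() {
    mount_opts "$1" | tr ',' '\n' | grep -qx "$2"
}

echo "/ is mounted with: $(mount_opts /)"
for opt in acl user_xattr; do
    if has_mount_opt / "$opt"; then
        echo "$opt: listed"
    else
        echo "$opt: not listed (may still be a filesystem default)"
    fi
done
```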

----------

## albright

just as another data point, I have no problem with su in xorg

(using kde plasma)

my problem is that vmware-modules won't build under 4.11.0

----------

## costel78

I also have xattr use flag enabled globally, and user_xattr in fstab on root partition.

Just tried all four combinations today, with/without user_xattr and the filecaps USE flag, but all tests gave the same results:

```
mai 03 22:51:20 gentoo su[929]: - /dev/pts/0 costel:root
mai 03 22:51:20 gentoo su[929]: FAILED su for root by costel
mai 03 22:51:20 gentoo su[929]: pam_authenticate: Authentication failure
mai 03 22:51:19 gentoo su[929]: pam_unix(su:auth): authentication failure; logname= uid=1000 euid=1000 tty=/dev/pts/0 ruser=costel rhost=  user=root
mai 03 22:51:19 gentoo unix_chkpwd[933]: password check failed for user (root)
mai 03 22:51:19 gentoo unix_chkpwd[933]: check pass; user unknown
mai 03 22:51:13 gentoo unix_chkpwd[930]: check pass; user unknown
```

Log from the console (always successful, no matter what):

```
mai 03 22:57:24 gentoo su[1224]: pam_unix(su:session): session closed for user root
mai 03 22:57:22 gentoo su[1224]: pam_systemd(su:session): Cannot create session: Already running in a session
mai 03 22:57:22 gentoo su[1224]: pam_unix(su:session): session opened for user root by costel(uid=1000)
mai 03 22:57:22 gentoo su[1224]: + /dev/tty1 costel:root
mai 03 22:57:22 gentoo su[1224]: Successful su for root by costel
```

So, in console unix_chkpwd is not involved.
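A small filter can pull just the su/unix_chkpwd failure lines out of the journal, which makes the X vs. console comparison quicker. This is a sketch that assumes the `journalctl` line format shown above; `auth_failures` and its keyword list are hypothetical choices, not part of PAM or systemd.

```shell
#!/bin/sh
# auth_failures (hypothetical helper): read syslog-style lines on stdin and
# print only those from su or unix_chkpwd that report a failure.
auth_failures() {
    grep -E '(^|[ /])(su|unix_chkpwd)\[[0-9]+\]: .*(FAILED|[Ff]ailure|failed|unknown)'
}

# Usage on a live system (current boot only):
#   journalctl -b --no-pager | auth_failures
```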

```
ls -als /sbin/unix_chkpwd
24 -rws--x--x 1 root root 22392 mai  3 22:50 /sbin/unix_chkpwd
```

The kernel config has systemd checked:

```
#
# Gentoo Linux
#
CONFIG_GENTOO_LINUX=y
CONFIG_GENTOO_LINUX_UDEV=y
CONFIG_GENTOO_LINUX_PORTAGE=y

#
# Support for init systems, system and service managers
#
# CONFIG_GENTOO_LINUX_INIT_SCRIPT is not set
CONFIG_GENTOO_LINUX_INIT_SYSTEMD=y
```

I have no idea what in the kernel internals could cause this.

I really appreciate all your support. Thank you!

----------

## Hu

Is the setuid bit on /bin/su respected when you run su under your Xorg session?  Check by running su, then switching to a different xterm and examining the process list before you type in any password in su.
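A scriptable variant of that check, as a sketch: the `Uid:` line of `/proc/PID/status` lists the real, effective, saved and filesystem UIDs, so an effective UID of 0 for the waiting su means the setuid bit took effect. `uids_of` is a hypothetical helper name, and `pgrep -x su` just finds processes named exactly `su`; run it from a second terminal while su waits for the password.

```shell
#!/bin/sh
# uids_of PID (hypothetical helper): print "real effective saved fs" UIDs
# taken from the Uid: line of /proc/PID/status.
uids_of() {
    awk '/^Uid:/ { print $2, $3, $4, $5 }' "/proc/$1/status"
}

# Report real and effective UID for every process named exactly "su".
for pid in $(pgrep -x su); do
    set -- $(uids_of "$pid")
    echo "su pid $pid: real uid $1, effective uid $2"
done
```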

----------

## costel78

Yes, it seems it is respected.

```
ps aux | grep su
root       298  0.0  0.0  13224  1976 ?        Ss   07:59   0:00 /usr/sbin/mount.ntfs-3g /dev/sdb2 /mnt/date -o rw,noexec,nosuid,nodev,users
root     11974  0.0  0.0  25672  2856 pts/2    SN+  08:36   0:00 su - root
costel   11979  0.0  0.0  10704   968 pts/1    SN+  08:36   0:00 grep --colour=auto su
```

----------

## costel78

No error with kernel 4.10.14. I also just completed an emerge -e world. For now, 4.11.0 stays masked on my system.

----------

## NeddySeagoon

I know this isn't terribly useful

```
roy@Pi3 64bit ~ $ sudo su -
Password: 
Pi3 64bit ~ # uname -a
Linux Pi3 64bit 4.11.0 #2 SMP PREEMPT Tue May 2 22:06:22 BST 2017 aarch64 GNU/Linux
Pi3 64bit ~ # 
```

but it works for me.

That's over ssh

----------

## costel78

Thank you, the intention matters.

That's the weird thing: no problem whatsoever on the console, including ssh. Only in an X session, and only with the 4.11.0 kernel, with exactly the same config as 4.10.13/14.   :Confused: 

For now I've masked it and I'm waiting for 4.11.1. I'll try vanilla-sources, too.

----------

## NeddySeagoon

costel78,

That's my only 4.11.0 install just now, and its console doesn't work (it's a Pi3 arm64 feature), so I can't easily test with Xfce4 or Mate right now.

----------

## Jaglover

costel78 is using systemd. My openrc boxes do not exhibit such a problem with 4.11.

----------

## Zucca

*sigh*

I was just about to upgrade systemd on one of my PCs. I think I'll skip that too. Although I could just snapshot / before trying it out... /boot, on the other hand, isn't on btrfs. I still take snapshots of it by rsyncing the contents to /var/backups.

Lately, when I've had problems with the PCs I use, the cause has been systemd or udev ignoring my rules. I'm getting tired of "learning" systemd.

So I'll keep my system at 4.10 and won't upgrade systemd. Only after this has been resolved will I continue.

----------

## saellaven

No problems here using openrc, but I'm also using vanilla-sources since I don't trust the gentoo-sources package.

----------

## swanson

I've had the same problem since upgrading to my usual self-configured/compiled Linux 4.11 on an openrc-only (no systemd) computer. Booting back into Linux 4.9 resolves the issue. I'm confused as to why this would cause PAM authentication to fail under X11 but not on the console. There's nothing on the kernel mailing lists, so it might be specific to the Gentoo PAM setup, but I can't see anything wrong with the PAM configuration for su and system-auth.

Also, Linux 4.11 stops Enlightenment from offering the shutdown or reboot options, which is probably the same issue. Still investigating...

----------

## eccerr0r

Inside xfce4-terminal

```
fujiko:/$ systemctl --version
systemd 233
+PAM -AUDIT -SELINUX +IMA -APPARMOR +SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL -XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN default-hierarchy=hybrid
fujiko:/$ uname -r
4.11.0-gentoo
fujiko:/$ su
Password: 
fujiko / # id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),26(tape),27(video)
fujiko / # exit
exit
fujiko:/$ 
```

Works for me?  I used a 4.9.16 .config and just copied it over.

----------

## Hu

Since we have conflicting data points (both openrc users and systemd users reporting failure, and both groups reporting success), it may be helpful to gather more details about the involved packages.  eccerr0r showed us his systemd version.  Would those posting mind showing also emerge --pretend --verbose sys-apps/shadow $(eix --installed --only-names pam) (and for other systemd users, your systemd version)?  Reports seem to agree that this is a regression in 4.11, but perhaps knowing the versions of the user packages involved will help understand why this regression is not affecting everyone.

----------

## eccerr0r

```
$ emerge --pretend --verbose sys-apps/shadow $(eix --installed --only-names pam)

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R    ] sys-libs/pam-1.2.1::gentoo  USE="berkdb cracklib nls pie -audit -debug -nis (-selinux) {-test} -vim-syntax" ABI_X86="32 (64) (-x32)" 0 KiB
[ebuild   R    ] sys-auth/pambase-20150213::gentoo  USE="cracklib gnome-keyring nullok sha512 systemd (-consolekit) -debug -minimal -mktemp -pam_krb5 -pam_ssh -passwdqc -securetty (-selinux)" 0 KiB
[ebuild   R    ] virtual/pam-0-r1::gentoo  ABI_X86="32 (64) (-x32)" 0 KiB
[ebuild   R    ] sys-apps/shadow-4.4-r2::gentoo  USE="acl cracklib nls pam xattr -audit (-selinux) -skey" LINGUAS="-cs -da -de -es -fi -fr -hu -id -it -ja -ko -pl -pt_BR -ru -sv -tr -zh_CN -zh_TW" 0 KiB

Total: 4 packages (4 reinstalls), Size of downloads: 0 KiB

 * IMPORTANT: 50 news items need reading for repository 'gentoo'.
 * Use eselect news read to view new items.
```

We may also have to account for X11 keymap/input layer changes, unless you know exactly what you typed as the password. Just throwing that out there, though it may be in the weeds...

----------

## mega_flow

I only have libinput in INPUT_DEVICES.

No password errors with sys-kernel/gentoo-sources-4.11.0.

Can you use sudo?

----------

## costel78

```
emerge --pretend --verbose sys-apps/shadow $(eix --installed --only-names pam)

These are the packages that would be merged, in order:

Calculating dependencies                      ... done!
[ebuild   R    ] virtual/pam-0-r1::gentoo  ABI_X86="32 (64) (-x32)" 0 KiB
[ebuild   R    ] sys-libs/pam-1.3.0::gentoo  USE="cracklib filecaps nls pie -audit -berkdb -debug -nis (-selinux) {-test} -vim-syntax" ABI_X86="32 (64) (-x32)" 1.754 KiB
[ebuild   R    ] sys-auth/pambase-20150213::gentoo  USE="cracklib nullok sha512 systemd (-consolekit) -debug -gnome-keyring -minimal -mktemp -pam_krb5 -pam_ssh -passwdqc -securetty (-selinux)" 4 KiB
[ebuild   R    ] sys-apps/shadow-4.4-r2::gentoo  USE="acl cracklib nls pam xattr -audit (-selinux) -skey" LINGUAS="-cs -da -de -es -fi -fr -hu -id -it -ja -ko -pl -pt_BR -ru -sv -tr -zh_CN -zh_TW" 3.620 KiB

Total: 4 packages (4 reinstalls), Size of downloads: 5.377 KiB
```

I am relieved that someone can confirm this strange bug. And it seems to be something in Enlightenment.

```
emerge -pvO efl enlightenment

These are the packages that would be merged, in order:

[ebuild   R    ] dev-libs/efl-1.18.4::gentoo  USE="X bmp drm eet egl fontconfig gif gles gstreamer harfbuzz ico libressl nls physics png postscript ppm psd pulseaudio sound ssl systemd tiff wayland -debug -doc -fbcon -fribidi -glib -gnutls -ibus -jpeg2k (-neon) -oldlua -opengl (-pixman) -raw -scim -sdl -tga -tslib -unwind -v4l -valgrind -webp -xim -xine -xpm" 63.096 KiB
[ebuild  NS    ] x11-wm/enlightenment-1.0.17:0::gentoo [0.21.7:0.17/0.21.7::gentoo] USE="dbus nls pango pulseaudio -doc -xcomposite -xinerama -xrandr" 2.361 KiB
```

Vanilla-sources-4.11.0 show the same symptoms. I'll try with efl-1.19, maybe, maybe...   :Smile: 

Thank you all very much!

Oh, I forgot about systemd version: sys-apps/systemd-233-r1:0/2::gentoo

----------

## costel78

No change with efl-1.19, but I installed xfce4-meta, and when using it the problem disappears.   :Very Happy: 

So it's something in efl/enlightenment that kernel 4.11 triggers.

It still remains unknown to me why xfvm/xterm (plain X11 session) is affected.

----------

## swanson

So it's only the Enlightenment window manager that is affected. On the Enlightenment dev list the developers don't know either; to quote the main developer's response to someone else's report of the issue from yesterday:

 *Quote:*   

> but it's a kernel change that creates the issue. what - i don't know. ask your friendly neighbourhood kernel developer. the setuid root binaries are specifically erroring out unable to assume root privs where they could before.

 

----------

## costel78

Just tried with genkernel-next and a brand new default kernel config, but it still refuses to work.

4.11 stays masked for now. Waiting for 4.11.x.

----------

## eccerr0r

Tried it under Gnome 3 (gnome-terminal 3.22.2) and it works as well.

----------

## miket

I have no system affected one way or another, but one possible villain comes to mind: seats. Recall that awful chain from display manager to consolekit to polkit, coupled with the ugly concept of multiseat computers. The seat you're in is supposed to matter; it may be causing havoc now. For example, some change in the kernel could have made whatever the display manager uses to identify the console hardware to later steps in the chain look somewhat different. There may be some rogue rule in that nasty JavaScript that polkit uses that is now interpreted a bit differently since the kernel change. There could be a change in the way that DBus communicates the value.

In any event, the effect would be that you were ejected from your seat. You'd no longer be authorized to use su.

As a check, you could see if your DE lets you do other things that you would normally do, such as mount USB sticks or shut down the machine.

----------

## eccerr0r

Based on the evidence so far, it does look like a seat issue with polkit (but not consolekit, as systemd does not use consolekit - it's built in).  Also is there a full DE for Enlightenment?

Make sure you've run etc-update on all the polkit config files too.

---

Another test: I logged into a console virtual terminal, and startx -- :1 with .xinitrc starting just "vte" ... su still works.

----------

## costel78

Polkit is up to date.

```
emerge -pvO polkit

These are the packages that would be merged, in order:

[ebuild   R    ] sys-auth/polkit-0.113-r2::gentoo  USE="gtk introspection nls pam systemd -elogind -examples -jit -kde (-selinux) {-test}" 0 KiB
```

and I never touch its config files.

I tried without the pam USE flag: same error. I wanted to know if it's a permission error or something, so I did a chmod 644 /etc/shadow.

su throws "setgid: Operation not permitted", and journalctl shows:

```
mai 06 16:52:27 gentoo su[1137]: Successful su for root by costel
mai 06 16:52:27 gentoo su[1137]: + /dev/pts/0 costel:root
mai 06 16:52:27 gentoo su[1137]: bad group ID `0' for user `root': Operation not permitted
```

But root has uid and gid 0!?

I installed gcc-7.1.0 and did a full system and world rebuild. Now even xfce is affected...

I wouldn't mind fully reinstalling Gentoo from scratch, but I am afraid that the result would be the same.

----------

## Zucca

 *costel78 wrote:*   

> I wouldn't mind fully reinstalling Gentoo from scratch, but I am afraid that the result would be the same.

 ... and even if that fixes it, it would be very hard afterwards to pinpoint what caused it in the first place. Anyways. My intuition says the same - the problem would not vanish in complete reinstall.

Just out of curiosity: log in (via virtual console maybe) as root and run

```
id root
```

.. and paste the results.

----------

## costel78

```
uid=0(root) gid=0(root) grupuri=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),26(tape),27(video)
```

"grupuri" is Romanian for "groups"; the output is the same as above anyway.

----------

## Zucca

Okay... I'm not an expert on internals, or configuration of pam and su but

```
su[1137]: bad group ID `0' for user `root': Operation not permitted
```

... is something to watch...
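Some hedged background on that message: `setgid()` fails with EPERM when the caller lacks CAP_SETGID and the requested group is neither its real nor its saved group ID, so it suggests su did not hold root's capabilities at that moment. A process's effective capability set appears as the `CapEff:` hex mask in `/proc/PID/status`, and CAP_SETGID is capability number 6. A minimal sketch for testing that bit (the helper names `cap_eff` and `mask_has_cap` are hypothetical):

```shell
#!/bin/sh
# cap_eff PID (hypothetical helper): print the effective capability mask
# (16 hex digits) from the CapEff: line of /proc/PID/status.
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}

# mask_has_cap HEXMASK CAPNUM (hypothetical helper): succeed if bit CAPNUM
# is set in the hexadecimal capability mask HEXMASK.
mask_has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_SETGID=6   # capability number as defined in <linux/capability.h>

mask=$(cap_eff $$)
if mask_has_cap "$mask" "$CAP_SETGID"; then
    echo "this shell has CAP_SETGID (CapEff=$mask)"
else
    echo "this shell lacks CAP_SETGID (CapEff=$mask)"
fi
```

To inspect su rather than the current shell, replace `$$` with the PID of the su process while it waits for the password.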

----------

## costel78

The problem persists with 4.11.1, and the E developers keep blaming the kernel, but of all the DEs only e17 is affected...

----------

## Zucca

I'm currently on 4.11. Works fine. I'm running lightdm+i3.

 *costel78 wrote:*   

> the E developers keep blaming the kernel

 That sounds like a ridiculous attitude. Do you know if they have even tried looking for the root cause of the problem?

----------

## konspiracy

Working for me with git sources.....

Systemd

```
~
➔ su
Password: 
muh shane # screenfetch
         -/oyddmdhs+:.                root@muh.rig
     -odNMMMMMMMMNNmhy+-`             OS: Gentoo 
   -yNMMMMMMMMMMMNNNmmdhy+-           Kernel: x86_64 Linux 4.11.0-rc8
 `omMMMMMMMMMMMMNmdmmmmddhhy/`        Uptime: 23h 16m
 omMMMMMMMMMMMNhhyyyohmdddhhhdo`      Packages: 1032
.ydMMMMMMMMMMdhs++so/smdddhhhhdm+`    Shell: bash 4.4.12
 oyhdmNMMMMMMMNdyooydmddddhhhhyhNd.   Resolution: 1920x1080
  :oyhhdNNMMMMMMMNNNmmdddhhhhhyymMh   DE: GNOME 
    .:+sydNMMMMMNNNmmmdddhhhhhhmMmy   WM: GNOME Shell
       /mMMMMMMNNNmmmdddhhhhhmMNhs:   WM Theme: 
    `oNMMMMMMMNNNmmmddddhhdmMNhs+`    GTK Theme: Adwaita [GTK2/3]
  `sNMMMMMMMMNNNmmmdddddmNMmhs/.      Icon Theme: Adwaita
 /NMMMMMMMMNNNNmmmdddmNMNdso:`        Font: Cantarell 11
+MMMMMMMNNNNNmmmmdmNMNdso/-           CPU: Intel Core i5-6600K @ 4x 4.6GHz [31.0°C]
yMMNNNNNNNmmmmmNNMmhs+/-`             GPU: GeForce GTX 960
/hMMNNNNNNNNMNdhs++/-`                RAM: 1280MiB / 16015MiB
`/ohdmmddhys+++/:.`                  
  `-//////:--.                       
muh shane #
```

----------

## costel78

Thanks, but only e17 (ver. 19) is affected. Even old e16 works   :Smile: 

Did a full reinstall. Now xfce4 and xorg work, only enlightenment 19 fail. 

I would report the error against the kernel, but what exactly should I blame, which subsystem? And more likely it's an e19 problem, otherwise other DEs would be affected.

Later: reported on the kernel Bugzilla. It has become very tiresome to switch back and forth to the console.

https://bugzilla.kernel.org/show_bug.cgi?id=195799

----------

## tholin

 *costel78 wrote:*   

> I would report the error against the kernel, but what exactly should I blame, which subsystem? And more likely it's an e19 problem, otherwise other DEs would be affected.

 

If you have the patience you could try doing a kernel git bisect.

https://wiki.gentoo.org/wiki/Kernel_git-bisect

It's tedious and usually takes a few hours.

If you end up with an unbootable kernel or a kernel that doesn't build you can use "git bisect skip" to skip the bad commit.

----------

## Zucca

 *tholin wrote:*   

> It's tedious and usually takes a few hours.
> 
> If you end up with an unbootable kernel or a kernel that doesn't build you can use "git bisect skip" to skip the bad commit.

 That process just screams for something automated...

At least something that can compile several git versions in a row. That also needs a bigger than normal /boot. Maybe best solution is to temporarily put /boot on a USB stick.

I'll raise my hat for OP if he pulls this off by hand. Either way, it's a great service for the Linux community if OP does it and can pinpoint the exact git commit that caused that problem.

What makes things a bit worse is that there have been heaps of commits between those kernel versions, if the bug was introduced in 4.11.0... Still, there's always the efficient way: test a kernel built from a commit halfway between 4.10.x and 4.11.0. Then, depending on whether the bug still exists in the kernel being tested, move halfway again... and so on. This way even a million commits isn't that much.

This is now drifting a bit off topic, but I started thinking about the process a bit more...

This (very crude) bash script demonstrates the process:

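```
#!/bin/bash
# Crude simulation of bisecting: the script already knows the "guilty"
# number and only counts how many halving probes it takes to land on it.
# Usage: $0 START GUILTY   (the run below used START=1000000 GUILTY=500001)
commit="$1"
guilty="$2"
n=0
jump=$(( commit / 2 ))

while [ "$commit" -ne "$guilty" ]
do
    n=$(( n + 1 ))
    # Step toward the guilty number by the current jump size.
    if [ "$guilty" -gt "$commit" ]
    then
        commit=$(( commit + jump ))
    elif [ "$guilty" -lt "$commit" ]
    then
        commit=$(( commit - jump ))
    fi
    # Halve the jump; the +1 keeps it from ever reaching 0...
    jump=$(( jump / 2 + 1 ))
    # ...but jump/2+1 settles at 2, so drop to 1 for the final single steps.
    case "$jump" in
        2)
            jump=1
        ;;
    esac
    echo "Trying ${commit}..."
done

echo "Guilty number is ${commit}. Took ${n} rounds to find."
```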

Running it with one of the most "distant" numbers gives:

```
Trying 500000...
Trying 750001...
Trying 625000...
Trying 562499...
Trying 531248...
Trying 515622...
Trying 507808...
Trying 503900...
Trying 501945...
Trying 500967...
Trying 500477...
Trying 500231...
Trying 500107...
Trying 500044...
Trying 500012...
Trying 499995...
Trying 500004...
Trying 499999...
Trying 500002...
Trying 500001...
Guilty number is 500001. Took 20 rounds to find.
```

 It takes 20 kernel test builds to find the one that has the bug among a million (imaginary) commits between 4.10.x and 4.11.0.

So... as long as the commits are chosen carefully, it's not that bad a task. Unless compilation takes a lot of time.
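The 20 rounds match what a clean binary search predicts: with N candidate commits, the worst case is the smallest k such that 2^k ≥ N, and 2^20 = 1,048,576 is the first power of two above one million. A tiny sketch that computes this (`bisect_steps` is a hypothetical name):

```shell
#!/bin/sh
# bisect_steps N (hypothetical helper): smallest k such that 2^k >= N, i.e.
# the worst-case number of test builds a binary search over N commits needs.
bisect_steps() {
    n=$1 k=0 span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

bisect_steps 1000000   # prints 20
```

Every `git bisect skip` can add further builds on top of that figure.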

----------

## costel78

I will also try git bisect, but not right now. Let's wait and see what the kernel developers have to say.

I really understand the Enlightenment developers' position, as this bug is very strange, but why only e19, and not even their own e16?

It could really be a change in the kernel, in which case git bisect wouldn't be a complete waste of time, but it could also be a good fix in the kernel that triggers a hidden error in E.

----------

## Hu

 *Zucca wrote:*   

>  *tholin wrote:*   It's tedious and usually takes a few hours. That process just screams for something automated...
> 
> At least something that can compile several git versions in a row. That also needs a bigger than normal /boot. Maybe best solution is to temporarily put /boot on a USB stick.

 Git bisect can be automated, if you can find a way for it to programmatically tell whether the chosen test revision has the problem or not.  For environments where the bug report is a failure in a previously automated test suite, this is easy.  For this case, where the test is for the user to start e19, open a terminal, and try to su, automation is a few steps harder.

 *Zucca wrote:*   

> What makes things a bit worse is that there have been heaps of commits between those kernel versions, if the bug was introduced in 4.11.0... Still, there's always the efficient way: test a kernel built from a commit halfway between 4.10.x and 4.11.0. Then, depending on whether the bug still exists in the kernel being tested, move halfway again... and so on. This way even a million commits isn't that much.

 This is exactly why git bisect exists and is so well loved.  It will pick good candidate commits on your behalf, run checkout, then wait for you to tell it whether the commit exhibits the bug.  This should generally get the number of steps close to the theoretical minimum.
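To make the automation idea concrete, here is a self-contained toy of `git bisect run` on a throwaway repository, sketched under loud assumptions: the "bug" is just a number in a file, and the one-line check stands in for the real test (build, boot, start e19, try su), which is the part that is hard to automate. `toy_bisect` is a hypothetical name; `git bisect start <bad> <good>` and `git bisect run` are standard git commands, and the run command's exit status 0 means good, 125 means skip, and other values from 1 to 127 mean bad.

```shell
#!/bin/sh
# toy_bisect N BAD_FROM (hypothetical helper): create a throwaway repo with
# N commits, where commit i writes i to version.txt and every commit with
# i >= BAD_FROM counts as "broken", then let `git bisect run` find the
# first broken commit. Runs in a subshell so the cd does not leak out.
toy_bisect() (
    n=$1 bad_from=$2
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q
    i=1
    while [ "$i" -le "$n" ]; do
        echo "$i" > version.txt
        git add version.txt
        git -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "commit $i"
        i=$((i + 1))
    done
    # The newest commit is known bad, the root commit is known good.
    git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null 2>&1
    # Stand-in for "build, boot, start e19 and try su": exit 0 = good, 1 = bad.
    git bisect run sh -c "[ \"\$(cat version.txt)\" -lt $bad_from ]" >/dev/null 2>&1
    # Once bisection has finished, refs/bisect/bad is the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset >/dev/null 2>&1
    cd / && rm -rf "$repo"
)

toy_bisect 64 37
```

On a machine with git installed, the call at the end should report `commit 37`. For the real kernel case, the hard part remains writing a check that can decide good/bad without a human at the keyboard.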

----------

## costel78

I ran a git bisect and found this patch to be responsible.

Reverted it and now everything is working fine with 4.11.1. Hurray!!!   :Laughing: 

Whether it was smart to revert it, I do not know. Those of you with a deeper understanding of kernel internals will have to weigh in on that.

Also updated info on kernel bugzilla.

To be on the safe side, I will stay on 4.10.16 until everything will be clear.

Thank you for your support and info! I really appreciate it!

Oh, and do not try a git bisect from 4.10 with gcc-7: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=474c90156c8dcc2fa815e6716cc9394d7930cb9c

I had to reinstall gcc-6.3.1 and start from scratch after a few git bisect skips.

----------

## Zucca

 *costel78 wrote:*   

> I ran a git bisect and found this patch to be responsible.
> 
> Reverted it and now everything is working fine with 4.11.1. Hurray!!!

 

Great!

Please report your findings to the e devs. ;)

----------

## tholin

The kernel has a strict "don't break userspace" rule. It doesn't matter how broken userspace programs are: if they stop working because of a kernel change, it's the kernel's fault (with some exceptions).

That patch looks related to POSIX capabilities which is a part of the security subsystem. Set the bugzilla regression field to yes. That might get some more attention. It doesn't look like there is a specific capabilities or security category in the kernel bugzilla. If you don't get a response in a day or two send a mail to linux-security-module@vger.kernel.org and point to that bugzilla and this thread. Some subsystem maintainers ignore the kernel bugzilla.

----------

## costel78

Posted on the Enlightenment bug tracker: https://phab.enlightenment.org/T5470

If nothing changes by 24.05.2017, I will send an email to the kernel subsystem maintainer.

Thank you!

----------

## Zucca

 *tholin wrote:*   

> The kernel has a strict "don't break userspace" rule.

 And Linus enforces that rule. We've seen what happens when someone actually breaks userspace... It's not pretty what happens after Linus finds out. :D

But it really seems like a kernel bug... One that's breaking the userspace. Uh oh.

----------

## costel78

Jokes aside, it was a very small part of userspace, annoying as it was to deal with. It would be different if it had also affected GNOME, KDE or Xfce, as their user base is much bigger than Enlightenment's.

Seriously, is there a way to test/prevent such isolated cases ?

----------

## Hu

It is a small part that we know of.  There may be other more popular programs that also break as a result of the same change, but which have not yet been reported because their users have not yet moved to 4.11.

----------

## Zucca

I wonder if this issue has been solved in 4.11.3 already?

----------

## tholin

 *Zucca wrote:*   

> I wonder if this issue has been solved in 4.11.3 already?

 

Nope, and I don't see the fix in the 4.11 stable queue, but the fix has been confirmed, so I guess it will eventually land in 4.11.5.

https://www.spinics.net/lists/stable/msg173893.html

----------

## Jack Krauser

It is happening to me right now...

I have a 4.9.16-gentoo kernel and I don't know what I can do :/

----------

## Zucca

 *Jack Krauser wrote:*   

> I have a 4.9.16-gentoo kernel and I don't know what I can do :/

 ... wait? You have this problem with 4.9.x? The bug was supposedly introduced at 4.11...

Please paste the error message and also the lines that appear in dmesg while you try to su.

----------

## tholin

 *Zucca wrote:*   

> ... wait? You have this problem with 4.9.x? The bug was supposedly introduced at 4.11...

 

According to that mail to the stable list:

 *Quote:*   

> Enlightenment is broken in 4.11 and possibly other kernels where commit 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP") is backported to.

 

Patch 64b875f7ac8a was introduced in 4.10-rc1 and backported to 4.9.1, 4.4.40 and 4.1.39.

But Greg Kroah-Hartman expresses some doubt.

 *Quote:*   

> Has no one noticed it being broken on 4.10?  That's where that commit showed up, it seems odd we haven't had a bug report until now, right?

 

----------

## costel78

Hmmm, that's odd. As far as I understand, the bug was introduced in 4.11; see the date on the responsible patch.

4.11.4 with the following patch solves everything:

```
Date: Mon, 22 May 2017 16:04:48 -0500
Subject: [PATCH] ptrace: Properly initialize ptracer_cred on fork
Message-ID: <877f18txfz.fsf_-_@xmission.com>
Patch-mainline: Submitted, LKML
References: bsc#1040041

When I introduced ptracer_cred I failed to consider the weirdness of
fork where the task_struct copies the old value by default.  This
winds up leaving ptracer_cred set even when a process forks and
the child process does not wind up being ptraced.

Because ptracer_cred is not set on non-ptraced processes whose
parents were ptraced this has broken the ability of the enlightenment
window manager to start setuid children.

Fix this by properly initializing ptracer_cred in ptrace_init_task

This must be done with a little bit of care to preserve the current value
of ptracer_cred when ptrace carries through fork.  Re-reading the
ptracer_cred from the ptracing process at this point is inconsistent
with how PT_PTRACE_CAP has been maintained all of these years.

Fixes: 64b875f7ac8a ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 include/linux/ptrace.h |    7 +++++--
 kernel/ptrace.c        |   20 +++++++++++++-------
 2 files changed, 18 insertions(+), 9 deletions(-)

--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -54,7 +54,8 @@ extern int ptrace_request(struct task_st
                          unsigned long addr, unsigned long data);
 extern void ptrace_notify(int exit_code);
 extern void __ptrace_link(struct task_struct *child,
-                         struct task_struct *new_parent);
+                         struct task_struct *new_parent,
+                         const struct cred *ptracer_cred);
 extern void __ptrace_unlink(struct task_struct *child);
 extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
 #define PTRACE_MODE_READ       0x01
@@ -206,7 +207,7 @@ static inline void ptrace_init_task(stru
        if (unlikely(ptrace) && current->ptrace) {
                child->ptrace = current->ptrace;
-               __ptrace_link(child, current->parent);
+               __ptrace_link(child, current->parent, current->ptracer_cred);
                if (child->ptrace & PT_SEIZED)
                        task_set_jobctl_pending(child, JOBCTL_TRAP_STOP);
@@ -215,6 +216,8 @@ static inline void ptrace_init_task(stru
                set_tsk_thread_flag(child, TIF_SIGPENDING);
        }
+       else
+               child->ptracer_cred = NULL;
 }
 /**
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -60,19 +60,25 @@ int ptrace_access_vm(struct task_struct
 }
+void __ptrace_link(struct task_struct *child, struct task_struct *new_parent,
+                  const struct cred *ptracer_cred)
+{
+       BUG_ON(!list_empty(&child->ptrace_entry));
+       list_add(&child->ptrace_entry, &new_parent->ptraced);
+       child->parent = new_parent;
+       child->ptracer_cred = get_cred(ptracer_cred);
+}
+
 /*
  * ptrace a task: make the debugger its new parent and
  * move it to the ptrace list.
  *
  * Must be called with the tasklist lock write-held.
  */
-void __ptrace_link(struct task_struct *child, struct task_struct *new_parent)
+static void ptrace_link(struct task_struct *child, struct task_struct *new_parent)
 {
-       BUG_ON(!list_empty(&child->ptrace_entry));
-       list_add(&child->ptrace_entry, &new_parent->ptraced);
-       child->parent = new_parent;
        rcu_read_lock();
-       child->ptracer_cred = get_cred(__task_cred(new_parent));
+       __ptrace_link(child, new_parent, __task_cred(new_parent));
        rcu_read_unlock();
 }
@@ -386,7 +392,7 @@ static int ptrace_attach(struct task_str
                flags |= PT_SEIZED;
        task->ptrace = flags;
-       __ptrace_link(task, current);
+       ptrace_link(task, current);
        /* SEIZE doesn't trap tracee on attach */
        if (!seize)
@@ -459,7 +465,7 @@ static int ptrace_traceme(void)
                 */
                if (!ret && !(current->real_parent->flags & PF_EXITING)) {
                        current->ptrace = PT_PTRACED;
-                       __ptrace_link(current, current->real_parent);
+                       ptrace_link(current, current->real_parent);
                }
        }
        write_unlock_irq(&tasklist_lock);
```
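The commit message above describes a classic fork pitfall: the child starts as a copy of the parent's task_struct, so any field that is not explicitly reset is inherited. A toy model of that in shell, NOT kernel code: shell variables stand in for `ptrace` and `ptracer_cred`, each `( ... )` subshell (which really is a fork) stands in for the child, and `init_task_after_fork` is a hypothetical, heavily simplified stand-in for the patched `ptrace_init_task()`.

```shell
#!/bin/sh
# Toy model only: shell variables stand in for two task_struct fields, and
# each ( ... ) subshell below is a real fork(2), so the child starts with a
# copy of every variable, just as a forked task starts with a copied struct.

# Parent state: this task is being ptraced by an unprivileged tracer.
ptrace=1
ptracer_cred="uid=1000"

# init_task_after_fork TRACE_CHILD (hypothetical helper): loosely mirrors the
# patched ptrace_init_task(). The child starts untraced and only stays traced
# (keeping the tracer's credentials) if the tracer follows forks and the
# parent is traced.
init_task_after_fork() {
    trace_child=$1
    parent_ptrace=$ptrace
    ptrace=0
    if [ "$trace_child" -eq 1 ] && [ "$parent_ptrace" -ne 0 ]; then
        ptrace=$parent_ptrace        # still traced: ptracer_cred stays too
    else
        ptracer_cred=""              # the fix: drop the copied credentials
    fi
}

( init_task_after_fork 0; echo "untraced child: ptrace=$ptrace ptracer_cred='$ptracer_cred'" )
( init_task_after_fork 1; echo "traced child:   ptrace=$ptrace ptracer_cred='$ptracer_cred'" )
```

If the `else` branch is removed, the untraced child keeps `ptracer_cred='uid=1000'` even though nothing traces it, which is the shape of the bug the patch describes.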

----------

## Jack Krauser

 *tholin wrote:*   

> The kernel has a strict "don't break userspace" rule. It doesn't matter how broken userspace programs are: if they stop working because of a kernel change, it's the kernel's fault (with some exceptions).

 

I installed GNOME on a new installation, and systemd was installed with it. Thanks to that comment I found a solution in the systemd wiki and could fix the problem --> https://wiki.gentoo.org/wiki/Systemd.

Before I installed GNOME I did not have this problem. My system was working without the systemd kernel options configured; this was the only problem I found.

Thanks for your answers  :Smile: 

----------

## costel78

This issue has been fixed in linux kernel 4.11.5 by commit ff6c1649b4a15065474adc9b2590ba20c0a62238 ("ptrace: Properly initialize ptracer_cred on fork").
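To tell whether a running 4.11 kernel predates that fix, a small sketch can compare `uname -r` against 4.11.5 using GNU `sort -V` (the helper names `version_ge` and `affected_4_11` are hypothetical). It deliberately covers only the 4.11 series: per the stable-list mail quoted above, the original change was also backported to older stable series, and this thread does not establish which of their releases carry the fix. Patched kernels (like the 4.11.4 plus patch above) would also be misreported, so treat the result as a hint.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeed if version string A >= B,
# using GNU sort -V (version sort) ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# affected_4_11 VERSION (hypothetical helper): succeed if VERSION is a
# 4.11.x release older than 4.11.5.
affected_4_11() {
    case "$1" in
        4.11 | 4.11.*) ! version_ge "$1" 4.11.5 ;;
        *) return 1 ;;
    esac
}

kver=$(uname -r | cut -d- -f1)    # e.g. "4.11.0-gentoo" -> "4.11.0"
if affected_4_11 "$kver"; then
    echo "$kver: 4.11 kernel older than 4.11.5 (fix not included upstream)"
else
    echo "$kver: not a pre-4.11.5 4.11 kernel"
fi
```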

Thank you, all!

----------

