# sshd fails due to lack of ptys -- TEMPORARY FIX

## Felig

I have two machines that recently started showing the same failure when I try to log in to them remotely from another machine.  ssh asks for the remote password, all appears well, and then it hangs.  The target machine shows this in its messages log:

```
sshd[7982]: error: openpty: No such file or directory
sshd[7987]: error: session_pty_req: session 0 alloc failed
```

This began just as I was leaving on vacation two weeks ago.  I briefly tried rebooting, booting the previous kernel, etc., with no luck.  I still cannot ssh into the machine.  I had hoped some update during the two-week vacation would fix the problem, but updating and rebooting with the latest -r5 kernel doesn't fix anything.

This happens on both a ~x86 machine and a ~amd64 machine.  openssh version is 4.3_p2-r2.  The kernel config is (and has been for ages)

```
CONFIG_UNIX98_PTYS=y
# CONFIG_LEGACY_PTYS is not set
```
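Whether the running kernel really has these options can be checked directly rather than from memory. The sketch below is a hypothetical helper, `kconfig_value`, that prints an option's line from a kernel config file. It assumes `/proc/config.gz` exists, which is only the case when the kernel was built with `CONFIG_IKCONFIG_PROC`; otherwise pass a path such as `/usr/src/linux/.config`.

```shell
#!/bin/sh
# kconfig_value OPTION [FILE]  (hypothetical helper)
# Print OPTION's line from a kernel config, whether it is set
# ("CONFIG_X=y") or explicitly unset ("# CONFIG_X is not set").
# FILE defaults to /proc/config.gz, which only exists when the running
# kernel was built with CONFIG_IKCONFIG_PROC (an assumption here).
kconfig_value() {
    opt=$1
    file=${2:-/proc/config.gz}
    case $file in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # Require "=" or " is not set" right after the name, so an option is
    # not confused with a longer one that shares its prefix.
    $reader "$file" | grep -E "^(# )?${opt}(=| is not set)"
}
```

For example, `kconfig_value CONFIG_UNIX98_PTYS` checks the running kernel, and `kconfig_value CONFIG_LEGACY_PTYS /usr/src/linux/.config` checks the source tree; the two can differ if a rebuilt kernel was never booted.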

What puzzles me most is that this happened out of the blue.  I had ssh'd into the machine just a few minutes before without problems.  I was not updating software at the time, although I had updated various things the day before.

Here is the emerge --info output:

```
# emerge --info
Portage 2.1.1_pre4-r4 (default-linux/x86/2006.0, gcc-4.1.1/vanilla, glibc-2.4-r3, 2.6.17-gentoo-r4 i686)
=================================================================
System uname: 2.6.17-gentoo-r4 i686 Intel(R) Pentium(R) 4 CPU 3.06GHz
Gentoo Base System version 1.12.4
Last Sync: Fri, 11 Aug 2006 18:00:08 +0000
app-admin/eselect-compiler: 2.0.0_rc2-r1
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.18.1
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.17
sys-devel/gcc-config: 2.0.0_rc1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/lib/mozilla/defaults/pref /usr/share/X11/xkb /usr/share/config /var/lib/postgresql/data /var/qmail/alias /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/eselect/compiler /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -mcpu=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LINGUAS="af ar az bg bn br bs ca cs cy da de el en_GB eo es et eu fa fi fr fy ga gl he hi hr hu is it ja km ko lt lv mk mn ms nb nds nl nn pa pl pt pt_BR ro ru rw se sk sl sr sr@Latn ss sv ta tg tr uk uz zh_CN zh_TW"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X Xaw3d a52 aac aalib accessibility acpi afs aim alsa amd anthy apache2 apm arts audiofile avi bash-completion bcmath berkdb bidi bitmap-fonts blas bluetooth bonobo bzip2 calendar canna cdb cdparanoia cdr chasen cjk cli crypt ctype cups curl curlwrappers dba dbm dbus dbx dga dio directfb dlloader doc dri dts dv dvb dvd dvdr dvdread eds elibc_glibc emacs emacs-w3 emboss encode esd ethereal evo examples exif expat fam fastcgi fbcon ffmpeg flac flash flatfile fontconfig foomaticdb fortran freetds freewnn ftp gb gcj gd gdbm geoip ggi gif ginac glut gmp gnome gnustep gnutls gphoto2 gpm gps gstreamer gtk gtk2 gtkhtml guile hal haskell iconv icq idn ieee1394 imagemagick imap imlib inifile innodb input_devices_evdev input_devices_keyboard input_devices_mouse ipv6 isdnlog jabber jack java javascript jbig jpeg jpeg2k junit kde kernel_linux ladcca lapack lash lcms ldap leim libcaca libg++ libgda libwww linguas_af linguas_ar linguas_az linguas_bg linguas_bn linguas_br linguas_bs linguas_ca linguas_cs linguas_cy linguas_da linguas_de linguas_el linguas_en_GB linguas_eo linguas_es linguas_et linguas_eu linguas_fa linguas_fi linguas_fr linguas_fy linguas_ga linguas_gl linguas_he linguas_hi linguas_hr linguas_hu linguas_is linguas_it linguas_ja linguas_km linguas_ko linguas_lt linguas_lv linguas_mk linguas_mn linguas_ms linguas_nb linguas_nds linguas_nl linguas_nn linguas_pa linguas_pl linguas_pt linguas_pt_BR linguas_ro linguas_ru linguas_rw linguas_se linguas_sk linguas_sl linguas_sr linguas_sr@Latn linguas_ss linguas_sv linguas_ta linguas_tg linguas_tr linguas_uk linguas_uz linguas_zh_CN linguas_zh_TW lirc lm_sensors lua m17n-lib mad maildir mailwrapper mcal mikmod mime ming mmap mmx mng mono motif mozilla mp3 mpeg mpi msession msn mysql mysqli nas ncurses nis nls nptl nptlonly nsplugin ocaml odbc offensive ofx ogg oggvorbis openal opengl oscar oss pam pcmcia pcre pda pdf pdflib perl php plotutils png portaudio posix postgres ppds pppd python qdbm qt qt3 qt4 quicktime readline recode reflection ruby samba sasl scanner sdl session sharedext sharedmem shorten simplexml slang slp sndfile snmp soap sockets socks5 sox speex spell spf spl sql sqlite sqlite3 sse sse2 ssl stroke svg svga sysvipc tcltk tcpd tetex theora threads tiff tokenizer truetype truetype-fonts type1-fonts udev unicode usb userland_GNU v4l vcd vorbis wddx wifi win32codecs wmf wxwindows xine xinerama xml xml2 xmlrpc xmms xorg xosd xpm xprint xsl xv xvid yahoo yaz zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS
```

Last edited by Felig on Thu Aug 31, 2006 3:09 am; edited 1 time in total

----------

## Felig

I have found I can ssh and give a command, e.g.

```
ssh xyzzy touch /tmp/hello
```

That does create the /tmp/hello file on the target machine.  But I still can't log in.
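A remote command works because `ssh host command` does not ask the server for a pseudo-terminal by default, while an interactive login does. The ssh client's `-T` option disables pty allocation and `-tt` forces it even without a local terminal, so the difference can be reproduced with one harmless command. A minimal sketch; `pty_probe` is a hypothetical helper name and `xyzzy` is the host from the post.

```shell
#!/bin/sh
# pty_probe HOST  (hypothetical helper)
# Run `tty` on HOST twice: once with pty allocation disabled (-T), like
# "ssh xyzzy touch /tmp/hello", and once forcing it (-tt), which sends
# the same kind of pty request an interactive login sends.
pty_probe() {
    host=$1
    echo "== no pty (-T) =="
    ssh -T "$host" tty      # expected to print "not a tty"
    echo "== forced pty (-tt) =="
    ssh -tt "$host" tty     # prints a /dev/pts/N name when ptys work
}
```

On a healthy machine the second run prints a `/dev/pts/N` name; on the broken one it would be expected to trigger the same `openpty` error in the server log that the login does.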

----------

## Janne Pikkarainen

Weird. There has been a lot of /dev-related change since kernel 2.6.12, so I'm going to ask the obvious:

- Do you have (latest) udev installed & up and running? udev is in your USE flags, but you never know...  :Wink: 

- Did you run etc-update?

- Do you have /dev/pts directory around?

- What happens if you run sshd with debugging enabled? At what point does sshd die, according to the logs?

----------

## Felig

Udev is running, or other services wouldn't work, like plugging in USB camera card readers.   Version is sys-fs/udev-096-r1.

I update /etc and its ilk after every merge.  One thing I can't do is run revdep-rebuild, because I have too many packages and it barfs all over from argument lists being too long.  But I do have a home-grown Perl program that checks all executables for valid dynamic library lists.
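The Perl checker itself isn't shown in the thread. A rough shell approximation of the same idea, assuming `ldd` is available, runs `ldd` on each file and reports any library it marks "not found"; `missing_libs` is a hypothetical name. This only catches libraries missing at run time; a library that is present but wrong passes unnoticed.

```shell
#!/bin/sh
# missing_libs FILE...  (hypothetical helper)
# For each FILE, print "FILE: LIB" for every shared library that ldd
# reports as "not found". Static or non-ELF files print nothing.
missing_libs() {
    for f in "$@"; do
        ldd "$f" 2>/dev/null | awk -v f="$f" '/=> not found/ { print f ": " $1 }'
    done
}
```

Feeding it one file at a time sidesteps long argument lists, e.g. `find /usr/bin -type f | while read -r f; do missing_libs "$f"; done`.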

/dev/pts exists, also /dev/ptmx.
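The `/dev/pts` directory existing is not quite the same as the devpts filesystem being mounted on it; with UNIX98 ptys, the slave side of a pty opened through `/dev/ptmx` normally appears as `/dev/pts/N` on that filesystem. A small hypothetical check against `/proc/mounts`:

```shell
#!/bin/sh
# devpts_mounted [MOUNTS_FILE]  (hypothetical helper)
# Succeed if a devpts filesystem is mounted on /dev/pts.
# /proc/mounts fields: device mountpoint fstype options dump pass
devpts_mounted() {
    awk '$2 == "/dev/pts" && $3 == "devpts" { found = 1 } END { exit !found }' \
        "${1:-/proc/mounts}"
}
```

For example: `devpts_mounted && echo "devpts is mounted" || echo "devpts is NOT mounted"`.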

From this machine, it would do me no good to run sshd with debug, since I need another machine to log into this one.  But clue me in and I will see if I can borrow one and do that.  What debug level would you recommend?

I assume this is a double-pronged screwup ... I probably changed something that seems unrelated now, and there may well be some Gentoo-specific update that shouldn't have been released.  But danged if I can see anything obvious.

----------

## Janne Pikkarainen

Put something like

```
LogLevel DEBUG1
```

in /etc/ssh/sshd_config, restart sshd, and then see what gets logged.  :Smile: 
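As for needing a second machine: a common approach is to start a separate, throwaway sshd in the foreground on an unused port and connect to it from the same box. This sketch assumes sshd lives at `/usr/sbin/sshd` (an absolute path is needed, since sshd refuses to re-exec itself otherwise) and that port 2222 is free; `debug_sshd_once` is a hypothetical name.

```shell
#!/bin/sh
# debug_sshd_once [PORT]  (hypothetical helper; run as root)
# Validate sshd_config (-t), then run a second sshd in the foreground.
# -d: debug output to stderr, no detaching, serves one connection, exits.
# -p: listen on PORT, leaving the regular sshd running untouched.
debug_sshd_once() {
    port=${1:-2222}
    /usr/sbin/sshd -t || return 1
    /usr/sbin/sshd -d -p "$port"
}
```

Then, from another terminal on the same machine: `ssh -p 2222 localhost`.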

----------

## Felig

I tried it on the one machine I have access to now and it works; ssh login finds a pty and is happy.  I don't know what has changed (I don't think anything has), but something must have.  I will see if I can use some combination of ssh _command_ to trick the other machine into a fresh reboot and see if it now accepts remote login.

----------

## Felig

Alrighty.  I added LogLevel DEBUG1 to sshd_config and restarted it.  I have also upgraded to the latest kernel (2.6.17-gentoo-r5 #1 SMP PREEMPT) and baselayout (sys-apps/baselayout-1.12.4-r6) and openssh (net-misc/openssh-4.3_p2-r2).  This is a Pentium 4 with hyperthreading.  If I ssh to myself with a command (say, ls -l), it works fine, and this is the log output:

```
sshd[26602]: debug1: rexec start in 4 out 4 newsock 4 pipe 6 sock 7
sshd[26445]: debug1: Forked child 26602.
sshd[26602]: debug1: inetd sockets after dupping: 3, 3
sshd[26602]: Connection from xx.xx.xx.xx port 50218
sshd[26602]: debug1: Client protocol version 2.0; client software version OpenSSH_4.3
sshd[26602]: debug1: match: OpenSSH_4.3 pat OpenSSH*
sshd[26602]: debug1: Enabling compatibility mode for protocol 2.0
sshd[26602]: debug1: Local version string SSH-2.0-OpenSSH_4.3
sshd[26602]: debug1: PAM: initializing for "felix"
sshd[26602]: debug1: PAM: setting PAM_RHOST to "xxx.com"
sshd[26602]: debug1: PAM: setting PAM_TTY to "ssh"
sshd[26602]: debug1: temporarily_use_uid: 1000/100 (e=0/0)
sshd[26602]: debug1: trying public key file /home/felix/.ssh/authorized_keys
sshd[26602]: debug1: restore_uid: 0/0
sshd[26602]: debug1: temporarily_use_uid: 1000/100 (e=0/0)
sshd[26602]: debug1: trying public key file /home/felix/.ssh/authorized_keys2
sshd[26602]: debug1: restore_uid: 0/0
sshd[26608]: debug1: do_pam_account: called
sshd[26602]: debug1: PAM: num PAM env strings 0
sshd[26602]: debug1: do_pam_account: called
sshd[26602]: Accepted keyboard-interactive/pam for felix from xx.xx.xx.xx port 50218 ssh2
sshd[26602]: debug1: monitor_child_preauth: felix has been authenticated by privileged process
sshd(pam_unix)[26665]: session opened for user felix by (uid=0)
sshd[26665]: debug1: PAM: reinitializing credentials
sshd[26665]: debug1: permanently_set_uid: 1000/100
sshd[26665]: debug1: Entering interactive session for SSH2.
sshd[26665]: debug1: server_init_dispatch_20
sshd[26665]: debug1: server_input_channel_open: ctype session rchan 0 win 131072 max 32768
sshd[26665]: debug1: input_session_request
sshd[26665]: debug1: channel 0: new [server-session]
sshd[26665]: debug1: session_new: init
sshd[26665]: debug1: session_new: session 0
sshd[26665]: debug1: session_open: channel 0
sshd[26665]: debug1: session_open: session 0: link with channel 0
sshd[26665]: debug1: server_input_channel_open: confirm session
sshd[26665]: debug1: server_input_channel_req: channel 0 request x11-req reply 0
sshd[26665]: debug1: session_by_channel: session 0 channel 0
sshd[26665]: debug1: session_input_channel_req: session 0 req x11-req
sshd[26665]: debug1: x11_create_display_inet: Socket family 10 not supported
sshd[26665]: debug1: channel 1: new [X11 inet listener]
sshd[26665]: debug1: server_input_channel_req: channel 0 request exec reply 0
sshd[26665]: debug1: session_by_channel: session 0 channel 0
sshd[26665]: debug1: session_input_channel_req: session 0 req exec

========== begin differences =========

sshd[26676]: debug1: Received SIGCHLD.
sshd[26665]: debug1: Received SIGCHLD.
sshd[26665]: debug1: session_by_pid: pid 26676
sshd[26665]: debug1: session_exit_message: session 0 channel 0 pid 26676
sshd[26665]: debug1: session_exit_message: release channel 0
sshd[26665]: debug1: session_by_channel: session 0 channel 0
sshd[26665]: debug1: session_close_by_channel: channel 0 child 0
sshd[26665]: debug1: session_close_x11: detach x11 channel 1
sshd[26665]: debug1: session_close: session 0 pid 0
sshd[26665]: debug1: channel 0: free: server-session, nchannels 2
sshd[26665]: debug1: channel 1: free: X11 inet listener, nchannels 1
sshd[26665]: Connection closed by xx.xx.xx.xx
sshd[26665]: debug1: do_cleanup
sshd[26665]: debug1: PAM: cleanup
sshd(pam_unix)[26665]: session closed for user felix
sshd[26665]: Closing connection to xx.xx.xx.xx
sshd[26665]: debug1: PAM: cleanup
```

If I ssh to myself with no command, i.e., requesting a login shell, it asks for the password and hangs forever, or at least for as long as I have been willing to wait (hours).  This is the log output:

```
sshd[26706]: debug1: rexec start in 4 out 4 newsock 4 pipe 6 sock 7
sshd[26445]: debug1: Forked child 26706.
sshd[26706]: debug1: inetd sockets after dupping: 3, 3
sshd[26706]: Connection from xx.xx.xx.xx port 50220
sshd[26706]: debug1: Client protocol version 2.0; client software version OpenSSH_4.3
sshd[26706]: debug1: match: OpenSSH_4.3 pat OpenSSH*
sshd[26706]: debug1: Enabling compatibility mode for protocol 2.0
sshd[26706]: debug1: Local version string SSH-2.0-OpenSSH_4.3
sshd[26706]: debug1: PAM: initializing for "felix"
sshd[26706]: debug1: PAM: setting PAM_RHOST to "xxx.com"
sshd[26706]: debug1: PAM: setting PAM_TTY to "ssh"
sshd[26706]: debug1: temporarily_use_uid: 1000/100 (e=0/0)
sshd[26706]: debug1: trying public key file /home/felix/.ssh/authorized_keys
sshd[26706]: debug1: restore_uid: 0/0
sshd[26706]: debug1: temporarily_use_uid: 1000/100 (e=0/0)
sshd[26706]: debug1: trying public key file /home/felix/.ssh/authorized_keys2
sshd[26706]: debug1: restore_uid: 0/0
sshd[26722]: debug1: do_pam_account: called
sshd[26706]: debug1: PAM: num PAM env strings 0
sshd[26706]: debug1: do_pam_account: called
sshd[26706]: Accepted keyboard-interactive/pam for felix from xx.xx.xx.xx port 50220 ssh2
sshd[26706]: debug1: monitor_child_preauth: felix has been authenticated by privileged process
sshd(pam_unix)[26800]: session opened for user felix by (uid=0)
sshd[26800]: debug1: PAM: reinitializing credentials
sshd[26800]: debug1: permanently_set_uid: 1000/100
sshd[26800]: debug1: Entering interactive session for SSH2.
sshd[26800]: debug1: server_init_dispatch_20
sshd[26800]: debug1: server_input_channel_open: ctype session rchan 0 win 65536 max 16384
sshd[26800]: debug1: input_session_request
sshd[26800]: debug1: channel 0: new [server-session]
sshd[26800]: debug1: session_new: init
sshd[26800]: debug1: session_new: session 0
sshd[26800]: debug1: session_open: channel 0
sshd[26800]: debug1: session_open: session 0: link with channel 0
sshd[26800]: debug1: server_input_channel_open: confirm session
sshd[26800]: debug1: server_input_channel_req: channel 0 request x11-req reply 0
sshd[26800]: debug1: session_by_channel: session 0 channel 0
sshd[26800]: debug1: session_input_channel_req: session 0 req x11-req
sshd[26800]: debug1: x11_create_display_inet: Socket family 10 not supported
sshd[26800]: debug1: channel 1: new [X11 inet listener]
sshd[26800]: debug1: server_input_channel_req: channel 0 request pty-req reply 0
sshd[26800]: debug1: session_by_channel: session 0 channel 0
sshd[26800]: debug1: session_input_channel_req: session 0 req pty-req

========= begin differences =============

sshd[26800]: debug1: Allocating pty.
sshd[26706]: debug1: session_new: init
sshd[26706]: debug1: session_new: session 0
sshd[26706]: error: openpty: No such file or directory
sshd[26800]: error: session_pty_req: session 0 alloc failed
sshd[26800]: debug1: server_input_channel_req: channel 0 request shell reply 0
sshd[26800]: debug1: session_by_channel: session 0 channel 0
sshd[26800]: debug1: session_input_channel_req: session 0 req shell
sshd[26811]: debug1: Received SIGCHLD.
```

The two logs are almost identical up to the ===== lines, except for "req exec" vs "req pty-req".
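With `LogLevel DEBUG1`, the request types can be pulled out mechanically, which makes comparing a working and a failing session less error-prone. A hypothetical `sed` filter over the `session_input_channel_req` lines:

```shell
#!/bin/sh
# session_reqs [LOGFILE]  (hypothetical helper)
# Print, one per line, the channel request types sshd logged at
# LogLevel DEBUG1 (x11-req, exec, pty-req, shell, ...).
# Reads standard input when no file is given.
session_reqs() {
    sed -n 's/.*session_input_channel_req: session [0-9]* req \([^ ]*\).*/\1/p' ${1+"$1"}
}
```

For the two sessions above, it prints `x11-req` and `exec` for the command run, and `x11-req`, `pty-req`, `shell` for the login attempt. Filter by PID first to isolate one session, e.g. `grep 'sshd\[26800\]' /var/log/messages | session_reqs`.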

The complaint "openpty: No such file or directory" is particularly confusing.  Here is what I have in that regard:

```
# ll /dev/pt*
crw-rw-rw- 1 root tty  5, 2 Aug 17 20:55 /dev/ptmx

/dev/pts:
total 0
crw--w---- 1 felix tty 136,  0 Aug 17 19:35 0
crw--w---- 1 felix tty 136,  1 Aug 17 19:35 1
crw--w---- 1 felix tty 136, 10 Aug 17 20:55 10
crw--w---- 1 felix tty 136, 11 Aug 17 20:20 11
crw--w---- 1 felix tty 136,  2 Aug 17 19:35 2
crw--w---- 1 felix tty 136,  3 Aug 17 19:40 3
crw--w---- 1 felix tty 136,  4 Aug 17 19:36 4
crw--w---- 1 felix tty 136,  5 Aug 17 19:36 5
crw--w---- 1 felix tty 136,  6 Aug 17 19:35 6
crw--w---- 1 felix tty 136,  7 Aug 17 20:17 7
crw--w---- 1 felix tty 136,  8 Aug 17 20:17 8
crw--w---- 1 root  tty 136,  9 Aug 17 19:38 9
```

I don't know what sshd is looking for; I suppose strace will help, and I will try that next.
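One way to get such a trace without disturbing the running daemon is to start a one-shot debug sshd under strace. A sketch, assuming strace is installed, sshd is at `/usr/sbin/sshd`, and port 2222 is free; `trace_sshd` is a hypothetical name. `-f` follows forked children, which matters because a privilege-separated sshd works across several processes, and with `-f` each line in the `-o` file is prefixed by the PID that made the call.

```shell
#!/bin/sh
# trace_sshd [PORT] [OUTFILE]  (hypothetical helper; run as root)
# Trace a one-shot debug sshd, following forks so the privileged
# monitor and the per-session child are both captured.
trace_sshd() {
    port=${1:-2222}
    out=${2:-/tmp/sshd.strace}
    strace -f -o "$out" /usr/sbin/sshd -d -p "$port"
}
```

While it waits, run `ssh -p 2222 localhost` from another terminal, then look for pty-related calls with `grep -E '/dev/(ptmx|pts|pty)' /tmp/sshd.strace`.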

The other machine, which was failing similarly but now works, is a ~amd64 dual opteron.  It is kernel 2.6.17-gentoo-r5 #1 SMP PREEMPT, baselayout sys-apps/baselayout-1.12.4-r5, and openssh net-misc/openssh-4.3_p2-r2.  It wants updating to baselayout-r6 from -r5, but there is some DHCP conflict.

----------

## Janne Pikkarainen

Strange. I would guess sshd wants to use /dev/pts/0, /dev/pts/1 and so on... Anyway, on my server I seem to have /dev/ptya0 to /dev/ptyzf. For starters, try:

```
cd /dev
# Legacy BSD-style pty masters ptya0..ptya9: char major 2, minors 176..185.
# The kernel only serves these when CONFIG_LEGACY_PTYS is enabled.
for i in 0 1 2 3 4 5 6 7 8 9; do
    mknod -m 660 "ptya$i" c 2 $((176 + i))
done
chown root:tty ptya*
```

Then restart sshd and try your stanza again.

----------

## Felig

Made no difference.  The working ~amd64 machine and another working x86 machine have the same pt* setup as the failing ~x86 machine.  Besides, isn't udev supposed to make mknod pretty much unnecessary?

I hope I can get in an strace session sometime today.

----------

## Felig

I see hundreds of these:

```
17064 open("/dev/ptyZe", O_RDWR|O_NOCTTY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
17064 open("/dev/ptyp830", O_RDWR|O_NOCTTY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
```
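With hundreds of such lines, a quick tally helps. It is also worth checking whether `/dev/ptmx` (the UNIX98 pty multiplexor) is opened at all, since the `/dev/ptyXX` names being probed look like legacy BSD-style master names, which a kernel built without `CONFIG_LEGACY_PTYS` does not provide. Two hypothetical grep helpers over an `strace -f -o` log; the `openat` alternative is only for newer systems, where strace shows `openat` instead of `open`.

```shell
#!/bin/sh
# Hypothetical helpers for an "strace -f -o FILE" log.

# Count failed (ENOENT) opens of /dev/pty* names.
failed_pty_opens() {
    grep -cE 'open(at)?\((AT_FDCWD, )?"/dev/pty[^"]*".*= -1 ENOENT' "$1"
}

# Succeed if the trace contains any open of the UNIX98 multiplexor.
tried_ptmx() {
    grep -qE 'open(at)?\((AT_FDCWD, )?"/dev/ptmx"' "$1"
}
```

For example: `failed_pty_opens /tmp/sshd.strace` and `tried_ptmx /tmp/sshd.strace || echo "never opened /dev/ptmx"`.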

O_LARGEFILE is interesting.  The failing ~x86 machine's kernel does not have CONFIG_LSF (large single file) set, but the working x86 and ~amd64 machines do have it set.  I will have to try a kernel recompile.  Why would this make a difference, if it is indeed the problem?

----------

## Felig

Rebooted with an LSF kernel and it made no difference to sshd.

----------

## Janne Pikkarainen

Man, this has got to be the strangest ssh problem EVER.  :Very Happy: 

I think the next step must be to examine every single dependency ssh can have, from kernel to glibc to readline.

----------

## Felig

I was thinking of running strace on the two working machines to see what the differences are.  Gotta be something real simple, eventually.  At any rate, I'll have to do that Saturday.  I wonder if strace will show more info if I recompile openssh with debug flags.

----------

## Felig

The -D and -d options also look interesting.  With -D, I could always run it under gdb and step through the process.  Who knows, I might learn something!  But that might have to wait, since I can only easily access the working computers on weekends, and the failing one only during the week.

----------

## Felig

Here is the output from trying to update sysvinit, another complaint about openpty.  Something odd is going on here.

```
i686-pc-linux-gnu-gcc   -o init init.o init_utmp.o
change_console.c: In function 'main':
change_console.c:42: warning: value computed is not used
i686-pc-linux-gnu-gcc  -o halt halt.o ifdown.o hddown.o utmp.o
i686-pc-linux-gnu-gcc  -o shutdown dowall.o shutdown.o utmp.o
i686-pc-linux-gnu-gcc  -o runlevel runlevel.o
i686-pc-linux-gnu-gcc   -o sulogin sulogin.o -lcrypt
i686-pc-linux-gnu-gcc  -o bootlogd bootlogd.o -lutil
i686-pc-linux-gnu-gcc  -o last last.o
i686-pc-linux-gnu-gcc  -o mesg mesg.o
bootlogd.o: In function `findpty':
bootlogd.c:(.text+0x520): undefined reference to `openpty'
collect2: ld returned 1 exit status
make: *** [bootlogd] Error 1
make: *** Waiting for unfinished jobs....
make: Leaving directory `/var/tmp/portage/sysvinit-2.86-r5/work/sysvinit-2.86/src'
!!! ERROR: sys-apps/sysvinit-2.86-r5 failed.
```
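An undefined reference to openpty while linking with `-lutil` suggests looking at which libutil the toolchain actually picks up. A hypothetical check, assuming gcc and binutils' `nm`: `gcc -print-file-name=libutil.so` prints the path the compiler driver finds (or just the bare name if it finds none), and `nm -D` lists that file's dynamic symbols.

```shell
#!/bin/sh
# check_libutil  (hypothetical helper)
# Show which libutil the compiler driver resolves for -lutil and whether
# that file exports openpty. CC defaults to gcc.
check_libutil() {
    lib=$(${CC:-gcc} -print-file-name=libutil.so)
    printf '%s\n' "libutil for -lutil: $lib"
    ls -l "$lib"
    if nm -D "$lib" 2>/dev/null | grep -qw openpty; then
        echo "openpty: exported"
    else
        echo "openpty: not found (missing, or not a shared object)"
        return 1
    fi
}
```

On a healthy system this should report openpty as exported; an unexpected path or a missing symbol points at a shadowing or broken libutil.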

----------

## Janne Pikkarainen

I'm starting to suspect this has something to do with your very recent glibc version.

----------

## Felig

The working ~amd64 machine and the failing ~x86 machine both have glibc-2.4-r3, baselayout-1.12.4-r7, sysvinit-2.86-r5.  The working x86 machine has the same baselayout and sysvinit but glibc-2.3.6-r4.  I don't know why the ~x86 machine wants to remerge sysvinit; it shows as "sys-apps/sysvinit-2.86-r5  USE="(-ibm) (-selinux) -static" 0 kB" but those flags haven't changed at all as far as I can remember (not a ThinkPad, never used SELinux).

----------

## Felig

Don't know what else to do; I have run out of ideas, so I filed bug 145550.  It is almost certainly cockpit error of some sort: some misconfiguration on my end, some USE flag not set, who knows ...

----------

## Felig

Bug 145550 has the details: package app-accessibility/sphinx3 creates a bogus /usr/lib/libutil that overrides /lib/libutil, breaking the openssh build and many others.  Many thanks to SpanKY for solving this.  He has opened a new bug for sphinx3, 145667.

The temporary workaround for anyone else who stumbles upon this badness is to replace the bogus /usr/lib/libutil with a symlink to the correct /lib/libutil, then remerge whatever needs it.
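The workaround can be scripted with a safety net. This is a sketch, not the fix from the bug report: the thread does not give the exact file names, so `/usr/lib/libutil.so` and `/lib/libutil.so.1` below are assumptions to verify first (for instance with `ls -l /usr/lib/libutil* /lib/libutil*`), and `replace_with_symlink` is a hypothetical helper that keeps a `.bak` copy of the file it replaces.

```shell
#!/bin/sh
# replace_with_symlink BOGUS GOOD  (hypothetical helper; run as root)
# Move BOGUS aside to BOGUS.bak and put a symlink to GOOD in its place.
# Refuses to act if GOOD is missing or BOGUS is already a symlink.
replace_with_symlink() {
    bogus=$1
    good=$2
    if [ ! -e "$good" ]; then
        echo "missing target: $good" >&2
        return 1
    fi
    if [ -L "$bogus" ]; then
        echo "already a symlink: $bogus" >&2
        return 1
    fi
    if [ -e "$bogus" ]; then
        mv "$bogus" "$bogus.bak" || return 1
    fi
    ln -s "$good" "$bogus"
}
```

With the assumed paths: `replace_with_symlink /usr/lib/libutil.so /lib/libutil.so.1`, then `emerge --oneshot net-misc/openssh sys-apps/sysvinit` to rebuild what linked against the bad copy.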

----------

## Janne Pikkarainen

Wow. This must be one of the nastiest bugs I've seen in a while.  :Surprised: 

----------

