# [SOLVED] ssh connection hangs on system reboot/shutdown/halt

## DNAspark99

This is a long-standing problem. It's a minor one, but I've finally set out to resolve it once and for all.

The issue: 

With gentoo (either on years-old systems or even the latest builds), if you log into a remote system and issue a shutdown/reboot/halt, the system goes down, but unless you explicitly 'logout' of the session first, any active ssh connections do not disconnect, which effectively leaves that terminal 'locked out'.

However, ubuntu (and likely many other distros) does manage to close the connections within seconds of issuing the command. Any and all active ssh connections are disconnected with "Connection closed by remote host", or something similar.

I've googled around a bit. It seems there's no definitive solution, only several suggested 'hacks' to try to accomplish this, most of them completely beside the point and offering only a 'bandaid' fix.

So I've begun to dig into and compare the shutdown procedures of both distros. 

On ubuntu, I've isolated this behavior to the /etc/init.d/sendsigs init script, which is executed during reboot or halt runlevels. (rc0.d + rc6.d). Removing this script from execution also removes the ssh 'auto-logout' functionality, the sessions no longer disconnect automatically. 

The functionality of this ubuntu init script appears to be very similar to gentoo's /etc/init.d/killprocs script, which is also executed during system shutdown. The main point of the script (on both systems) seems to be the execution of 'killall5 -15', which sends signal 15 (SIGTERM) to all processes.

So I tried just this command on both systems. Sure enough, 'killall5 -15' on ubuntu almost immediately closes any active ssh connections cleanly. 

On gentoo, no-go, the sessions will hang.  

So, I'm uncertain where to go from here. It doesn't appear to be a matter of sshd_config - I've tried the same ubuntu config on my gentoo systems and it doesn't result in an auto-logout. 

Could it be a kernel config issue? sshd init scripts? Elsewhere in the system? I have no idea. 

It would be great to solve this once and for all and get it upstream so no one ever has to complain about this again!

Thanks

Last edited by DNAspark99 on Fri Feb 08, 2013 10:25 pm; edited 1 time in total

----------

## 666threesixes666

https://bugs.gentoo.org/show_bug.cgi?id=259183

----------

## wcg

Apparently the comments in the bug tracker for this bug do not reflect knowledge that this problem does not exist in Ubuntu. The posters seem to think it happens with any openssh sshd.

So what is Ubuntu doing differently? Leaving the network device up and running until the kernel shuts it down on kernel exit?

What happens on the clients if you simply unplug the network at the server?

----------

## DNAspark99

 *wcg wrote:*   

> What happens on the clients if you simply unplug the network at the server?

 

Well, physically yanking the cable will hang the client. It wouldn't get anything to tell it to terminate the connection. 

Ubuntu seems to have a way to tell any active ssh connections that it's time to hang up and go home. 

I'm still digging into it. strace on child and parent sshd processes doesn't seem to indicate anything different, though I suspect strace is being terminated first anyways.

----------

## DNAspark99

I've been playing around with this, and what I've come up with to replicate this behavior (by way of a dirty hack!) is to modify /etc/init.d/net.lo script to include a ssh-specific killall command as one of the last things to run during stop(); (near the very bottom of this)

```
stop()
{
        ...
        ...
        ...
   killall -s 15 sshd
   return 0
}
```

A dirty hack, but at least it works in replicating the expected behavior....

Putting it into the /etc/init.d/killprocs script had no effect. Indeed it looks like the network shutdown is run before killprocs....

----------

## DNAspark99

scratch that, seems to work for reboot but not halt. wtf.

----------

## kimmie

You can probably work around this with ssh and/or sshd config.

You can configure the client to disconnect from a dead server using ServerAliveInterval and ServerAliveCountMax (see man ssh_config). Similarly, you can configure the server to disconnect from dead clients using ClientAliveInterval and ClientAliveCountMax (see man sshd_config).

You can also use TCPKeepAlive in either the server or client config, but that will kill your session much more quickly (eg. if you're running a ssh client on your laptop, via your wifi, to a server on the internet... your session won't survive restarting your laptop's wlan interface).
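For anyone who wants to try the client-side keepalive options mentioned above, here is a rough sketch. The host name `server.example` and the values 15 and 3 are placeholders picked for illustration, not recommendations from this thread.

```shell
# server.example is a placeholder host name.
# -G only prints the effective client configuration and exits; it does not connect.
ssh -G -o ServerAliveInterval=15 -o ServerAliveCountMax=3 server.example | grep -i '^serveralive'

# The same options on a real connection: probe the server every 15 s and
# give up after 3 unanswered probes (roughly 45 s of silence).
# ssh -o ServerAliveInterval=15 -o ServerAliveCountMax=3 user@server.example
```

The same two settings can also go into a `Host` block in `~/.ssh/config` so they apply to every connection to that host.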

----------

## DNAspark99

Yes, I'm aware of ssh keepalive, and numerous other 'workarounds' that could _help_ mitigate the situation, but that's all completely beside the point. It's not the root of the problem. 

The issue is, ubuntu seems to be doing something 'correctly' here, gentoo is not. Gentoo is missing something, or something is out of order. 

Most likely, gentoo's killprocs script *should* be killing and closing these connections. It certainly looks to be the intention. But it's not operating as expected. 

Sure, it's not overly critical by any means - but it's a minor annoyance that has bugged me - and undoubtedly numerous other users - for years now. 

And, if I can track it down, great - we'll hopefully never have to deal with it again! :p

----------

## DNAspark99

OK, I think I've found an acceptable fix. It comes down to the way gentoo handles runlevels differently from most other distros.

For some reason the network interface is dropped long before the /etc/init.d/killprocs script is run. Effectively, "the cord has been yanked",  hence any active ssh connections hang until 'keepalive' expires. The server-process undoubtedly (eventually) receives the signal, but it can't send the RST to the client without the interface up. 

I've been playing with the various gentoo-specific init script dependency structures of the killprocs vs net.lo scripts. I dunno, I couldn't get it to work, likely because I don't fully understand the way it determines its ordering (vs ubuntu/redhat's setup, where the symlinks in the respective runlevel dirs can be clearly prioritized for start/stop).

So, ultimately, in order to not muck up the order of the existing shutdown, it seems this should fall into the sshd init script itself. OK. That's what I've done, and it works as expected. YAY! (finally!)

3 lines (+comment) appended to the 'stop' block of /etc/init.d/sshd 

```
stop() {
        if [ "${RC_CMD}" = "restart" ] ; then
                checkconfig || return 1
        fi

        ebegin "Stopping ${SVCNAME}"
        start-stop-daemon --stop --exec "${SSHD_BINARY}" \
            --pidfile "${SSHD_PIDFILE}" --quiet
        eend $?

        # Close any active connections if system is going into shutdown
        if [ "$RC_RUNLEVEL" = shutdown ]; then
                ps auxw | grep sshd\: | grep -v grep | awk '{print $2}' | xargs kill -s 15
        fi
}
```

So, who do I talk to to get this upstream? :p

Last edited by DNAspark99 on Fri Feb 08, 2013 9:56 pm; edited 1 time in total

----------

## wcg

A default interval for declaring a connection dead on either the client or server, followed by an exit more graceful than "kill -9 $PID" from a root console would seem to be robust design in an environment where any number of forces beyond the user's control can disconnect the network between client and server. So it is perhaps not surprising if Gentoo maintainers do not consider this a critical error. Users' ssh clients should recognize and react sanely to the network going down before the server explicitly terminates the session (or they are broken).

That is no excuse to be sloppy in our shutdown scripts, though. Telling sshd instances with a signal to terminate all connections from clients and exit before the system shuts down should be doable, too, and it sounds like it is only an ordering issue in what happens at shutdown.

Maybe make network interface shutdown depend on a while loop around checking for still-running network servers? I remember the "Bernstein xmalloc" (more-or-less) from qmail or something:

malloc() some space

if malloc() returns null, wait 60 seconds

try again

if malloc() still returns null, bail out with an error

So in this case,

killall -s SIGTERM sshd

wait a polite interval for running sshd processes to clean up

killall -s SIGKILL sshd (for any that are hung)

continue
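A minimal sketch of that TERM, wait, KILL sequence in POSIX shell. The function name `term_then_kill` and the grace period are assumptions made up for illustration. It takes PIDs as arguments so it can be tried on a throwaway process before being pointed at anything important.

```shell
#!/bin/sh
# term_then_kill GRACE_SECONDS PID...
# Hypothetical helper: send SIGTERM, wait up to GRACE_SECONDS,
# then send SIGKILL to the same PIDs in case any are hung.
term_then_kill() {
    grace=$1
    shift
    # No PIDs given: nothing to do.
    [ "$#" -gt 0 ] || return 0

    kill -TERM "$@" 2>/dev/null || true

    while [ "$grace" -gt 0 ]; do
        alive=0
        for pid in "$@"; do
            # kill -0 sends no signal; it only checks that the process exists.
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
            fi
        done
        if [ "$alive" -eq 0 ]; then
            return 0
        fi
        sleep 1
        grace=$((grace - 1))
    done

    # Grace period used up: force-kill whatever is left.
    kill -KILL "$@" 2>/dev/null || true
    return 0
}

# Try it on a throwaway process rather than on sshd.
sleep 300 &
victim=$!
term_then_kill 3 "$victim"
```

In an init script the PID list would have to come from somewhere else, for example from a `pgrep` call, and the grace period would need to be short enough not to stall shutdown.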

----------

## DNAspark99

Yea, as you see, I seem to have figured it out moments before you posted. 

I do hope this change (or an improved variant of it) makes its way upstream. 

It just seems fitting that, since you get a broadcast message anyways telling you that the 'System is going down NOW!', any active connections be closed cleanly. 

 *wcg wrote:*   

> A default interval for declaring a connection dead on either
> 
> the client or server, followed by an exit more graceful
> 
> than "kill -9 $PID" from a root console would seem to
> ...

 

----------

## derk

so what happens if you are restarting your remote server sshd session only and want to maintain connectivity .. if this is in your sshd init.d script you kill your existing connection .. can be very .. not good .. in many cases ..

----------

## Hu

 *derk wrote:*   

> so what happens if you are restarting your remote server sshd session only and want to maintain connectivity .. if this is in your sshd init.d script you kill your existing connection .. can be very .. not good .. in many cases ..

 The script appears to kill connections only when entering runlevel shutdown.

----------

## DNAspark99

 *derk wrote:*   

> so what happens if you are restarting your remote server sshd session only and want to maintain connectivity .. if this is in your sshd init.d script you kill your existing connection .. can be very .. not good .. in many cases ..

 

Yea, turns out there's a handy variable, RC_RUNLEVEL, so you can safely stop / start  sshd as need be. But when the _system_ itself goes down, it'll kill the connections. 

It looks like there may finally be an 'official fix' in the works anyways, that isn't ssh specific:

https://bugs.gentoo.org/show_bug.cgi?id=259183

----------

## 666threesixes666

THE FORCE IS STRONG WITH YOU YOUNG SKY WALKER, BUT YOU ARE NOT A JEDI YET!!!!  the command you posted started throwing errors at shutdown for me.  its not shutting down ssh if there are connections, its shutting down connections regardless if there are connections or not.....

```

   if [ "$RC_RUNLEVEL" = shutdown ]; then

      SSHCONNECTIONS=$(ps ax | grep sshd\: | grep -v grep | awk '{print $5}' | head -n 1)

      if [ "$SSHCONNECTIONS" = sshd\: ] ; then

         SSHDPIDS=$(ps ax | grep sshd\: | grep -v grep | awk '{print $1}')

         kill -s 15 $SSHDPIDS

      fi

   fi

```

and back to my clean shutdowns...  ahhh yes, the dark side of the force is the pathway to many abilities some consider to be unnatural.   :Twisted Evil: 

ill second your bug tracker stuff saying it should be sshd init script, not net.lo...  i disabled net.lo for networkmanager (though subsequently revived it) sshd should control its ssh connections, not some other random script somewhere else on the system.

Last edited by 666threesixes666 on Sun Mar 03, 2013 9:59 pm; edited 1 time in total

----------

## khayyam

 *666threesixes666 wrote:*   

> 
> 
> ```
> if [ "$RC_RUNLEVEL" = shutdown ]; then
> 
> ...

 

666threesixes666 ... this is simply wrong. Firstly, your SSHCONNECTIONS var will be empty as you have the single quotes inside the braces, and so awk will provide no output. Secondly, it's an example of 'pipeaholism', where the tools used are not *used* but chained together with other tools when one tool would have sufficed (see: the useless use of this and that). Thirdly, the wrong tool is used in place of a tool designed for such a task (see How do I kill a process by name? I need to get the PID out of ps aux | grep ....).

The above could be simplified into the following:

```
if [ "$RC_RUNLEVEL" = shutdown ] ; then

    SSHCONNECTIONS=$(pgrep -c sshd)

    if [ "$SSHCONNECTIONS" > "1" ] ; then

        kill -s 15 $(pgrep sshd)

    fi

fi
```

Also, awk is not simply for '{print $1}', awk can do the same regex matching that grep can, so, for example, your SSHCONNECTIONS var above could avoid using grep, grep -v, head, etc, by the use of awk alone:

```
ps ax | awk '/[s]shd/{print $5;exit}'
```

... that is an example, as I pointed out above 'pgrep' should be used for such things.

 *666threesixes666 wrote:*   

> ahhh yes, the dark side of the force is the pathway to many abilities some consider to be unnatural.

 

I for one ...

Anyhow, this issue will be fixed in openrc-0.12.x (see bug #259183), and as William Hubbs pointed out (see comment #20) the issue can be fixed with the following as the first line of the stop() function in /etc/init.d/net.lo

```
yesno ${shutdown_network:-YES} && yesno $RC_GOINGDOWN && return 0
```

best ... khay

----------

## 666threesixes666

you're right, i totally screwed that up on the '{print $#}' syntax, awk's still strange voodoo to me.  idk regex. pgrep and pkill are new to me....  never seen them before...  are they in other distributions?  what package do they come from?  gpasswd was new to me like a month ago.  where do u suggest i get on with my regex education?

you're not entirely correct.

```

mkultra@mksrv [ ~ ]$ ./ssh

mkultra@mksrv [ ~ ]$ ./ssh

ssh connections

mkultra@mksrv [ ~ ]$ cat ssh

CONNECT=$(ps ax | grep sshd\: | grep -v grep | awk {'print $5'} | head -n 1)

if [ "$CONNECT" = sshd\: ] ; then

echo "ssh connections"

fi

```

other than that u sir are a total linux god

what about people that dont use net.lo but use ssh?

kill -s 15 $(pgrep sshd)  results in killing /usr/sbin/sshd not just the connected clients.  i like pipes, they are very useful.

----------

## khayyam

 *666threesixes666 wrote:*   

> your not entirely correct.
> 
> ```
> mkultra@mksrv [ ~ ]$ ./ssh
> 
> ...

 

666threesixes666 ... I ran it from the prompt, and not from a script or subshell. I then assumed the reason the var was empty was the awk, and not the fact that the metachars are protected with backslashes rather than quoted.

```
% ssh localhost

% CONNECT=$(ps ax | grep sshd\: | grep -v grep | awk '{print $5}' | head -n 1)

% echo $CONNECT

% CONNECT=$(ps ax | grep "sshd:" | grep -v grep | awk '{print $5}' | head -n 1)

% echo $CONNECT

sshd:
```

 *666threesixes666 wrote:*   

> what about people that dont use net.lo but use ssh?

 

Everyone uses net.lo ... it provides 'loopback'.

```
% ifconfig lo

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536

        inet 127.0.0.1  netmask 255.0.0.0

        loop  txqueuelen 0  (Local Loopback)

        RX packets 1643  bytes 357836 (349.4 KiB)

        RX errors 0  dropped 0  overruns 0  frame 0

        TX packets 1643  bytes 357836 (349.4 KiB)

        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
```

 *666threesixes666 wrote:*   

> kill -s 15 $(pgrep sshd) results in killing /usr/sbin/sshd not just the connected clients.  i like pipes, they are very useful.

 

Well, I should have written "untested". Anyhow, it's easy to make the regex specific enough to exclude the daemon process but match the clients:

```
if [ "$RC_RUNLEVEL" = shutdown ] ; then

    SSHCONNECTIONS=$(pgrep -c sshd)

    if [ "$SSHCONNECTIONS" > "1" ] ; then

        kill -s 15 $(pgrep -f "sshd:")

    fi

fi
```

Anyhow, "liking pipes" is not a sufficient reason to not use the correct tools, and to use them well.

best ... khay

----------

## DNAspark99

 *666threesixes666 wrote:*   

> THE FORCE IS STRONG WITH YOU YOUNG SKY WALKER, BUT YOU ARE NOT A JEDI YET!!!!  the command you posted started throwing errors at shutdown for me. 

 

Yes, correct - I wasn't overly concerned with the noisy output if there's nothing to kill (just appended the ol' "> /dev/null 2>&1" to it), as my reboots were usually done through ssh anyways, and as had already been pointed out elsewhere, 'the fix is in' - and that's all I was after. 

Good to know I'm not the only one though, thanks  :Smile: 

----------

## DNAspark99

Ok, had a moment to fiddle with this again (on the older system where the net.lo fix doesn't work)

The suggested fix:

```

if [ "$RC_RUNLEVEL" = shutdown ] ; then 

    SSHCONNECTIONS=$(pgrep -c sshd) 

    if [ "$SSHCONNECTIONS" > "1" ] ; then 

        kill -s 15 $(pgrep -f "sshd:") 

    fi 

fi

```

... also spits a usage error for me if there are no child 'sshd:' processes connected to kill off...

So I took a quick glance at the man page for xargs, and obtained correct shutdown by adding the '-r' option to the xargs portion. (the option is short for --no-run-if-empty : "If the standard input does not contain any nonblanks, do not run the command.  Normally, the command is run once even if there is no input."). This removes the scenario where there's nothing for it to kill. 

```
if [ "$RC_RUNLEVEL" = shutdown ]; then
    ps auxw | grep sshd\: | grep -v grep | awk '{print $2}' | xargs -r kill -s 15
fi
```
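To see what `-r` changes without touching any real process, `kill` can be swapped for a harmless `echo`. This is only a small sketch and it assumes GNU xargs, where `-r` is an extension. The variable names are made up for the demonstration.

```shell
# Empty input, no -r: GNU xargs still runs the command once, with no arguments.
without_r=$(printf '' | xargs echo "kill would run")

# Empty input, with -r (--no-run-if-empty): the command is not run at all.
with_r=$(printf '' | xargs -r echo "kill would run")

# Non-empty input: -r makes no difference, the command runs with the PID appended.
with_pid=$(printf '123\n' | xargs -r echo "kill would run on")

echo "without -r: [$without_r]"
echo "with -r:    [$with_r]"
echo "with a pid: [$with_pid]"
```

The first case is the one that makes a bare `xargs kill` print a usage error at shutdown when no clients are connected.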

This performs as expected in all scenarios I've tested. It's certainly not as clean though, so hopefully you can easily tweak your fix...  for those who don't like pipes :p

----------

## 666threesixes666

```

   if [ "$RC_RUNLEVEL" = shutdown ]; then

      SSHCONNECTIONS=$(ps ax | grep sshd\: | grep -v grep | awk '{print $5}' | head -n 1)

      if [ "$SSHCONNECTIONS" = sshd\: ] ; then

         SSHDPIDS=$(ps ax | grep sshd\: | grep -v grep | awk '{print $1}')

         kill -s 15 $SSHDPIDS

      fi

   fi 

```

works perfectly, shuts down clean, works, tested.  pgrep -c is a bit dirtier, im actually testing for open connections, not counting up and including the sshd process(s).  its piped to the max and ugly, but gits er done.  -r sounds like a better fix to me though.  i dont like xargs because its not familiar to me and im just piping what i know in a method that make sense to me.

i think combining the 2 would fix the problem in an alternate way.  my long ugly ssh connection check...  with his short process kill.

```

   if [ "$RC_RUNLEVEL" = shutdown ]; then

      SSHCONNECTIONS=$(ps ax | grep sshd\: | grep -v grep | awk '{print $5}' | head -n 1)

      if [ "$SSHCONNECTIONS" = sshd\: ] ; then

         kill -s 15 $(pgrep -f "sshd\:")

      fi

   fi 

```

this should throw no errors....  or to correct you as he corrected me...

```
if [ "$RC_RUNLEVEL" = shutdown ]; then
    pgrep -f "sshd\:" | xargs -r kill -s 15
fi
```

building on a remote VM the wiki.gentoo.org gitlab rebooting hanging was getting old way quick....  i like my command pyramid   :Twisted Evil: 

----------

## khayyam

 *DNAspark99 wrote:*   

> The suggested fix [...] also spits a usage error for me if there are no child 'sshd:' processes connected to kill off...

 

DNAspark99 ... yes, and again, I didn't say "untested", not sure about the error, but it's basically harmless. Unless you're saying there were clients that weren't logged out? The following suppresses any errors.

```
if [ "$RC_RUNLEVEL" = shutdown ] ; then 

    SSHCONNECTIONS=$(pgrep -c sshd) 

    if [ "$SSHCONNECTIONS" -gt "1" ] ; then 

        kill -TERM $(pgrep -f "sshd:") > /dev/null 2>&1

    fi 

fi
```
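The change from `>` to `-gt` in the test above is more than cosmetic. Inside `[ ]` an unquoted `>` is parsed by the shell as output redirection, so the earlier form only checks that the string is non-empty and leaves behind a file named `1`. A small sketch that can be run in a scratch directory (the variable names are illustrative):

```shell
# Work in a scratch directory so the stray file is easy to spot.
cd "$(mktemp -d)" || exit 1

n=0

# '>' here is a redirection: this runs [ "0" ] with stdout sent to a file named 1.
# A non-empty string is always true, so the branch is taken even though n is 0.
if [ "$n" > "1" ]; then redirect_form=true; else redirect_form=false; fi

# -gt is the numeric comparison that was intended.
if [ "$n" -gt 1 ]; then numeric_form=true; else numeric_form=false; fi

echo "redirect form: $redirect_form, -gt form: $numeric_form"
ls
```

With the redirect form the `kill` is reached whatever the count is, which would account for the usage error when `pgrep -f "sshd:"` returns nothing.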

 *DNAspark99 wrote:*   

> 
> 
> ```
> if [ "$RC_RUNLEVEL" = shutdown ]; then
> 
> ...

 

Well, I don't understand how, "kill -s 15" is ('-s') SIGNAL (which would be -TERM) and not '-n', or '-15', so I'm not sure why it works, perhaps the signal is just ignored. Anyhow, there are various reasons for not doing it that way, some of which I've provided. If you want an idea of the difference, try comparing the following (which I've fixed up somewhat):

```
# time ps auxw | grep "sshd:" | grep -v grep | awk '{print $2}' 1> /dev/null

real   0m0.017s

user   0m0.004s

sys   0m0.013s

# time pgrep -f "sshd:" 1> /dev/null

real   0m0.010s

user   0m0.006s

sys   0m0.004s
```

 *666threesixes666 wrote:*   

> pgrep -c is a bit dirtier, im actually testing for open connections, not counting up and including the sshd process(s)

 

That isn't what it's doing. With sshd running there will be *one* process (at least), so if there is more than one, kill those matching the regex. Nothing "dirty" about it: it doesn't count, but asks for a count to use as an expression to match: more than one ssh process, i.e. clients connected.

best ... khay

----------

## khayyam

scratch that ... what was I thinking ... KISS

```
if [[ "$RC_RUNLEVEL" = shutdown ]] ; then

    SSH_CLIENT_PIDS="$(pgrep -f "sshd:")"

    [[ -n ${SSH_CLIENT_PIDS} ]] && kill -TERM ${SSH_CLIENT_PIDS}

fi
```

best ... khay

----------

## DNAspark99

 *khayyam wrote:*   

> scratch that ... what was I thinking ... KISS
> 
> ```
> if [[ "$RC_RUNLEVEL" = shutdown ]] ; then
> 
> ...

 

Just tried it, and it's still causing an incorrect shutdown if there are no ssh clients connected?

It's in the 'green' if there _are_ sshd clients to kill, but if there are none, it exits with the red "ERROR: sshd failed to stop" msg at shutdown. That's not right...

So I'll continue to stick with my 'ugly' fix for now, as I can spare the .010th of a second  :Smile: 

----------

## khayyam

 *DNAspark99 wrote:*   

> Just tried it, and it's still causing an incorrect shutdown if there are no ssh clients connected? It's in the 'green' if there _are_ sshd clients to kill, but if there are none, it exits with the red "ERROR: sshd failed to stop" msg at shutdown. That's not right...

 

DNAspark99 ... again, I didn't test it, except to see that the basics of it work, so please stop acting as though I've provided you with faulty information.

The issue is probably that the '&&' will return a non-zero exit status, and the script probably expects success.

```
# /etc/init.d/sshd status

 * status: started

# cat test.sh

SSH_CLIENT_PIDS="$(pgrep -f "sshd:")"

if [[ -n ${SSH_CLIENT_PIDS} ]] ; then

    kill -TERM ${SSH_CLIENT_PIDS}

fi

echo "return value is $?"

######## switch terminal #########

% ssh localhost

Last login: Fri Mar  8 20:56:55 CET 2013 from xxxxxxx.local on pts/8

######## switch terminal #########

# sh ./test.sh

return value is 0

######## switch terminal #########

%

Connection to localhost closed by remote host.

Connection to localhost closed.

######## switch terminal #########

# sh ./test.sh

return value is 0

# /etc/init.d/sshd status

 * status: started
```

So, nothing wrong there, it just needed to be placed in an 'if' statement so that it provides an exit status of zero regardless of outcome.
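The exit status difference can be seen in isolation with two throwaway functions. This is just a sketch: `with_and` and `with_if` are made-up names, and `echo` stands in for the real `kill`.

```shell
# Last command is an && list: when the test fails, the function returns 1.
with_and() {
    pids=$1
    [ -n "$pids" ] && echo "would kill $pids"
}

# Same logic inside an if: when no branch is taken, the if statement returns 0.
with_if() {
    pids=$1
    if [ -n "$pids" ]; then
        echo "would kill $pids"
    fi
}

# Call both with an empty PID list, as happens when no clients are connected.
rc_and=0
with_and "" || rc_and=$?
rc_if=0
with_if "" || rc_if=$?

echo "&& form returned $rc_and"
echo "if form returned $rc_if"
```

Since stop() ends with that line, its non-zero status becomes the return value of the whole function when no clients are connected, which fits the red "failed to stop" message.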

 *DNAspark99 wrote:*   

> So I'll continue to stick with my 'ugly' fix for now, as I can spare the .010th of a second :)

 

You do that ...

best ... khay

----------

## DNAspark99

Sorry, didn't mean to come off indignant or anything - I absolutely appreciate your input on this, the issue needs attention, and you've provided nothing but valuable input! 

I'm just updating the thread with feedback as I try out your suggestions.

Indeed, the following works in all scenarios, and I am now using your suggestion in /etc/init.d/sshd. It's quicker, it's cleaner, everything works, everyone's happy. Thanks!!

I just hope this fix finds its way into future gentoo  :Smile: 

```
stop() {
        if [ "${RC_CMD}" = "restart" ] ; then
                checkconfig || return 1
        fi

        ebegin "Stopping ${SVCNAME}"
        start-stop-daemon --stop --exec "${SSHD_BINARY}" \
            --pidfile "${SSHD_PIDFILE}" --quiet
        eend $?

        if [ "${RC_RUNLEVEL}" = "shutdown" ]; then
                SSH_CLIENT_PIDS="$(pgrep -f 'sshd:')"
                if [[ -n ${SSH_CLIENT_PIDS} ]] ; then
                    kill -TERM ${SSH_CLIENT_PIDS}
                fi
        fi
}
```

[SOLVED], 3rd time's the charm!  :Smile: 

----------

## m27315

this last solution worked great for me!

----------

## Whissi

Hi,

this thread is marked as "fixed" but it seems like the issue still exists.

openrc-0.12 has landed in the tree and should contain a fix according to William, but it looks like Gentoo still leaves connections open on shutdown.

From #openrc it seems like I am the only one who is able to reproduce the problem. It would be nice if some of you could test it again (don't forget to deactivate your manual fixes!) and report in https://bugs.gentoo.org/show_bug.cgi?id=259183 whether you still see the issue or not.

Thanks!

----------

## UncleVan

I found the simplest and most compatible solution so far:

```
cat > /etc/local.d/10kill_ssh_session.stop <<'EOF'
# To not let ssh/telnet connection(s) hanging on shutdown/reboot
killall ssh sshd telnet telnetd in.telnetd
EOF
```

and then

```
chmod u+x /etc/local.d/10kill_ssh_session.stop
```

/etc/local.d/*.stop scripts are executed on reboot/shutdown. 

Try it !

----------

## yuyuyak

Been following this thread for awhile, all contributions, bravo!  Used DNAspark99's solution for a month or two, works as advertised.

But I gotta give the gold star to UncleVan's solution, neat, clean, the Gentoo way and most important of all, it works too.

Thanks to all

----------

## DNAspark99

Fresh install. 

It appears this is still an issue.

----------

