# systemd-218: sockets not working, sometimes

## mv

Hello,

Please do not turn this thread into yet another systemd flamewar. I am completely aware that my question reveals one of the weaknesses of systemd which I had expected from the very beginning, but I would like to understand the technical reason for this particular issue:

After upgrading to systemd-218 and booting with systemd, it happens randomly that sockets are not working. Whether it happens depends on the boot: after a fresh boot the problem occurs either always or never.

By "not working", I really mean not working: Sockets are created and visible, but whenever something is trying to write to a socket, this process just hangs and apparently never returns from the library call. There are no error messages or otherwise unexpected behaviour - only indefinite hangs.

This is not related to starting the process with systemd: Starting a process manually which creates a socket and reads from it yields the same result.

It might play a role that I use hardened-sources with grsecurity, but I also tried turning various chroot-security features on or off. Because the problem is so hard to reproduce, I am not really sure, but it seems that none of the features is really related: turning some features off did sometimes hide the problem for many boots, but eventually it returned.

My first question is: Does anybody else experience the same problem?

My second question: How is systemd able to cause such a behaviour at all? Again some broken cgroups handling? How can I look for the cause?

And the last question is then of course: Why does this happen only sometimes, and why then either always or never until the next boot?

----------

## krinn

With respect to the others: I think you are the most competent Gentoo user when it comes to systemd.

So for any question you could ask, I'm afraid the only user who could answer it is you.

(I'm actually a bit surprised by "this process just hangs"; I would expect the awesome systemd supervision to have a watchdog, otherwise it is no better than supervision with a classic pid file. Is it another case of "we don't use pids because they suck, so we use sockets that suck just the same"? OK, that's trolling.)

----------

## mv

 *krinn wrote:*   

> i would expect the supervision awesome systemd would have a watchdog

 

You can activate a watchdog, but this is unrelated to the problem: The problem is not to kill the job but that it does not work.

I am not talking about processes supervised by systemd, I am talking about "ordinary" processes started "manually" (i.e. from a shell, either as a user or as root - it does not matter; of course, if I start them from systemd I get the same problem).

Since the problem is so unreproducible, I am not even sure whether systemd is the culprit: It could be that there is some kernel or grsecurity bug which just for some reason is triggered by systemd more often than by openrc (where the problem did not occur so far).

Once more: For testing, I have just started a server which opens a socket and listens on it, and a client which writes to that socket; these simple (ordinary user) programs already do not work. It is not clear to me in which sense systemd is related to this. I just observe that this problem did not occur with systemd-217 or with openrc so far, so I guess systemd does something which can trigger the problem.

Of course, once a "faulty" boot has happened (in which case the above test programs fail), a lot of other programs fail too: dhcpcd fails to report anything, and in fact it is not possible to make any internet connection at all, since all programs accessing some port will just hang. But the test case shows that it is not the internet connection but already the plain socket access which fails.
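For readers who want to try a comparable check, here is a minimal sketch of such a server/client pair. It is not the original test program: the use of a Unix domain stream socket, the function name `socket_roundtrip`, the file name `probe.sock` and the 5 second timeout are all assumptions made for illustration. The timeout is there so that a hang shows up as an exception instead of a stuck process; note that Python implements socket timeouts by switching the socket to non-blocking mode internally, so this may not behave exactly like a plain blocking write.

```python
import os
import socket
import tempfile
import threading


def socket_roundtrip(timeout=5.0, payload=b"ping"):
    """Send payload over a Unix socket; return what the server received.

    Raises socket.timeout if nothing arrives within `timeout` seconds.
    """
    # Hypothetical socket path inside a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "probe.sock")
    received = []

    # Server side: create the socket and listen on it.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.settimeout(timeout)
    server.bind(path)
    server.listen(1)

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.settimeout(timeout)
            received.append(conn.recv(len(payload)))

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()

    try:
        # Client side: connect and write to the socket.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.settimeout(timeout)
            client.connect(path)
            client.sendall(payload)
        thread.join(timeout)
    finally:
        server.close()
        os.unlink(path)
        os.rmdir(os.path.dirname(path))

    if not received:
        raise socket.timeout("server did not receive data within the timeout")
    return received[0]


if __name__ == "__main__":
    print(socket_roundtrip())
```

Run as an ordinary user, this should return quickly on a healthy boot; on an affected boot one would expect a timeout instead, assuming the hang is reproducible this way.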

----------

## hansb

Hello mv,

For months I have been hunting a problem in my software that exactly matches your error description:

 *Quote:*   

> By "not working", I really mean not working: Sockets are created and visible, but whenever something is trying to write to a socket, this process just hangs and apparently never returns from the library call. There are no error messages or otherwise unexpected behaviour - only indefinite hangs. 
> 
> 

 

My application uses a large number (500) of TCP/IP connections from one "data source" process to 500 destination processes. Each destination process only consumes the data. TCP/IP data transfers run well for some time, but after an unspecified period of time one of the 500 TCP/IP connections hangs. The thread writing to the socket is blocked in the system call forever. All output queues (netstat) are empty. So this is exactly the observation you described.

Have you found a solution to your problem in the meantime?
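In case it helps others compare notes on the "output queues are empty" observation, the send queue of a socket can also be read from inside the process. The following is only a sketch under assumptions: it is Linux-specific, the helper name `unsent_bytes` is made up here, and it relies on `SIOCOUTQ` having the same value as `TIOCOUTQ` on Linux, which Python exposes through the `termios` module. For TCP sockets the value should correspond roughly to the Send-Q column of netstat.

```python
import fcntl
import socket
import struct
import termios


def unsent_bytes(sock):
    """Return the kernel's send-queue size for this socket (Linux only)."""
    # On Linux, SIOCOUTQ is defined with the same value as TIOCOUTQ.
    raw = fcntl.ioctl(sock.fileno(), termios.TIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


if __name__ == "__main__":
    # Hypothetical stand-in for one source -> destination connection.
    source, destination = socket.socketpair()
    source.sendall(b"x" * 1024)
    print("queued before read:", unsent_bytes(source))
    destination.recv(1024)
    print("queued after read:", unsent_bytes(source))
    source.close()
    destination.close()
```

Logging this value just before each write might show whether a blocked writer is really waiting on a full queue or is stuck although the queue is empty.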

----------

