1

Here is my amazing service, written in shell script:

$ cat ~/junk/demoapp 
#! /bin/bash -eu

while true
do
    echo "in shell"
    ( echo "in subshell" )
    /usr/bin/echo "in subprocess"
    sleep 1
done

It produces some repetitive output:

$ ~/junk/demoapp
in shell
in subshell
in subprocess
in shell
in subshell
in subprocess
in shell
in subshell
in subprocess
in shell
in subshell
in subprocess
^C

Here is a user service configuration for it:

$ cat ~/.config/systemd/user/demoapp.service 
[Unit]
Description=Demo App

[Service]
Type=exec
ExecStart=/home/tomanderson/junk/demoapp

But when i run this service, using systemd 239, the logged output is missing the lines produced by the subshell and the subprocess:

$ systemctl --user daemon-reload

$ systemctl --user start demoapp

$ journalctl --user --unit demoapp

Sep 12 18:53:27 myhost systemd[539847]: Started Demo App.
Sep 12 18:53:27 myhost demoapp[559387]: in shell
Sep 12 18:53:28 myhost demoapp[559387]: in shell
Sep 12 18:53:29 myhost demoapp[559387]: in shell
Sep 12 18:53:30 myhost demoapp[559387]: in shell
Sep 12 18:53:31 myhost demoapp[559387]: in shell
Sep 12 18:53:32 myhost demoapp[559387]: in shell
Sep 12 18:53:33 myhost demoapp[559387]: in shell
Sep 12 18:53:34 myhost demoapp[559387]: in shell
Sep 12 18:53:35 myhost demoapp[559387]: in shell

Any idea why? From reading around, it seems that systemd would normally capture output from subprocesses here. Is this something specific that the shell is doing which is interacting with this?

Googling around, i see people having problems like this with Python, where buffering is the culprit, but i don't see how that can be relevant here.

EDIT: I see exactly the same behaviour after taking shell script out of the equation, using two simple C programs. I do not see this behaviour with a simple parent process taking the place of systemd and collecting output via a pipe. This strongly indicates that it is systemd that is doing something funky. See: https://github.com/tomwhoiscontrary/child-stdout-demo

EDIT 2: An observant colleague who has root reports that (a) the subprocess output is in the journal, it's just not associated with the service, and (b) he only sees this behaviour with a user service; if he sets up a system service with the same code, the subprocess output is associated with it! Surely this is a systemd bug?

2
  • I'll test this tomorrow. PS it would be helpful to know your used systemd version
    – paladin
    Sep 12 at 20:52
  • @paladin good point, i am using systemd 239, and have edited the question to say so Sep 13 at 14:03

2 Answers 2

2

EDIT 2: An observant colleague who has root reports that (a) the subprocess output is in the journal, it's just not associated with the service, and (b) he only sees this behaviour with a user service; if he sets up a system service with the same code, the subprocess output is associated with it! Surely this is a systemd bug?

It's a known, long-lived issue; the problem is that the kernel does not provide sufficient means to associate a socket client with a cgroup (unlike, for example, the ability to retrieve the client's PID). So whenever journald receives a message it only knows the sender's PID but has to asynchronously look up its unit name from /proc/<pid>/cgroup. If the process is very short-lived (such as a subshell), it could very well be that it will exit before journald is even woken up – and by the time its message is processed, the information required to associate its output with the service is no longer available.

I'm a bit hazy on the details, but from what I remember, recent systemd versions have a partial workaround that only works if the stdout "pipe" to journald (which is really a socketpair) has been set up by a privileged process, whereas your "user" services are set up by another systemd instance that only has the same privileges as yourself.

1
  • That sounds like the answer. Do you by any chance have a link to (discussion of) the issue, or something i could search for? I haven't quite managed to accept that they've messed this up so badly. Sep 17 at 13:22
1

While I haven't investigated the underlying technical details, the same solution that works for Python (disable buffering) also works in this situation. If I use this unit file...

[Unit]
Description=Demo App

[Service]
Type=exec
ExecStart=/usr/bin/unbuffer %h/bin/demoapp

...then the expected output is logged in the journal.

The unbuffer command is part of the expect package.


This works by forcing the command to execute as an interactive process attached to a pty device, which disables the normal buffering.

If you don't have the unbuffer command handy, you can use the script command instead:

ExecStart=/usr/bin/script -c %h/bin/demoapp /dev/null
1
  • 1
    Aha, that does indeed work. But what is actually going on here? I have reproduced this loss of output with a couple of trivial C programs, no shell or built-in commands involved. And i can add as many fflush and fsync calls as i like to both of them, and the output still gets lost. This doesn't feel like behaviour i have ever seen before on unix! Sep 13 at 14:06

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .