Re: runit: why ignore SIGCONT for stages?

From: Steve Litt <slitt_at_troubleshooters.com>
Date: Tue, 23 Nov 2021 13:28:30 -0500

Leah Neukirchen said on Tue, 23 Nov 2021 13:17:58 +0100

>Hello,
>
>During debugging a ksh issue (https://github.com/ksh93/ksh/issues/301),
>we noticed that many processes on a Void Linux system booted with runit
>are ignoring SIGCONT. This seems to be due to runit(8) before execing
>into the stages:
>
> sig_unblock(sig_cont);
> sig_ignore(sig_cont);
>...
> strerr_warn3(INFO, "enter stage: ", stage[st], 0);
> execve(*prog, (char *const *)prog, envp);
>
>This code has been there since 2001. Can someone explain why?
>Ignoring SIGCONT seems to be a no-op, and the default handler seems to
>create no problems for other init systems.

Hi Leah,

For one thing, are you sure you're sending the SIGCONT to the correct
process? As far as I know, runit provides no way to retrieve the PID of
a daemon, so how do you send the signal?

Also, how do you know whether the daemon is stopped or paused? Is it
possible that the SIGCONT *is* working correctly on the daemon?

Assuming the preceding two questions indicate you're sending the
right signal to the right daemon, and the daemon really isn't
responding, I have an idea why runit might be built this way...

I've neither looked at that part of the source code, nor done any
experimentation, so what I'm about to say is a pure guess.

My guess is that runit's intent was to have SIGSTOP and SIGCONT done
solely by the sv command, as described by the sv man page:

# Stop mydaemon in its tracks
sv pause mydaemon

# Make mydaemon pick up where it left off
# Note the syntax diverges from the sv man page
sv cont mydaemon

This actually makes some sense, because to directly send a signal to
the daemon, you'd need its PID, and daemontools/runit/s6 don't write a
PID file, as far as I know.

My proposed explanation has some logical inconsistencies, though:

1) If SIGCONT is really shut off in the daemon, then sv can't send
   the daemon a SIGCONT any more than anyone else.

2) If runsv is required for a specifically crafted program to run (one
   that sends a SIGCONT to a daemon), then why is it better than
   systemd? I suppose it would be easy enough to #ifdef RUNIT or
   something, for the sole purpose of sending signals to the daemon.
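
Point 1 can be probed without runit at all. As far as I understand
POSIX, SIGCONT resumes a stopped process even when the process ignores
the signal. The sketch below is a hedged experiment, not anything taken
from the runit source; the temporary counter file and the one-second
loop are made up for illustration. A child shell ignores SIGCONT the
way processes started from runit's stages do, gets stopped, and is then
sent SIGCONT:

```shell
#!/bin/sh
# Hypothetical experiment: does SIGCONT resume a process that ignores it?
f=$(mktemp)

# Child ignores SIGCONT (like processes started from runit's stages),
# then appends one line per second to the counter file.
sh -c 'trap "" CONT; while :; do echo tick >> "$1"; sleep 1; done' sh "$f" &
pid=$!

# Number of lines written so far, without wc's padding.
count() { wc -l < "$f" | tr -d ' '; }

sleep 2
kill -s STOP "$pid"
sleep 1
n1=$(count)          # lines written when the child was stopped
sleep 3
n2=$(count)          # equals n1 if the child is really stopped
kill -s CONT "$pid"
sleep 3
n3=$(count)          # grows past n2 only if the child resumed

kill "$pid"
rm -f "$f"
echo "while stopped: $n1 -> $n2, after SIGCONT: $n3"
```

If the last number is larger than the middle one, the ignored SIGCONT
still continued the child.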

I don't have time to research this right now, but if I were to research
it, I'd build a dummy daemon that did nothing but write a file on /tmp
every second, writing an incrementing integer and the time. Then run a
shellscript something like the following:

echo "Before pause ======================="
sv status mydaemon
sv pause mydaemon
echo "After pause ======================="
sv status mydaemon
sleep 30
echo "After sleep ======================="
sv status mydaemon
sv cont mydaemon
echo "After cont ======================="
sv status mydaemon
echo "Done ======================="
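
For the dummy daemon itself, a minimal shell sketch could look like
this. The function name tick_loop, its arguments, and the log path are
my own illustrative choices; nothing here comes from runit.

```shell
#!/bin/sh
# Hypothetical dummy daemon for the experiment.
# tick_loop FILE INTERVAL MAX
#   Appends "<counter> <HH:MM:SS>" to FILE every INTERVAL seconds.
#   MAX=0 means loop forever; a positive MAX stops after MAX lines.
tick_loop() {
    tl_file=$1
    tl_interval=$2
    tl_max=$3
    tl_i=0
    while [ "$tl_max" -eq 0 ] || [ "$tl_i" -lt "$tl_max" ]; do
        tl_i=$((tl_i + 1))
        echo "$tl_i $(date '+%H:%M:%S')" >> "$tl_file"
        sleep "$tl_interval"
    done
}

# In the service's run script, the last line would be something like:
#   tick_loop /tmp/mydaemon.log 1 0
```

Watching the file with tail -f while the test script runs shows
whether the counter and the timestamps keep advancing.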

If the integer picks up where it left off, even though the time skips
30 seconds, that's proof the daemon was stopped. If the last sv status
shows the service running and no longer paused, it's proof that the
process continued.

If your desire for SIGCONT to work is just so you can pause and resume
the daemon from the command prompt, you can just use sv pause and sv
cont instead. If you have a program that actually needs to stop and
continue the daemon, and this program needs to be portable, I guess you
have to detect that the runit supervisor is running and ran your
daemon, by running sv status mydaemon, and if that returns 0 then
system("sv pause mydaemon") or system("sv cont mydaemon"). Otherwise
send the signal manually.
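
In shell, that detection logic might be sketched as follows. This is a
sketch under assumptions: the function name pause_or_cont is made up,
the caller is assumed to already know the daemon's PID for the
non-runit case, and it relies on sv status exiting 0 only for a
service that runsv is supervising.

```shell
#!/bin/sh
# pause_or_cont ACTION SERVICE PID   (hypothetical helper)
#   ACTION is "pause" or "cont".
#   Uses sv when SERVICE is supervised by runit, otherwise signals PID.
pause_or_cont() {
    pc_action=$1
    pc_service=$2
    pc_pid=$3
    case $pc_action in
        pause) pc_sig=STOP ;;
        cont)  pc_sig=CONT ;;
        *)     echo "usage: pause_or_cont pause|cont SERVICE PID" >&2
               return 2 ;;
    esac
    if command -v sv >/dev/null 2>&1 &&
       sv status "$pc_service" >/dev/null 2>&1; then
        # Supervised by runit: let sv deliver the signal.
        sv "$pc_action" "$pc_service"
    else
        # Not under runit (or sv missing): send the signal manually.
        kill -s "$pc_sig" "$pc_pid"
    fi
}
```

For example, pause_or_cont pause mydaemon 1234 would pause the service
through sv on a runit system and fall back to kill -s STOP 1234
elsewhere.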

Please keep us in the loop. I've used runit for 6 years, and until now,
I thought it was flawless except for the theoretical disadvantage of
PID 1 not supervising anything, and the theoretical disadvantage of not
being able to start the daemons in a particular order, and the
theoretical disadvantage of polling.

Thanks,

SteveT

Steve Litt
Spring 2021 featured book: Troubleshooting Techniques of the Successful
Technologist http://www.troubleshooters.com/techniques
Received on Tue Nov 23 2021 - 19:28:30 CET
