Re: s6-svscan does not terminate after SIGTERM under amd64 (Docker) qemulation

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Fri, 04 Feb 2022 12:00:27 +0000

> I stumbled on what might be an odd bug.

  Hi Saj,

  Thank you for such a detailed bug report!


>The call to posix_spawn(3) appears evident there, but there is no evidence of any follow-up to the failure. s6-svscan should call term() and die soon afterwards but that never happens.
>
>Repeating the test with no .s6-svscan/SIGTERM file produced a similar result (albeit with errno=2 on exec).
>
>At this stage, I am unsure where the problem lies. It may not be in s6.

  Congratulations: you have found a bug in qemu! (unfortunately, there
are many of those.)

  posix_spawn(3) uses clone(CLONE_VM|CLONE_VFORK) + execve().
  Normally, when clone() is run with the CLONE_VFORK option, the
parent immediately stops execution (the clone() call doesn't even
return) until the child has completed, or failed, its execve().
Then the parent can resume, and check whether the child has succeeded
and is now running the new process.

  You can see the glibc source code that does this here:
https://elixir.bootlin.com/glibc/latest/source/sysdeps/unix/sysv/linux/spawni.c#L373

  But this is not what is happening here. Your strace shows that
clone() returns 25 (the child's pid) before the child thread runs, and
execution in the parent returns to s6-svscan: read(6, 0x1808550, 128)
is s6-svscan reading on its signalfd, checking for another signal,
and returning to its ppoll() loop when that read fails.

  From s6-svscan's point of view, posix_spawn() has succeeded, so
the SIGTERM is being handled and the default term() routine should
not be called - even though the child hasn't execve()'d yet. And when
the execve() happens and fails, it's just the child process terminating,
nothing to see here, nothing to do but wait4() it.

  So, SIGTERM does nothing because posix_spawn() is lying to s6-svscan,
pretending to have succeeded when it doesn't know it yet (and is going
to fail), and goading it into not doing anything. And it's lying
because something in qemu is messing with the semantics of CLONE_VFORK.

  I'm sorry, but I don't have an easy solution for you, apart from
fixing qemu (which I'm sure everyone here has already done twice
before breakfast).

  The only workaround I can suggest is to rebuild the s6 stack
(skalibs/execline/s6) with posix_spawn() disabled; to do that, add
"--with-sysdep-posixspawn=no" to the ./configure command line when
building skalibs. But of course, that makes it impossible to use
prebuilt packages.

--
  Laurent
Received on Fri Feb 04 2022 - 13:00:27 CET
