RE: runit-scripts gone, supervision-scripts progress from James Powell on 2015-01-02 (supervision)

From: James Powell <james4591_at_hotmail.com>
Date: Fri, 2 Jan 2015 15:42:39 -0800

Hey Laurent,

Over at LQ, I'm working on importing s6 into LFS again, but this time at a slower pace. I'm also hoping to use the native LFS utilities as much as possible, and to include only the init-shim tools (halt, shutdown, pause, and the runlevel scripts and binaries) from Runit-For-LFS for low-level system management, to avoid pulling in more extras.

I have had a thought: why not include symlinkable functionality for halt, poweroff, shutdown, and reboot directly in s6-svscanctl, and move s6-pause into s6 itself to simplify the packages? (You could even have a configure switch, --with-s6-pause, to enable or disable it during the build.) Just a suggestion, but no biggie.

Anyways, I'll be posting more frequently about getting init-stage-1/2/3 drafted correctly and in execline script language. Avery, maybe you can share your notes on this with me as well, if possible.

Thanks,
Jim

Sent from my Windows Phone
________________________________
From: Laurent Bercot<mailto:ska-supervision_at_skarnet.org>
Sent: 1/2/2015 4:59 AM
To: supervision_at_list.skarnet.org<mailto:supervision_at_list.skarnet.org>
Subject: Re: runit-scripts gone, supervision-scripts progress

  Hi Avery,
  Happy new year to you !

  Congratulations on the achievements so far, even if they're not reaching
the bar you set for yourself.

  Just a little note:

> + The ./finish concept needs development and refinement.
>
> + Need to incorporate some kind of alerting or reporting mechanism into
> ./finish, so that the sysadmin receives notifications

  ./finish is a delicate beast. It is not only run when the admin brings
the service down, which is fine, but also when the service stops in an
untimely fashion; and the service cannot start again as long as ./finish
is running. So, if anything time-consuming, or worse, blocking, happens
in ./finish, the service can be totally hosed.
  Services should do all their necessary work in ./run, before executing
into the long-lived process: when they are in ./run, it's a known and
manageable state, they are up, even if they are not ready yet. But in
./finish, it's kind of a limbo state that shouldn't be drawn out. The
service is down, but it's still doing something, can't be brought up
right now, etc. Having a service stuck in "finish" state is about as
infuriating as having a process stuck in "D" state on Linux.
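
  To illustrate the "do the work in ./run" advice, here is a minimal sketch in POSIX sh. Everything named in it is an assumption made up for the example: the prepare_rundir helper, the /run/foobar directory and the foobar-daemon binary are hypothetical, not taken from any real service.

```shell
#!/bin/sh
# Hypothetical helper for a ./run script: do all set-up and clean-up
# while the service is in the known "up but not ready" state.
prepare_rundir() {
    rundir=$1
    # Create the runtime directory if it is missing.
    mkdir -p "$rundir" || return 1
    # Remove stale files left behind by a previous instance.
    rm -f "$rundir"/*.pid "$rundir"/*.sock
}

# A real ./run script would then end by replacing the shell with the
# long-lived process, for instance (hypothetical names):
#   prepare_rundir /run/foobar && exec foobar-daemon
```

  With the clean-up done here, ./finish has little or nothing left to do, so the service spends no time in the limbo state described above.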

  s6-supervise has a built-in protection against misbehaving ./finish
scripts: if ./finish is still around after 5 seconds, it kills it.
(With a SIGKILL. When a service is down is not the time to be polite.)
AFAICT, runsv does not have such a protection, which makes it even more
important to pay attention when writing ./finish scripts.
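
  Under a supervisor without that guard, a ./finish script could enforce a deadline on its own work. The following is only a sketch in POSIX sh, not how s6-supervise does it internally; run_with_deadline is a hypothetical helper name, and the 5-second figure simply mirrors the limit mentioned above.

```shell
#!/bin/sh
# run_with_deadline SECONDS CMD [ARGS...]
# Runs CMD in the background and sends it SIGKILL if it is still
# alive after SECONDS. Returns CMD's exit status (137 if it was killed).
# Note: POSIX sh has no local variables, so these names are global.
run_with_deadline() {
    limit=$1
    shift
    "$@" &
    pid=$!
    # Watchdog: sleeps, then kills the command if it is still around.
    ( sleep "$limit"; kill -KILL "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    status=0
    wait "$pid" || status=$?
    # The command is done: the watchdog is no longer needed.
    kill "$watchdog" 2>/dev/null || :
    wait "$watchdog" 2>/dev/null || :
    return "$status"
}

# Hypothetical use inside ./finish:
#   run_with_deadline 5 rm -f /run/foobar/foobar.pid
```

  The kill only reaches the command itself, not any children it may have spawned, so the clean-up commands should still be kept simple.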

  One way or the other, ./finish should only be used sparingly, for clean-up
duties that absolutely need to happen when the long-lived process has died:
removing stale or temporary files, for instance. Those should be brief
operations and absolutely cannot block.
  So, if you're implementing reporting in ./finish, make sure you are using
fast, non-blocking commands that just fail (possibly logging an error
message) if they have trouble doing their job.
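
  As a sketch of what "fast, non-blocking, and just fails" could look like in POSIX sh: the helper below appends one line to a spool file and gives up immediately on error. The report_down name, the spool file and the line format are assumptions for the example, and it presumes the spool is a regular local file (opening a FIFO with no reader would block).

```shell
#!/bin/sh
# report_down SPOOLFILE EXITCODE
# Appends a one-line record to SPOOLFILE. Never retries, never waits:
# on failure it logs one message to stderr and returns success anyway,
# so that ./finish is not held up by the reporting step.
report_down() {
    spool=$1
    code=$2
    stamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    if ! { printf '%s down exit=%s\n' "$stamp" "$code" >>"$spool"; } 2>/dev/null
    then
        echo "finish: could not record event in $spool" >&2
    fi
    return 0
}

# Hypothetical use inside ./finish, assuming the supervisor passes the
# exit code of ./run as the first argument:
#   report_down /var/spool/foobar/events "$1"
```

  A separate process can then read the spool and send mail or alerts at its own pace, outside the supervision tree's critical path.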

  The way I would implement reporting wouldn't be based on ./finish, but on
an external set of processes listening to down/up/ready notifications in
/service/foobar/event. It would only work with s6, though.
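
  A rough sketch of such an external listener, in POSIX sh, could be built on s6-svwait, which blocks until a service directory reaches the requested state (-d for down, -u for up). This is an assumption about one possible design, not the implementation alluded to above; watch_service, the rounds argument and the SVWAIT override are hypothetical and exist only so the loop can be exercised without s6 installed.

```shell
#!/bin/sh
# Command used to wait for state changes; overridable for testing.
SVWAIT=${SVWAIT:-s6-svwait}

# watch_service SERVICEDIR ROUNDS
# Reports each down/up cycle of SERVICEDIR, ROUNDS times, then returns.
watch_service() {
    svcdir=$1
    rounds=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Block until the service is down, then report it.
        "$SVWAIT" -d "$svcdir" || return 1
        echo "service $svcdir is down"
        # Block until it is back up before waiting for the next failure.
        "$SVWAIT" -u "$svcdir" || return 1
        echo "service $svcdir is up again"
        i=$((i + 1))
    done
}

# Hypothetical use, run as its own supervised service:
#   watch_service /service/foobar 1000
```

  Because the listener polls the state between two waits, a very quick down/up cycle could go unnoticed; a listener subscribed directly to the event directory would not have that gap.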

--
  Laurent

Received on Fri Jan 02 2015 - 23:42:39 UTC
