Re: Intelligent service restart with s6

From: Laurent Bercot <ska-skaware_at_skarnet.org>
Date: Mon, 25 Jul 2016 13:55:19 +0200

On 25/07/2016 13:18, Jan Olszak wrote:
> - Configurable sleep time before service is restarted (configured per
> service)
> - This interval should grow in case of consecutive crashes
> - After multiple crashes the watchdog must try to restart a crashed service
> at least once every 24 hours
> - Current "interval" and number of crashes should be accessible
>
>
> I'm not sure how to implement this, do you have any suggestions?

  Those are all things that can be done in the run and finish scripts.

  You can use the arguments to ./finish (the exit code, or 256 if the
service was killed by a signal, and the signal number) to check whether a
service has exited normally, exited abnormally or crashed. You can then
either sleep in the finish script (provided you have set a bigger
timeout-finish first), or - my preferred solution - invoke some program of
yours that will modify something in the filesystem so the crash is
reported. Then in the ./run script, before executing into the service, you
check the "crash report", deduce the interval you want to sleep for, and
perform whatever other actions you need before the final exec.
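
  As a rough illustration of the ./run and ./finish approach, here is a
minimal sketch, not something shipped with s6. The function names
(record_exit, next_delay), the STATE_DIR location and the BASE_DELAY and
MAX_DELAY settings are all made up for the example; the only s6 behaviour
it relies on is that ./finish receives the exit code (256 if killed by a
signal) as its first argument and the signal number as its second.

```shell
#!/bin/sh
# Hypothetical helpers shared by foo/run and foo/finish.
# STATE_DIR, BASE_DELAY and MAX_DELAY are assumptions, not s6 settings.
STATE_DIR="${STATE_DIR:-./crash-state}"
BASE_DELAY="${BASE_DELAY:-1}"      # seconds to wait after the first crash
MAX_DELAY="${MAX_DELAY:-86400}"    # cap: retry at least once every 24 hours

# Meant to be called from ./finish as: record_exit "$1" "$2"
# $1 is the exit code (256 if killed by a signal), $2 the signal number.
record_exit() {
  mkdir -p "$STATE_DIR"
  if [ "$1" -eq 0 ]; then
    echo 0 > "$STATE_DIR/crashes"            # clean exit: reset the counter
  else
    n=$(cat "$STATE_DIR/crashes" 2>/dev/null || echo 0)
    echo $((n + 1)) > "$STATE_DIR/crashes"   # one more consecutive crash
  fi
}

# Meant to be called from ./run before the final exec:
#   sleep "$(next_delay)"
# Prints a delay that doubles with each consecutive crash, up to MAX_DELAY.
next_delay() {
  n=$(cat "$STATE_DIR/crashes" 2>/dev/null || echo 0)
  if [ "$n" -eq 0 ]; then
    echo 0
    return
  fi
  delay=$BASE_DELAY
  i=1
  while [ "$i" -lt "$n" ] && [ "$delay" -lt "$MAX_DELAY" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$MAX_DELAY" ]; then
    delay=$MAX_DELAY
  fi
  echo "$delay"
}
```

  With something like this, the crash count stays readable from outside as
a plain file under STATE_DIR, and the current interval can be recomputed
from it at any time.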

  I haven't added that to s6 because it's pure policy: it can be achieved
with very basic scripting in ./run and ./finish.

  If you have a watchdog of any sort, however, I recommend that you make it
a separate service (say, foo-watch if your main service is foo) and adjust
your foo/run and foo/finish scripts to send the appropriate reports to
foo-watch. If the watchdog decides it needs to restart foo, then it can just
invoke "s6-svc -t foo". Or it can decide to keep foo down for a while
(s6-svc -d foo) before starting it (s6-svc -u foo). I would advise using a
separate foo-watch service to control the state of foo if you have some
complex policy to maintain.
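
  The decision step of such a watchdog could look roughly like the sketch
below. Everything except the s6-svc -d and -u options is an assumption made
for illustration: the watch_once function, the CRASH_LIMIT and HOLD_SECONDS
settings, and the idea that foo/finish reports a plain crash count to
foo-watch. SVC_CMD can be overridden (for example with echo) to dry-run the
logic without touching a real service.

```shell
#!/bin/sh
# Hypothetical one-pass decision for a foo-watch service.
SVC_CMD="${SVC_CMD:-s6-svc}"          # override with SVC_CMD=echo for a dry run
SERVICE_DIR="${SERVICE_DIR:-foo}"     # path to foo's service directory (assumed)
CRASH_LIMIT="${CRASH_LIMIT:-5}"       # crashes tolerated before holding foo down
HOLD_SECONDS="${HOLD_SECONDS:-3600}"  # how long to keep foo down

# watch_once CRASHES: act on the crash count reported by foo/finish.
watch_once() {
  if [ "$1" -ge "$CRASH_LIMIT" ]; then
    "$SVC_CMD" -d "$SERVICE_DIR"      # keep foo down for a while
    sleep "$HOLD_SECONDS"
    "$SVC_CMD" -u "$SERVICE_DIR"      # then bring it back up
  fi
}
```

  A real foo-watch/run would call something like this in a loop, fed by
whatever reporting channel foo/run and foo/finish use.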

-- 
  Laurent
Received on Mon Jul 25 2016 - 11:55:19 UTC
