Re: perp - how to notify if service suddenly starts dying all the time

From: Wayne Marshall <wcm_at_b0llix.net>
Date: Thu, 16 Jul 2015 05:13:15 -0700

On Thu, 16 Jul 2015 09:52:55 +0300
Georgi Chorbadzhiyski <georgi_at_unixsol.org> wrote:

> Yesterday, something corrupted the database file that Redis uses,
> and Redis crashed and then refused to start.
>
> I'm using perp to monitor the service, and of course perp was doing
> its job and restarted the service after it died. The problem is
> that I can't think of a way to be notified if a service dies all the
> time. In this case, since Redis has never died on me before, it would
> be enough to know if the service has been restarted X times in the
> last 30 seconds (for example).
>
> I could monitor the logs, but that doesn't seem like a good idea
> (starting a parallel monitor service for each service that is being
> monitored).
>
> Any ideas?
>
> Here is what my rc.main script for the service looks like (it is
> pretty standard).
>
> #!/bin/sh
>
> exec 2>&1
>
> TARGET="$1"
> SVNAME="$2"
>
> [ -z "$SVNAME" ] && SVNAME=$(basename $(readlink -m $(dirname $0)))
>
> start() {
> echo "*** $SVNAME: starting..."
> exec runuid -s redis /usr/bin/redis
> }
>
> reset() {
> case "$3" in
> 'exit')
> echo "*** $SVNAME: exited status $4 $PERP_SVSECS seconds runtime." ;;
> 'signal')
> echo "*** $SVNAME: killed on signal $5 $PERP_SVSECS seconds runtime." ;;
> *)
> echo "*** $SVNAME: stopped ($3) $PERP_SVSECS seconds runtime." ;;
> esac
> exit 0
> }
>
> eval "$TARGET" "$@"
>
> exit 0
>

Hi Georgi,

A simple way to notify from perp is to send yourself (the admin) an email
from within the "reset" target:

...
reset() {
    case "$3" in
    'exit')
      echo "*** $SVNAME: exited status $4 $PERP_SVSECS seconds runtime."
      mail -s "$SVNAME exited" admin@myserver.com << END_MAIL
NOTICE:
The $SVNAME service has exited status $4 after runtime of $PERP_SVSECS
seconds.
END_MAIL
    ;;
    'signal')
      echo "*** $SVNAME: killed on signal $5 $PERP_SVSECS seconds runtime."
    ;;
    *)
      echo "*** $SVNAME: stopped ($3) $PERP_SVSECS seconds runtime."
    ;;
    esac
    exit 0
}
...


The above example uses a generic mail(1) command, which may vary a little
among platforms/mail agents. It also uses a shell "here" document to
generate the body of the email.

This is just a bare-bones starting point. You could embellish it to
suit your own site's requirements.
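
For the original question (a notice only when the service has been
restarted X times in the last 30 seconds), one possible embellishment is
to record a timestamp on each reset and count the recent ones. This is
only a sketch under assumptions: restart_storm, the state file, and the
window/threshold values are made-up names that are not part of perp, and
"date +%s" is widely available but not required by POSIX.

```shell
#!/bin/sh
# restart_storm (hypothetical helper, not part of perp):
# record one restart and succeed (return 0) if at least MAX restarts
# have been recorded within the last WINDOW seconds.
#
# usage: restart_storm STATE_FILE WINDOW MAX [NOW]
restart_storm() {
    state_file="$1"            # one epoch timestamp per line
    window="$2"                # how many seconds to look back
    max="$3"                   # restarts in the window that count as a storm
    now="${4:-$(date +%s)}"    # optional override, handy for testing

    # Record this restart.
    echo "$now" >> "$state_file"

    # Drop timestamps that have fallen out of the window.
    cutoff=$((now - window))
    awk -v c="$cutoff" '$1 + 0 >= c + 0' "$state_file" > "$state_file.tmp" &&
        mv "$state_file.tmp" "$state_file"

    # Count what is left and compare against the threshold.
    count=$(awk 'END { print NR }' "$state_file")
    [ "$count" -ge "$max" ]
}
```

Within the "reset" target, something along the lines of
restart_storm "/var/run/$SVNAME.restarts" 30 5 && mail ...
would then send mail only once the threshold is reached. Note that it
would keep mailing on every further restart inside the window, so some
sites may want an additional "already notified" marker.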

Another suggestion is to develop an executable "perp_notify" script that
incorporates the above to provide a consistent notification message,
without having to duplicate it in every runscript.
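
As a rough idea of what such a "perp_notify" script might look like, here
is a minimal sketch. The calling convention (SVNAME HOW DETAIL), the
notify_body function, and the PERP_NOTIFY_ADMIN variable are all
assumptions made up for illustration; only PERP_SVSECS and the generic
mail(1) invocation come from the example above.

```shell
#!/bin/sh
# perp_notify (hypothetical): one shared place for the notification text.
# Assumed calling convention, not defined by perp itself:
#   perp_notify SVNAME HOW DETAIL
# e.g. from a reset target:  perp_notify "$SVNAME" exit "$4"

# Placeholder recipient; override with the (made-up) PERP_NOTIFY_ADMIN.
ADMIN="${PERP_NOTIFY_ADMIN:-root}"

# Write the message body to stdout.
notify_body() {
    svname="$1"
    how="$2"
    detail="$3"
    case "$how" in
    exit)   what="exited status $detail" ;;
    signal) what="was killed on signal $detail" ;;
    *)      what="stopped ($how)" ;;
    esac
    echo "NOTICE:"
    echo "The $svname service $what after runtime of ${PERP_SVSECS:-unknown} seconds."
}

# Send mail only when invoked with arguments, so that the file can also
# be sourced just for notify_body. mail(1) options may vary by platform.
if [ "$#" -ge 2 ]; then
    notify_body "$@" | mail -s "$1 $2" "$ADMIN"
fi
```

Each runscript's "reset" target then needs only a one-line call, and the
wording of the message can be changed in one place.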

All the best,

Wayne
Received on Thu Jul 16 2015 - 12:13:15 UTC
