gentoo, openrc, apache and monit – proper starting and stopping

I regularly use monit to monitor services and restart them if needed (and possible).  An issue I’ve run into with Gentoo, though, is that openrc doesn’t act as I expect it to.  openrc keeps its own record of the state of a service, and doesn’t always look at the actual PID to see if it’s running or not.  In this post, I’m talking about apache.

For context, it’s necessary to share what my monit configuration looks like for apache.  It’s just a simple ‘start’ command for startup and a ‘stop’ command for shutdown:

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 60 seconds
    stop program = "/etc/init.d/apache2 stop"

When apache gets started, there are two things that happen on the system: openrc flags it as started, and apache creates a PID file.
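Those are two separate records, and nothing forces them to agree. As a minimal sketch of the pidfile side, here is roughly what "does the PID still exist" amounts to in shell. The helper name pid_alive is hypothetical (it is not part of monit or openrc), and this is not a claim about how monit implements its own check.

```shell
#!/bin/sh
# Sketch: is the process named in a pidfile actually alive?
# pid_alive is a hypothetical helper, not part of monit or openrc.
pid_alive() {
    pidfile=$1
    # A missing or unreadable pidfile is treated as "not running".
    [ -r "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Reject empty or non-numeric content.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only reports whether the PID exists
    # and whether we are allowed to signal it (so run this as root).
    kill -0 "$pid" 2>/dev/null
}

# Example usage, to compare against what openrc believes:
#   pid_alive /var/run/apache2.pid && echo "process is running" || echo "process is gone"
#   /etc/init.d/apache2 status
```

The point of the comparison is that the pidfile check looks at the live process table, while openrc's answer can come from its saved state.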

The problem I run into is when apache dies for whatever reason, unexpectedly.  Monit will notice that the PID doesn’t exist anymore, and try to restart it, using openrc.  This is where things start to go wrong.

To illustrate what happens, I’ll duplicate the scenario by running the command myself.  Here’s openrc starting it, me killing it manually, then openrc trying to start it back up using ‘start’.

# /etc/init.d/apache2 start
# pkill apache2
# /etc/init.d/apache2 status
* status: crashed
# /etc/init.d/apache2 start
* WARNING: apache2 has already been started

You can see that ‘status’ properly reports that it has crashed, but ‘start’ thinks otherwise.  So, even though an openrc status check reports that it’s dead, ‘start’ only checks openrc’s own internal record to decide whether the service is running.

This gets a little weirder in that if I run ‘stop’, the init script will recognize that the process is not running, and reset openrc’s status to stopped.  That is actually a good thing, and it makes ‘stop’ a reliable command.

Resuming the same state as above, here’s what happens when I run ‘stop’:

# /etc/init.d/apache2 stop
* apache2 not running (no pid file)

Now if I run it again, it checks both the process and the openrc status, and gives a different message: the same one it would give if the service were already stopped.

# /etc/init.d/apache2 stop
* WARNING: apache2 is already stopped

So, the problem this creates for me is that if a process has died, monit will not run the stop command, because it’s already dead, and there’s no reason to run it.  It will run ‘start’, which will insist that it’s already running.  Monit (depending on your configuration) will try a few more times, and then just give up completely, leaving your process completely dead.

The solution I’m using is to tell monit to run ‘restart’ as the start command, instead of ‘start’.  The reason is that restart doesn’t care whether the service is stopped or started; it will successfully get it started again.
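Spelled out, the only change from the original monit configuration is the command on the start line. This is a config fragment based on the paths already used in this post; adjust the pidfile and timeout to your own setup.

```
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 restart" with timeout 60 seconds
    stop program = "/etc/init.d/apache2 stop"
```

The stop program is left alone, since ‘stop’ already behaves reliably.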

I’ll repeat my original test case, to demonstrate how this works:

# /etc/init.d/apache2 start
# pkill apache2
# /etc/init.d/apache2 status
* status: crashed
# /etc/init.d/apache2 restart
* apache2 not running (no pid file)
* Starting apache2 …
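If restarting unconditionally feels too blunt, the same idea can be sketched as a small wrapper that only runs ‘stop’ first when ‘status’ reports a crash. This is an untested-on-production sketch, not a recommended fix: start_service is a hypothetical helper name, and matching on the word "crashed" assumes the status output looks like the one shown above.

```shell
#!/bin/sh
# Sketch of a "start" wrapper that monit could call instead of plain start.
# start_service is a hypothetical helper; the init script is passed in as $1.
start_service() {
    initscript=$1
    # Assumption: `status` prints a line containing "crashed" when openrc's
    # record says started but the process is gone (as in the output above).
    if "$initscript" status 2>&1 | grep -q 'crashed'; then
        # `stop` notices the process is gone and resets openrc's state...
        "$initscript" stop
    fi
    # ...so that `start` is no longer refused as "already started".
    "$initscript" start
}

# Example usage:
#   start_service /etc/init.d/apache2
```

Compared with plain ‘restart’, this leaves a healthy apache untouched if the wrapper is ever called while the service is really running.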

I don’t know if my expectations of openrc are wrong or not, but it seems to me like it relies on its internal status in some cases instead of seeing if the actual process is running.  Monit takes on that responsibility, of course, so it’s good to have multiple things working together, but I wish openrc did a bit more strict checking.

I don’t know how to fix it, either.  openrc has arguments for displaying debug and verbose output.  It will display messages on the first run, but not the second, so I don’t know where it’s calling stuff.

# /etc/init.d/apache2 -d -v start
<lots of output>
# /etc/init.d/apache2 -d -v start
* WARNING: apache2 has already been started

No extra output on the second one.  Is this even a ‘problem’ that should be fixed, or not?  That’s kinda where I’m at right now, and just tweaking my monit configuration so it works for me.

12 comments on “gentoo, openrc, apache and monit – proper starting and stopping”

  1. Lonnie Olson

    I agree. That behavior is wrong. Another problem I had w/ openrc (mind you this was a few years ago) was dependencies: I changed some network settings and restarted the network interface using the init.d script. This caused a bunch of other services to get restarted because they were dependent on networking. More annoyances.

    I prefer Upstart personally. It’s dead simple to deal with to create custom jobs and works quite well.

  2. Patrick M.

    This is a use case where systemd really shines as an init system. It doesn’t need much hacking to monitor and restart (crashed) services; it supports that natively. That’s why I started using systemd even on my servers. Despite lots of rants against systemd on the net, I find it a very good init system, not only for desktops (fast bootup) but also and especially for servers (proper service monitoring, resource (cgroups) and capabilities management, socket activation, etc.). With socket-activated services you don’t even get any downtime for the crashed services while they are restarting, as the socket requests are buffered by systemd.

    1. beandog Post author

      That’s pretty cool. Everyone has been telling me to use systemd. I’m hesitant to try it out, because it’d mean rolling it out to some production servers. It sounds like the benefits are worth it.

      1. Patrick M.

        You may want to test your setup in a VM first to check everything is working as expected. When I started testing systemd I had to play around with the service files a bit to get used to it, and even wrote some myself to get every daemon running. That was, though, at a very early stage of systemd (something <systemd-10, I think), and especially with Gentoo it was a bit rough at that time, as for example udev was set up to call openrc init files for some hotplug events and such. But I guess now that many packages provide their own systemd service files and there is a "systemd" USE flag for most of the stuff, that won't be much of an issue. If you use a distro that ships systemd by default it may work even more out of the box.
        But hey, just try it in a VM and see if it does the job for you.

        I would also recommend reading Lennart's blog post series "systemd for administrators" while setting it up. It helps to grasp some of the underlying concepts of systemd: http://www.freedesktop.org/wiki/Software/systemd (scroll down to "The systemd for Administrators Blog Series")

    1. beandog Post author

      Yah I could do that, but I have other times when monit will need to stop a service as well (high load, etc.) so I want the stop one to be correct.

  3. Xake

    I actually think this behaviour is intended.
    There are some init scripts where stop() has some cleanup functionality, like killing helper processes, that is not really needed if the system crashes, but is very useful if you want to just restart the service, since those processes may block the service from coming up properly again.

    Personally I have stopped using “start” and am just using “restart” if I want to start/restart a daemon.

  4. Weedy

    It gets even better when monit is trying to kill a runaway apache because some fcgid process is having a seizure.
    stop program = "/monit/daemonStop.sh apache2 'apache2.*SSL' '/var/run/apache2.pid'" with timeout 180 seconds
    http://dpaste.com/802036/
