Tuesday, July 5th 2011

Graceful restart without downtime with node.js

I recently started playing with node.js and one thing I like is that it enables (or at least makes easier) many interesting possibilities with regard to deployment and update methods (for example compared to Java/Jetty which I mostly use now).

One scenario I am interested in is restarting a node app without the users noticing any downtime. Of course, there is lots of information online about doing this by load balancing between several node processes.

But for our own reasons, let's say we want to do this without load balancing. Assuming any session data is stored out of process (perhaps in Redis), the idea is to structure our node.js application like this:

1. Perform any slow initialization (perhaps loading and compiling templates).
2. Send the TERM signal to the old node process.
3. Start the http server.
4. Listen for the TERM signal and, on receiving it, stop the server from accepting further connections.

This is so that while we are initializing, the original node process is still running and serving requests. On receiving SIGTERM the old process stops accepting new connections, but still completes any requests in progress before quitting. In the meantime the new node process is up and handling any new requests.

This is straightforward except perhaps step two. How do we know which process to send the TERM signal to? We could save the PID in a file and read it from there, but it turns out there is a more fun way.
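For comparison, here is a rough sketch of what the PID-file approach might look like. The file location and the two helper names (`signalOldProcess`, `writePidFile`) are made up for illustration; the new process would call `signalOldProcess` after its slow initialization and `writePidFile` once it has taken over.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical location for the PID file; use whatever suits your deployment.
var PID_FILE = path.join(os.tmpdir(), 'myapp.pid');

// Send `signal` (default SIGTERM) to the process recorded in pidFile.
// Returns true if the signal was delivered, false if there was no PID file
// or the recorded process no longer exists.
function signalOldProcess(pidFile, signal) {
  var pid;
  try {
    pid = parseInt(fs.readFileSync(pidFile, 'utf8'), 10);
  } catch (e) {
    return false; // no PID file yet, e.g. on the very first start
  }
  if (isNaN(pid)) {
    return false; // file did not contain a number
  }
  try {
    process.kill(pid, signal === undefined ? 'SIGTERM' : signal);
    return true;
  } catch (e) {
    return false; // stale PID file: the old process is already gone
  }
}

// Record our own PID so that the next version can find us.
function writePidFile(pidFile) {
  fs.writeFileSync(pidFile, String(process.pid));
}
```

The drawback is the usual one with PID files: a stale file can point at an unrelated process that happens to have reused the PID, which is part of why asking the system who owns the port is more attractive.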

By using the fuser Linux utility we can find out which process is listening on a specific port and send it a signal as well. For example:

fuser -n tcp 8000 -k -TERM

will find the process listening on port 8000 and send it the TERM signal. So using this, each time we start our node app we can send a TERM signal to the old version of the app which is listening for http connections and ask it to retire.

Here is some code:

var exec = require('child_process').exec;

var server = null;
var signalReceived = false;

process.on('SIGTERM', function () {
  if (server != null) {
    server.close();
  }
  signalReceived = true;
});

exports.listen = function(_server, port, timeout) {
  port    = port || 8080;
  timeout = timeout || 5;   // retry delay in milliseconds
  server  = _server;

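  exec('fuser -n tcp ' + port + ' -k -TERM',
    function (error) {
      // fuser exits with an error when nothing was listening on the port;
      // in that case there is no old process to wait for, so do not retry.
      var noretry = error !== null;

      function attemptSocket() {
        if (signalReceived) {
          return;
        }
        server.listen(port);
      }

      // Current versions of node report EADDRINUSE asynchronously through
      // the 'error' event rather than by throwing from listen().
      server.on('error', function (e) {
        console.log(e);
        if (e.code === 'EADDRINUSE' && !noretry) {
          setTimeout(attemptSocket, timeout);
        }
      });

      attemptSocket();
    });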
}

And here is how we would use it:

var http = require("http");
var graceful = require('./graceful');

function onRequest(request, response) {
  console.log("Request received: " + request.url);
  response.writeHead(200, {"Content-Type": "text/html"});
  response.write("Hello World");
  response.end();
}


var server = http.createServer(onRequest);
graceful.listen(server, 8000);

One small problem with the above is keep-alive connections. When a browser has a keep-alive connection open, it keeps the old node process running (and possibly serving new requests over that connection) until the keep-alive connection times out after a couple of minutes of inactivity.
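One possible way to shorten that wait is sketched below. It is not part of the module above, and the names `closeWhenRetiring` and `retire` are made up for illustration. The idea is that once shutdown has begun, every response carries a `Connection: close` header so the client gives up its keep-alive connection, and connections that are already idle are dropped straight away where the running node version offers `server.closeIdleConnections()` (only newer releases do, hence the feature check).

```javascript
var shuttingDown = false;

// Wrap a request handler so that, once shutdown has begun, each response
// asks the client to close its keep-alive connection.
function closeWhenRetiring(handler) {
  return function (request, response) {
    if (shuttingDown) {
      response.setHeader('Connection', 'close');
    }
    handler(request, response);
  };
}

// Stop accepting new connections and drop the ones that are currently idle.
function retire(server) {
  shuttingDown = true;
  server.close();
  // closeIdleConnections() is only available on newer node releases.
  if (typeof server.closeIdleConnections === 'function') {
    server.closeIdleConnections();
  }
}
```

With this in place the server would be created with `http.createServer(closeWhenRetiring(onRequest))`, and the SIGTERM handler would call `retire(server)` instead of `server.close()`.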