Breaking change: cluster connection behavior when between workers #1239

Closed
scottnonnenberg opened this issue Mar 23, 2015 · 8 comments
@scottnonnenberg

Already reported this on node 0.12, thought I should report it here as well.

On OS X, I've noticed a big difference in how the master process handles connections when no workers are ready to take the incoming connection. In node 0.10.36 (and before), the connection would be held open, and a worker that hadn't yet been started when that request was made would still get the chance to handle it. In all versions of iojs (and node 0.12.0), incoming connections that arrive between workers are outright refused.

At the very least, this should be documented.

Example code and output on both node 0.10.36 and iojs 1.6.1 follow:

var cluster = require('cluster');
var http = require('http');
var supertest = require('supertest');
var PORT = 3000;

// cluster.schedulingPolicy = cluster.SCHED_NONE;

if (!cluster.isMaster) {
  http.createServer(function (req, res) {
    if (req.url === '/error') {
      setTimeout(function() {
        throw new Error('something went wrong!');
      }, 500);
    }
    else {
      res.writeHead(200, {'Content-Type': 'text/plain'});
      res.end('Hello World\n');
    }
  }).listen(PORT);

  console.log('Worker %s running at port %s', cluster.worker.id, PORT);
}
else {
  var count = 0;
  var request = supertest('http://localhost:' + PORT);

  var hitWorker = function(count) {
    console.log('%s: Worker listening! Hitting it...', count);

    request
      .get('/error')
      .expect(200, function(err, res) {
        console.log('%s: Worker taken down, now making second request', count);

        request
          .get('/')
          .expect('Hello World\n')
          .expect(200, function(err, res) {
            console.log('%s: Second request complete. Error:', count, err);
          });
      });
  };

  cluster.on('disconnect', function() {
    count +=1;
    if (count < 2) {
      cluster.fork();
    }
  });

  cluster.on('listening', function() {
    hitWorker(count);
  });

  // start just one worker
  cluster.fork();

  var interval = setInterval(function() {
    console.log('...');
  }, 1000);
  interval.unref();
}

output

iojs 1.6.1 (scheduling policy does not make a difference):

Worker 1 running at port 3000
0: Worker listening! Hitting it...
/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12:15)
    at Timer.listOnTimeout (timers.js:88:15)
0: Worker taken down, now making second request
0: Second request complete. Error: { [Error: connect ECONNREFUSED 127.0.0.1:3000]
  code: 'ECONNREFUSED',
  errno: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3000 }
Worker 2 running at port 3000
1: Worker listening! Hitting it...
...
/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12:15)
    at Timer.listOnTimeout (timers.js:88:15)
1: Worker taken down, now making second request
1: Second request complete. Error: { [Error: connect ECONNREFUSED 127.0.0.1:3000]
  code: 'ECONNREFUSED',
  errno: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3000 }
...

node 0.10.36:

Worker 1 running at port 3000
0: Worker listening! Hitting it...

/Users/scottnonnenberg/Development/thehelp/cluster/test.js:13
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/test.js:13:15)
    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)
0: Worker taken down, now making second request
Worker 2 running at port 3000
1: Worker listening! Hitting it...
0: Second request complete. Error: null
...

/Users/scottnonnenberg/Development/thehelp/cluster/test.js:13
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/test.js:13:15)
    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)
1: Worker taken down, now making second request
...
...
...
...
^C

This version hangs because a third worker is never started and the master keeps the connection open. Note also that '0: Second request complete' actually comes after '1: Worker listening!'; this is because that initial second request ends up being handled by the second worker.

@mscdex added the cluster label (Issues and PRs related to the cluster subsystem) on Mar 23, 2015
@benjamingr (Member)

cc @petkaantonov

@rpaterson

The OP is being too charitable calling this a "Breaking change". IMO this is a serious regression that is preventing us from upgrading from v0.10 to v0.12. One of the main reasons for using a cluster is to provide a high availability service - in v0.12 that is impossible because if all the workers die the clients will see "connection refused".

@Fishrock123 (Contributor)

This commit is the origin of the changed behavior; investigating.

@Fishrock123 (Contributor)

Hmm, does

  • Handles in the master process are now closed when the last worker
    that holds a reference to them quits. Previously, they were only
    closed at cluster shutdown.

sound like this issue?

@Fishrock123 (Contributor)

cc @bnoordhuis since he wrote that commit.

@bnoordhuis (Member)

A documentation issue, perhaps. The change itself is intentional.

"One of the main reasons for using a cluster is to provide a high availability service"

That's not a design goal of the cluster module and never has been.

@Fishrock123 added the doc label (Issues and PRs related to the documentations) on Aug 27, 2015
@Fishrock123 self-assigned this on Aug 28, 2015
@Fishrock123 added this to the 4.0.0 milestone on Aug 28, 2015
Fishrock123 added a commit that referenced this issue Aug 29, 2015
Fixes: #1239
Ref: 41b75ca
PR-URL: #2606
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@Fishrock123 (Contributor)

A docs fix has been landed in 4c5fc3b, thanks for reporting this!

Fishrock123 added a commit that referenced this issue Aug 31, 2015
Fixes: #1239
Ref: 41b75ca
PR-URL: #2606
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@parshap commented Sep 3, 2015

@bnoordhuis: Would it be possible to implement the previous behavior in userland?

I'm thinking something along the lines of having the master process create and bind the socket, but never call accept() on it. Basically I want a require("net").createServer().listen() that never accepts connections.
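
One rough sketch of such an approach (untested; note the master still accepts connections at the OS level, it just never reads from them): bind the port in the master with net.createServer({ pauseOnConnect: true }), queue the paused sockets while no worker is alive, and hand them to a worker over the IPC channel once it reports ready. The worker then feeds each socket to its HTTP server via server.emit('connection', socket), as in the socket-passing example in the child_process docs. The IS_WORKER flag and the 'ready'/'socket' message names here are made up for illustration:

var net = require('net');
var http = require('http');
var fork = require('child_process').fork;

var PORT = 3000;

if (process.env.IS_WORKER) {
  // Worker: an HTTP server that never calls listen(); it only handles
  // sockets handed to it by the master over the IPC channel.
  var server = http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  });

  process.on('message', function (msg, socket) {
    if (msg === 'socket' && socket) {
      server.emit('connection', socket);
      socket.resume(); // sockets arrive paused because of pauseOnConnect
    }
  });

  process.send('ready');
}
else {
  var queue = [];          // paused sockets waiting for a worker
  var worker = null;
  var workerReady = false;

  var flushQueue = function() {
    while (workerReady && queue.length) {
      worker.send('socket', queue.shift());
    }
  };

  var spawnWorker = function() {
    worker = fork(__filename, [], { env: { IS_WORKER: '1' } });
    worker.on('message', function (msg) {
      if (msg === 'ready') {
        workerReady = true;
        flushQueue();
      }
    });
    worker.on('exit', function () {
      workerReady = false;
      worker = null;
      spawnWorker(); // restart; connections queue up in the meantime
    });
  };

  // The master owns the listening socket. pauseOnConnect keeps incoming
  // data unread until a worker takes the socket over.
  net.createServer({ pauseOnConnect: true }, function (socket) {
    if (workerReady) {
      worker.send('socket', socket);
    }
    else {
      queue.push(socket);
    }
  }).listen(PORT);

  spawnWorker();
}

The observable effect is close to the 0.10 behavior: while no worker is up, clients are held open rather than refused. The trade-off is that the master accepts every connection itself and you lose cluster's built-in scheduling.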
