Breaking change: cluster connection behavior when between workers #1239

Closed
scottnonnenberg opened this issue Mar 23, 2015 · 8 comments
@scottnonnenberg

Already reported this on node 0.12, thought I should report it here as well.

On OS X, I've noticed a big difference in how the master process handles connections when no workers are ready to take the incoming connection. In node 0.10.36 (and before), the connection would be held open, and a worker that hadn't yet been started when that request was made would still get the chance to handle it. In all versions of iojs (and node 0.12.0), incoming connections that arrive between workers are outright refused.

At the very least, this should be documented.

Example code and output on both node 0.10.36 and iojs 1.6.1 follow:

var cluster = require('cluster');
var http = require('http');
var supertest = require('supertest');
var PORT = 3000;

// cluster.schedulingPolicy = cluster.SCHED_NONE;

if (!cluster.isMaster) {
  http.createServer(function (req, res) {
    if (req.url === '/error') {
      setTimeout(function() {
        throw new Error('something went wrong!');
      }, 500);
    }
    else {
      res.writeHead(200, {'Content-Type': 'text/plain'});
      res.end('Hello World\n');
    }
  }).listen(PORT);

  console.log('Worker %s running at port %s', cluster.worker.id, PORT);
}
else {
  var count = 0;
  var request = supertest('http://localhost:' + PORT);

  var hitWorker = function(count) {
    console.log('%s: Worker listening! Hitting it...', count);

    request
      .get('/error')
      .expect(200, function(err, res) {
        console.log('%s: Worker taken down, now making second request', count);

        request
          .get('/')
          .expect('Hello World\n')
          .expect(200, function(err, res) {
            console.log('%s: Second request complete. Error:', count, err);
          });
      });
  };

  cluster.on('disconnect', function() {
    count +=1;
    if (count < 2) {
      cluster.fork();
    }
  });

  cluster.on('listening', function() {
    hitWorker(count);
  });

  // start just one worker
  cluster.fork();

  var interval = setInterval(function() {
    console.log('...');
  }, 1000);
  interval.unref();
}

output

iojs 1.6.1 (scheduling policy does not make a difference):

Worker 1 running at port 3000
0: Worker listening! Hitting it...
/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12:15)
    at Timer.listOnTimeout (timers.js:88:15)
0: Worker taken down, now making second request
0: Second request complete. Error: { [Error: connect ECONNREFUSED 127.0.0.1:3000]
  code: 'ECONNREFUSED',
  errno: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3000 }
Worker 2 running at port 3000
1: Worker listening! Hitting it...
...
/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/Users/scottnonnenberg/Development/thehelp/cluster/test.js:12:15)
    at Timer.listOnTimeout (timers.js:88:15)
1: Worker taken down, now making second request
1: Second request complete. Error: { [Error: connect ECONNREFUSED 127.0.0.1:3000]
  code: 'ECONNREFUSED',
  errno: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 3000 }
...

node 0.10.36:

Worker 1 running at port 3000
0: Worker listening! Hitting it...

/Users/scottnonnenberg/Development/thehelp/cluster/test.js:13
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/test.js:13:15)
    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)
0: Worker taken down, now making second request
Worker 2 running at port 3000
1: Worker listening! Hitting it...
0: Second request complete. Error: null
...

/Users/scottnonnenberg/Development/thehelp/cluster/test.js:13
        throw new Error('something went wrong!');
              ^
Error: something went wrong!
    at null._onTimeout (/test.js:13:15)
    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)
1: Worker taken down, now making second request
...
...
...
...
^C

This version hangs because a third worker is never started and the master keeps the connection open. Note also that '0: Second request complete' actually comes after '1: Worker listening!'; this is because that initial second request ends up being handled by the second worker.

@mscdex added the cluster label (Issues and PRs related to the cluster subsystem) on Mar 23, 2015
@benjamingr (Member)

cc @petkaantonov

@rpaterson

The OP is being too charitable calling this a "Breaking change". IMO this is a serious regression that is preventing us from upgrading from v0.10 to v0.12. One of the main reasons for using a cluster is to provide a high availability service - in v0.12 that is impossible because if all the workers die the clients will see "connection refused".

@Fishrock123 (Contributor)

This commit is the origin of the changed behavior; investigating.

@Fishrock123 (Contributor)

Hmm, does

  • Handles in the master process are now closed when the last worker
    that holds a reference to them quits. Previously, they were only
    closed at cluster shutdown.

sound like this issue?

@Fishrock123 (Contributor)

cc @bnoordhuis since he wrote that commit.

@bnoordhuis (Member)

A documentation issue, perhaps. The change itself is intentional.

"One of the main reasons for using a cluster is to provide a high availability service"

That's not a design goal of the cluster module and never has been.

@Fishrock123 added the doc label (Issues and PRs related to the documentations) on Aug 27, 2015
@Fishrock123 self-assigned this on Aug 28, 2015
@Fishrock123 added this to the 4.0.0 milestone on Aug 28, 2015
Fishrock123 added a commit that referenced this issue Aug 29, 2015
Fixes: #1239
Ref: 41b75ca
PR-URL: #2606
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@Fishrock123 (Contributor)

A docs fix has been landed in 4c5fc3b, thanks for reporting this!

Fishrock123 added a commit that referenced this issue Aug 31, 2015
Fixes: #1239
Ref: 41b75ca
PR-URL: #2606
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@parshap commented Sep 3, 2015

@bnoordhuis: Would it be possible to implement the previous behavior in userland?

I'm thinking something along the lines of having the master process create and bind the socket, but never call accept() on it. Basically I want a require("net").createServer().listen() that never accepts connections.
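
One rough sketch of such an approach (untested; note the master still accepts connections at the OS level, it just never reads from them): bind the port in the master with net.createServer({ pauseOnConnect: true }), queue the paused sockets while no worker is alive, and hand them to a worker over the IPC channel once it reports ready. The worker then feeds each socket to its HTTP server via server.emit('connection', socket), as in the socket-passing example in the child_process docs. The IS_WORKER flag and the 'ready'/'socket' message names here are made up for illustration:

var net = require('net');
var http = require('http');
var fork = require('child_process').fork;

var PORT = 3000;

if (process.env.IS_WORKER) {
  // Worker: an HTTP server that never calls listen(); it only handles
  // sockets handed to it by the master over the IPC channel.
  var server = http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  });

  process.on('message', function (msg, socket) {
    if (msg === 'socket' && socket) {
      server.emit('connection', socket);
      socket.resume(); // sockets arrive paused because of pauseOnConnect
    }
  });

  process.send('ready');
}
else {
  var queue = [];          // paused sockets waiting for a worker
  var worker = null;
  var workerReady = false;

  var flushQueue = function() {
    while (workerReady && queue.length) {
      worker.send('socket', queue.shift());
    }
  };

  var spawnWorker = function() {
    worker = fork(__filename, [], { env: { IS_WORKER: '1' } });
    worker.on('message', function (msg) {
      if (msg === 'ready') {
        workerReady = true;
        flushQueue();
      }
    });
    worker.on('exit', function () {
      workerReady = false;
      worker = null;
      spawnWorker(); // restart; connections queue up in the meantime
    });
  };

  // The master owns the listening socket. pauseOnConnect keeps incoming
  // data unread until a worker takes the socket over.
  net.createServer({ pauseOnConnect: true }, function (socket) {
    if (workerReady) {
      worker.send('socket', socket);
    }
    else {
      queue.push(socket);
    }
  }).listen(PORT);

  spawnWorker();
}

The observable effect is close to the 0.10 behavior: while no worker is up, clients are held open rather than refused. The trade-off is that the master accepts every connection itself and you lose cluster's built-in scheduling.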
