Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: fix flaky cluster-disconnect-race #4457

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented Dec 28, 2015

Before this commit, on single core Windows systems process.send() would cause an EPIPE because of the ordering of the IPC channel disconnect and the process.send().

Disconnecting the second worker once it receives the sent message vs. when the worker merely comes online fixes this race condition.

Fixes: #4450

@mscdex mscdex added cluster Issues and PRs related to the cluster subsystem. windows Issues and PRs related to the Windows platform. test Issues and PRs related to the tests. labels Dec 28, 2015
@@ -16,12 +16,14 @@ if (cluster.isMaster) {
worker1.on('message', common.mustCall(function() {
worker2 = cluster.fork();
worker1.disconnect();
worker2.on('online', common.mustCall(worker2.disconnect));
worker2.on('online', common.mustCall(function() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just worker2.on('message', ...) by itself rather than wrapped inside a worker2.on('online', ...) ?

@Trott
Copy link
Member

Trott commented Dec 28, 2015

Seems to blow up a bit on ubuntu more often than not...

Mismatched <anonymous> function calls. Expected 1, actual 2.
    at Object.<anonymous> (/home/vagrant/node/test/parallel/trott.js:24:29)
    at Module._compile (module.js:408:26)
    at Object.Module._extensions..js (module.js:415:10)
    at Module.load (module.js:354:32)
    at Function.Module._load (module.js:311:12)
    at Function.Module.runMain (module.js:440:10)
    at startup (node.js:141:18)
    at node.js:997:3

@mscdex
Copy link
Contributor Author

mscdex commented Dec 28, 2015

Yeah I'm seeing that now. Looking into it...

@mscdex mscdex force-pushed the fix-win-flaky-test-cluster-disconnect-race branch from a6f4f4f to dd5f204 Compare December 29, 2015 00:53
@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

Ok that should be fixed now.

@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

Stress test is green! \o/

General CI run: https://ci.nodejs.org/job/node-test-pull-request/1101/

@Trott
Copy link
Member

Trott commented Dec 29, 2015

Unfortunately, this change means that the test no longer fails in Node 5.3.0, which means that it doesn't test the issue it was written to test anymore. (Tested on ubuntu 14.04.)

@Trott
Copy link
Member

Trott commented Dec 29, 2015

I think the issue in the test that you've fixed is the source of the race condition that's supposed to be tested here. It's a feature and not a bug. Maybe the thing to do is swallow EPIPE on Windows.

Another perhaps better option is to skip the test on anything that isn't Linux, since that's the only place (that I know of, anyway) where the thing (that this test detects) ever happened.

@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

I thought about that, but swallowing EPIPE seems like a bad idea since then it would appear as though the send() was successful but it actually wasn't.

@Trott
Copy link
Member

Trott commented Dec 29, 2015

Maybe just skipping the test on Windows since the test does not test the issue it is supposed to test on Windows anyway (perhaps because the issue never existed there). In Node 5.3.0, it blows up on Linux but not on Windows. It's supposed to blow up in Node 5.3.0. Either this issue never manifested itself on Windows or else it did but this test never caught it anyway. (I'm guessing the former.)

@mscdex mscdex force-pushed the fix-win-flaky-test-cluster-disconnect-race branch from dd5f204 to 192c097 Compare December 29, 2015 04:50
@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

@Trott Ok, I've modified it to just skip on Windows and left in a couple extra mustCall()s. LGTY?

@@ -7,6 +7,12 @@ const common = require('../common');
const assert = require('assert');
const net = require('net');
const cluster = require('cluster');

if (common.isWindows) {
console.log('1..0 # Skipped: This test is disabled on windows.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Windows rather than windows

Nit: is disabled on Windows makes it sound like we just gave up and hope to enable it one day. I'd rather convey a message that it just doesn't apply to Windows. Maybe: Skipped: This test does not apply to Windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I just copied this directly from another test that presumably was doing a check for the same reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh. Either way. It's a nit.

@Trott
Copy link
Member

Trott commented Dec 29, 2015

A couple of nits that you can take or leave (because, hey, that's what a nit is). Beyond that, yeah, totally LGTM if CI is happy and as long as it still fails on Linux on Node 5.3.0.

On single core Windows systems, process.send() would cause an
EPIPE because of the ordering of the IPC channel disconnect and
the process.send().

The test was originally only relevant for non-Windows platforms,
so this commit merely skips the test on Windows.

Fixes: nodejs#4450
PR-URL: nodejs#4457
@mscdex mscdex force-pushed the fix-win-flaky-test-cluster-disconnect-race branch from 192c097 to d2d7c94 Compare December 29, 2015 05:11
@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

Wording fixed. One last CI run: https://ci.nodejs.org/job/node-test-pull-request/1103/

@Trott
Copy link
Member

Trott commented Dec 29, 2015

Behaves as expected with Node 5.3.0 on Ubuntu 14.04.

mscdex added a commit that referenced this pull request Dec 29, 2015
On single core Windows systems, process.send() would cause an
EPIPE because of the ordering of the IPC channel disconnect and
the process.send().

The test was originally only relevant for non-Windows platforms,
so this commit merely skips the test on Windows.

Fixes: #4450
PR-URL: #4457
Reviewed-By: Rich Trott <rtrott@gmail.com>
@mscdex
Copy link
Contributor Author

mscdex commented Dec 29, 2015

Landed in fd551c3.

@mscdex mscdex closed this Dec 29, 2015
@mscdex mscdex deleted the fix-win-flaky-test-cluster-disconnect-race branch December 29, 2015 05:48
Fishrock123 pushed a commit to Fishrock123/node that referenced this pull request Jan 6, 2016
On single core Windows systems, process.send() would cause an
EPIPE because of the ordering of the IPC channel disconnect and
the process.send().

The test was originally only relevant for non-Windows platforms,
so this commit merely skips the test on Windows.

Fixes: nodejs#4450
PR-URL: nodejs#4457
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jan 14, 2016
On single core Windows systems, process.send() would cause an
EPIPE because of the ordering of the IPC channel disconnect and
the process.send().

The test was originally only relevant for non-Windows platforms,
so this commit merely skips the test on Windows.

Fixes: #4450
PR-URL: #4457
Reviewed-By: Rich Trott <rtrott@gmail.com>
MylesBorins pushed a commit that referenced this pull request Jan 19, 2016
On single core Windows systems, process.send() would cause an
EPIPE because of the ordering of the IPC channel disconnect and
the process.send().

The test was originally only relevant for non-Windows platforms,
so this commit merely skips the test on Windows.

Fixes: #4450
PR-URL: #4457
Reviewed-By: Rich Trott <rtrott@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Jan 19, 2016
scovetta pushed a commit to scovetta/node that referenced this pull request Apr 2, 2016
On single core Windows systems, process.send() would cause an
EPIPE because of the ordering of the IPC channel disconnect and
the process.send().

The test was originally only relevant for non-Windows platforms,
so this commit merely skips the test on Windows.

Fixes: nodejs#4450
PR-URL: nodejs#4457
Reviewed-By: Rich Trott <rtrott@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cluster Issues and PRs related to the cluster subsystem. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate flaky test-cluster-disconnect-race
3 participants