Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: shutdown gracefully #1164

Closed
wants to merge 230 commits into from

Conversation

indutny
Copy link
Member

@indutny indutny commented Mar 16, 2015

Wait for the shutdown request completion before emitting the finish
event and destroying the socket. While this might not be that relevant
in case of plain TCP sockets, the TLS implementation sends close-notify
packet during shutdown request. Destroying socket whilst this write is
in progress tends to cause ECONNRESET errors in our tests.

@indutny
Copy link
Member Author

indutny commented Mar 16, 2015

cc @piscisaureus should fix that nasty issue on windows. Going to test it now.

@calvinmetcalf
Copy link
Contributor

you beat me to opening this, through I was going call it _end for symmetry with the end method

@indutny
Copy link
Member Author

indutny commented Mar 16, 2015

Do not land this yet, it doesn't fix the issue on windows. Needs some investigation.

@indutny
Copy link
Member Author

indutny commented Mar 16, 2015

Going to push here a couple of commits just for my own testing, sorry about this!

@indutny
Copy link
Member Author

indutny commented Mar 17, 2015

Ok, now it seems to be ready. @piscisaureus : this fixes the issue on windows once and for free ;)

@Fishrock123
Copy link
Contributor

@piscisaureus
Copy link
Contributor

@indutny

First of all, great work for digging into this and finding a fix.

I have one question though. Back in the day there we deliberately chose not to gracefully close TCP connections when serving http, for performance reasons. Instead we'd just wait until all writes were completed, and then close the connection. The kernel then takes care of doing a graceful close (note that it's still a graceful close!).

This is also the source of the subtle differences between the various finalization methods for TCP connections:

  • end() flushes all the writes, then calls shutdown(3), then close(3).
  • destroySoon() flushes all the writes and calls close(3) to do a graceful shutdown.
  • destroy() does a hard close, potentially causing a RST packet to be sent (so that the other side gets an ECONNRESET error).

I think it would be best if the TLS wrap also supported this kind of behavior.

I am not sure I completely understand how this patch affects that behavior.

@bnoordhuis I would also like to hear some deep thoughts from you.

@indutny
Copy link
Member Author

indutny commented Mar 17, 2015

@piscisaureus I guess we could use it for TLS only then, putting this __flush method to TLSSocket.

@Fishrock123
Copy link
Contributor

Looks like this cropped up on both windows machines:

=== release test-tls-server-verify ===
Path: parallel/test-tls-server-verify
connecting with agent1
Running 'Do not request certs. Everyone is unauthorized.'
- unauthed connection: null
  * unauthed
events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: read ECONNRESET
    at exports._errnoException (util.js:734:11)
    at TLSWrap.onread (net.js:514:26)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2008r2\test\parallel\test-tls-server-verify.js

@indutny
Copy link
Member Author

indutny commented Mar 17, 2015

@Fishrock123 yeah, I see it on my windows box too. Looking.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

Ok, let's hope this one is a green now: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/316/

@bnoordhuis
Copy link
Member

What's the reason for always calling shutdown() on TCP sockets now? I have a gut feeling that's going to impact performance for no good reason.

@Fishrock123
Copy link
Contributor

@indutny

=== release test-child-process-stdout-flush-exit ===
Path: parallel/test-child-process-stdout-flush-exit
assert.js:87
  throw new assert.AssertionError({
        ^
AssertionError: false == true
    at ChildProcess.<anonymous> (c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-stdout-flush-exit.js:43:5)
    at emitTwo (events.js:87:13)
    at ChildProcess.emit (events.js:169:7)
    at maybeClose (child_process.js:984:16)
    at Process.ChildProcess._handle.onexit (child_process.js:1057:5)
Command: c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\Release\iojs.exe c:\workspace\iojs+any-pr+multi\nodes\iojs-win2012r2\test\parallel\test-child-process-stdout-flush-exit.js

Edit: this doesn't seem related? Weird. Haven't seen this before/recently..

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

@Fishrock123 is it related or not?

@bnoordhuis I was trying to figure out shutdown in the middle thing, but it might be not worth it, indeed.

@indutny indutny closed this Mar 18, 2015
@Fishrock123
Copy link
Contributor

@indutny I think you'd be able to tell better than me, but I'd wager to say it might be, given it tests functionality related to flush and this changes some stream flush code.

@indutny indutny reopened this Mar 18, 2015
@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

@Fishrock123 I was wrong to close it yesterday :) Now I see that it should be still useful.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

CI again: https://jenkins-iojs.nodesource.com/job/iojs+any-pr+multi/324/

I don't see any failures except fs-access on my box.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

It appears to me that parallel/test-net-reconnect-error is just very slow on windows.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

Looking into other tests too.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

parallel/test-http-curl-chunk-problem works just fine without timing out.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

Same about parallel/test-process-kill-null, all of the rest works too.

@indutny
Copy link
Member Author

indutny commented Mar 18, 2015

@piscisaureus may I ask you to take a look at these? I'm not sure how I could reproduce...

@Fishrock123
Copy link
Contributor

@indutny Odd, that test failure did not manifest itself again.

The timeouts seem standard and almost certainly are not related to the PR. (That should be for a different issue.)

@indutny indutny force-pushed the fix/graceful-shutdown-in-tls branch from c57e567 to aff0cb3 Compare March 19, 2015 01:56
@indutny
Copy link
Member Author

indutny commented Mar 19, 2015

@jbergstroem
Copy link
Member

Looks like the os x box fails sequential/test-child-process-fork-getconnections.js. Related?

@Fishrock123
Copy link
Contributor

An OS X failure has popped up, I reported this one in #1100 as I had previously seen it locally. This is the first time I've seen it on the CI. (should be unrelated)

Trott and others added 9 commits June 4, 2015 11:38
It's not clear what additional tests are wanted.
The current malformed URL test seems adequate.

PR-URL: nodejs#1875
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Evan Lucas <evanlucas@me.com>
`printDeprecationMessage` is used to deprecate modules
and execution branches.

PR-URL: nodejs#1822
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
It makes no sense to allow people use constants from
`smalloc`, since it will be removed completely eventually.

PR-URL: nodejs#1822
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
When loading session, OCSP response, SNI, always check that the
`self._handle` is present. If it is not - the socket was closed - and we
should emit the error instead of throwing an uncaught exception.

Fix: nodejs/node-v0.x-archive#8780
Fix: nodejs#1696
PR-URL: nodejs#1702
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
* Destroy `SSL*` and friends on a next tick to make sure that we are not
  doing it in one of the OpenSSL callbacks
* Add more checks to the C++ methods that might be invoked during
  destructor's pending queue cleanup

Fix: nodejs/node-v0.x-archive#8780
Fix: nodejs#1696
PR-URL: nodejs#1702
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
PR-URL: nodejs#1702
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Add string encoding option for fs.createReadStream and
fs.createWriteStream. and check argument type more strictly

PR-URL: nodejs#1845
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
When using `iojs debug -p <pid>` with an invalid pid, the debugger
printed an internal error message because it wasn't smart enough
to figure out that the target process didn't exist.  Now it is.

PR-URL: nodejs#1863
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Fedor Indutny <fedor@indutny.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Yosuke Furukawa <yosuke.furukawa@gmail.com>
This allows `graceful-fs` to evaluate `fs` source
without access to internals.

This is a temporary workaround that makes npm work.

See: isaacs/node-graceful-fs#41
Fixes: nodejs#1898
PR-URL: nodejs#1903
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
@bnoordhuis
Copy link
Member

@indutny Is this PR still relevant? I remember when I last reviewed it I didn't really get the point. Perhaps some TLS regression tests would help?

@indutny
Copy link
Member Author

indutny commented Jun 5, 2015

Yes, it is relevant. I will work on regression test for this.

saghul and others added 4 commits June 5, 2015 22:12
PR-URL: nodejs#1905
Refs: nodejs#1791
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
On case insensitive platforms, the rule was catching the debug module
under npm and eslint.

See: nodejs#1899 (comment)
PR-URL: nodejs#1908
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
os.homedir() calls libuv's uv_os_homedir() to retrieve the current
user's home directory.

PR-URL: nodejs#1791
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Rod Vagg <rod@vagg.org>
Wait for the `shutdown` request completion before emitting the `finish`
event and destroying the socket. While this might not be that relevant
in case of plain TCP sockets, the TLS implementation sends close-notify
packet during shutdown request. Destroying socket whilst this write is
in progress tends to cause ECONNRESET errors in our tests.
@indutny indutny force-pushed the fix/graceful-shutdown-in-tls branch from bd4daca to f3503ba Compare June 6, 2015 16:26
@indutny
Copy link
Member Author

indutny commented Jun 6, 2015

Thinking about it, it looks like it is no longer needed. Thanks for raising the question, @bnoordhuis .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Issues and PRs related to the net subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.