Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: fix socket re-use races #32000

Closed
wants to merge 3 commits into from

Conversation

ronag
Copy link
Member

@ronag ronag commented Feb 28, 2020

This fixes two race conditions related to socket re-use in keep alive agents.

  • sockets in the free list, that has emitted 'timeout' should not be re-used
  • sockets that has been destroy():d but has not yet emitted 'close' should not be re-used

Refs: #31526 (comment)

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@ronag ronag added the http Issues or PRs related to the http subsystem. label Feb 28, 2020
@ronag ronag requested a review from jasnell February 28, 2020 10:59
@ronag
Copy link
Member Author

ronag commented Feb 28, 2020

I think this should be able to land as semver-minor bugfix. Though I'd like a second opinion on that.

delete this.freeSockets[name];
}

const freeLen = freeSockets ? freeSockets.length : 0;
Copy link
Member Author

@ronag ronag Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freeLen used to include the socket that was removed, however I think that was a mistake as well and it's only used for debug logging.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ronag ronag added semver-minor PRs that contain new features and should be released in the next minor version. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Feb 28, 2020
@nodejs-github-bot
Copy link
Collaborator

@mcollina
Copy link
Member

I think this should be able to land as semver-minor bugfix. Though I'd like a second opinion on that.

I would consider it a patch, it really fixes a bad bug.

@mcollina mcollina requested a review from lpinca February 28, 2020 11:25
@mcollina mcollina added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Feb 28, 2020
@ronag ronag removed the semver-minor PRs that contain new features and should be released in the next minor version. label Feb 28, 2020
lib/_http_agent.js Outdated Show resolved Hide resolved
lib/_http_agent.js Show resolved Hide resolved
test/parallel/test-http-agent-timeout.js Outdated Show resolved Hide resolved
test/parallel/test-http-agent-timeout.js Outdated Show resolved Hide resolved
test/parallel/test-http-agent-timeout.js Outdated Show resolved Hide resolved
test/parallel/test-http-agent-timeout.js Outdated Show resolved Hide resolved
lib/_http_agent.js Outdated Show resolved Hide resolved
@nodejs-github-bot
Copy link
Collaborator

@lpinca
Copy link
Member

lpinca commented Feb 28, 2020

Is the socket destroyed when the 'timeout' event is emitted? If not why can't it be reused if it is still a working socket?

@ronag
Copy link
Member Author

ronag commented Feb 28, 2020

Is the socket destroyed when the 'timeout' event is emitted? If not why can't it be reused if it is still a working socket?

No, it's currently up to the user. Users often set a timeout and assume it is destroyed. The conservative/non-breaking solution is to assume the socket is not working if it emits timeout and simply not re-use it.

As a separate semver-major PR I would propose to actually destroy the socket on 'timeout'.

@lpinca
Copy link
Member

lpinca commented Feb 28, 2020

If it is added back to the free socket list and only removed from the same list on timeout, who closes it? It is not clear to me.

@ronag
Copy link
Member Author

ronag commented Feb 28, 2020

If it is added back to the free socket list and only removed from the same list on timeout, who closes it? It is not clear to me.

You are absolutely right. I missed that. I will sort that out. Thank you.

@ronag ronag removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 28, 2020
@ronag
Copy link
Member Author

ronag commented Feb 28, 2020

Given #32000 (comment) I had to update this PR and it can no longer be two separate commits. Essentially, if a socket has a timeout while in the free list we destroy it, and then ensure that destroyed sockets are not re-used.

@ronag ronag changed the title http: keep alive socket re-use races http: fix socket re-use races Feb 28, 2020
@nodejs-github-bot
Copy link
Collaborator

socket.setTimeout(agentTimeout);
}
});
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to do this if re-using the socket. Move it to where it is put into the free list.

Whether and when a socket is destroyed or not after a timeout is up to
the user. This leaves an edge case where a socket that has emitted
'timeout' might be re-used from the free pool. Even if destroy is called
on the socket, it won't be removed from the freelist until 'close' which
can happen several ticks later.

Sockets are removed from the free list on the 'close' event.
However, there is a delay between calling destroy() and 'close'
being emitted. This means that it possible for a socket that has
been destroyed to be re-used from the free list, causing unexpected
failures.
@lpinca
Copy link
Member

lpinca commented Feb 29, 2020

It seems good but we should update the documentation of the agent specifying that free sockets are destroyed when they time out.

@ronag
Copy link
Member Author

ronag commented Feb 29, 2020

@lpinca updated doc

@ronag
Copy link
Member Author

ronag commented Mar 2, 2020

@mcollina: There has been some changes here since you reviewed. You still approve?

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@ronag
Copy link
Member Author

ronag commented Mar 7, 2020

Landed in 8700d89

@ronag ronag closed this Mar 7, 2020
ronag added a commit that referenced this pull request Mar 7, 2020
Whether and when a socket is destroyed or not after a timeout is up to
the user. This leaves an edge case where a socket that has emitted
'timeout' might be re-used from the free pool. Even if destroy is called
on the socket, it won't be removed from the freelist until 'close' which
can happen several ticks later.

Sockets are removed from the free list on the 'close' event.
However, there is a delay between calling destroy() and 'close'
being emitted. This means that it possible for a socket that has
been destroyed to be re-used from the free list, causing unexpected
failures.

PR-URL: #32000
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
MylesBorins pushed a commit that referenced this pull request Mar 9, 2020
Whether and when a socket is destroyed or not after a timeout is up to
the user. This leaves an edge case where a socket that has emitted
'timeout' might be re-used from the free pool. Even if destroy is called
on the socket, it won't be removed from the freelist until 'close' which
can happen several ticks later.

Sockets are removed from the free list on the 'close' event.
However, there is a delay between calling destroy() and 'close'
being emitted. This means that it possible for a socket that has
been destroyed to be re-used from the free list, causing unexpected
failures.

PR-URL: #32000
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Mar 10, 2020
targos pushed a commit that referenced this pull request Apr 20, 2020
Whether and when a socket is destroyed or not after a timeout is up to
the user. This leaves an edge case where a socket that has emitted
'timeout' might be re-used from the free pool. Even if destroy is called
on the socket, it won't be removed from the freelist until 'close' which
can happen several ticks later.

Sockets are removed from the free list on the 'close' event.
However, there is a delay between calling destroy() and 'close'
being emitted. This means that it possible for a socket that has
been destroyed to be re-used from the free list, causing unexpected
failures.

PR-URL: #32000
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
const sockets = agent.freeSockets;
for (const name of ObjectKeys(sockets)) {
if (sockets[name].includes(s)) {
return s.destroy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should remove the destroy socket from freeSockets list immediately to prevent new requests from being sent through this socket.

if (sockets[name].includes(s)) {
  s.destroy();
  return agent.removeSocket(s, options);
}

debug('CLIENT socket onTimeout');

// Destroy if in free list.
// TODO(ronag): Always destroy, even if not in free list.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this TODO plan to be completed? From the current usage, it may be a breaking change, which makes timeout change from idletimeout to datatimeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants