Health Monitor continues to log after errors #2523
Merged
When health_monitor is sending metrics to, for example, graphite, and there is a network error (e.g. `Errno::EPIPE`), it stops sending metrics until health_monitor is restarted.

This was a regression introduced by replacing the `EventMachine` gem with the `Async` gem.

This commit fixes the regression: when a network error occurs, health_monitor attempts to re-establish the connection and continues sending data.
The attempt to re-establish the connection follows the same retry & backoff logic as when establishing the initial connection.
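
As a rough illustration of the behaviour described above, here is a minimal sketch of reconnect-on-error with shared retry and backoff logic. It is not the code from this PR: the class `ReconnectingSender`, its methods, the retry count, and the exponential delay are all hypothetical, since the actual retry parameters are not stated here.

```ruby
# Hypothetical sketch only; names and retry parameters are assumptions,
# not the health_monitor implementation.
class ReconnectingSender
  # Network errors treated as "connection lost" (an assumed, non-exhaustive list).
  NETWORK_ERRORS = [Errno::EPIPE, Errno::ECONNRESET, Errno::ECONNREFUSED].freeze

  # connector: callable returning an object that responds to #write and #close.
  # sleeper:   callable used to wait between attempts (injectable for tests).
  def initialize(connector, max_retries: 5, base_delay: 1, sleeper: ->(s) { sleep(s) })
    @connector = connector
    @max_retries = max_retries
    @base_delay = base_delay
    @sleeper = sleeper
    @connection = nil
  end

  # Opens a connection, retrying with exponential backoff on network errors.
  # Used for both the initial connection and every reconnection.
  def connect
    retries = 0
    begin
      @connection = @connector.call
    rescue *NETWORK_ERRORS
      retries += 1
      raise if retries > @max_retries
      @sleeper.call(@base_delay * (2**(retries - 1)))
      retry
    end
  end

  # Sends data; on a network error, reconnects and resends once.
  # A failure of the resend propagates to the caller.
  def send_data(data)
    connect if @connection.nil?
    begin
      @connection.write(data)
    rescue *NETWORK_ERRORS
      reconnect
      @connection.write(data)
    end
  end

  # Closes the old connection (ignoring errors on close) and opens a new one.
  def reconnect
    begin
      @connection&.close
    rescue IOError, SystemCallError
      # The old socket is already broken; nothing more to do.
    end
    @connection = nil
    connect
  end
end
```

Routing the reconnection through the same `connect` method as the initial connection is what keeps the retry and backoff behaviour identical in both cases.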
Note: we feel the method `unbind` is poorly named; it should be named `close_old_and_open_new_connection`.

[fixes #2522]
[#187636407]
What is this change about?
See commit message.
Please provide contextual information.
See #2522.
What tests have you run against this PR?
How should this change be described in bosh release notes?
Health Monitor attempts to re-establish plugins' network connections when disconnected.
Does this PR introduce a breaking change?
No.
Tag your pair, your PM, and/or team!
@mingxiao @cunnie