Add retry mechanism to handle connection reset by peer for Conjur reads #63

Open
diverdane opened this issue Mar 23, 2020 · 0 comments

Is your feature request related to a problem? Please describe.

An issue came up in a customer support case in which the PCF conjur-env binary used to retrieve secrets reported "read: connection reset by peer" errors. As a result, the customer application read an empty string for a port environment variable.

The connection resets appeared to originate in the network rather than in Conjur itself: the follower logs showed no errors and no sign that a request for the port variable was ever received. The most likely source is a component in the customer topology, such as a load balancer. Whatever issued the resets may have been experiencing CPU or buffer overload, possibly because of bursty network traffic or a large number of concurrent messages being received. (Reference: golang/go#20960).

Since these errors occurred in only one setup in the customer's network (1 out of 100 or more), and were intermittent in that setup (happening ~50% of the time), a retry mechanism on the Conjur API side should help recover gracefully in cases such as these.

It would also help to have a limit on the number of concurrent messages being sent by the Conjur API.

The proposal is to add a retry mechanism to the Conjur Golang API for network errors such as ECONNRESET or ECONNABORTED. A PoC has been created by @doodlesbykumbi: https://github.com/doodlesbykumbi/cloudfoundry-conjur-buildpack/pull/1/files

Ideally, time permitting, the retry parameters would be configurable, e.g. in a secretless.yml configuration file.

Describe the solution you would like

Queries of Conjur variables are retried on ECONNRESET or ECONNABORTED errors. The retry behavior could be delivered in tiers:

  • Minimum: Always enabled, fixed number of retries, fixed delay between retry attempts
  • Nice-to-Have: Whether retries are enabled, the number of retries, and the retry delay are configurable in a secretless.yml file.
  • Nice-to-Have: Limits on concurrency configurable either in secretless.yml or in a Conjur variable.

