Is your feature request related to a problem? Please describe.
An issue came up in a customer support case where the PCF conjur-env binary used to retrieve secrets reported "read: connection reset by peer" errors. As a result, the customer application read a null string for a port environment variable.
The connection resets appeared to originate in the network rather than in Conjur itself: follower logs showed no errors and no sign that a request for the port variable was ever received. The best guess is that the resets come from a component in the customer topology, such as a load balancer. Whatever issued the resets was probably experiencing CPU or buffer overload, perhaps because of bursty network traffic or a large number of concurrent messages being received. (Reference: golang/go#20960).
Since these errors occurred in only one setup in the customer's network (1 out of 100 or more), and were intermittent in that setup (happening ~50% of the time), a retry mechanism on the Conjur API side would likely help recover gracefully in cases such as these.
It would also help to have a limit on the number of concurrent messages being sent by the Conjur API.
The proposal is to add a retry mechanism to the Conjur Golang API for network errors such as ECONNRESET or ECONNABORTED. A PoC has been created by @doodlesbykumbi: https://github.com/doodlesbykumbi/cloudfoundry-conjur-buildpack/pull/1/files
Ideally, time permitting, the retry parameters would be configurable, e.g. in a secretless.yml configuration.
Describe the solution you would like
Queries of Conjur variables get retried for ECONNRESET or ECONNABORTED errors. The character of the retries can be:
Describe alternatives you have considered
A clear and concise description of any alternative solutions or features that may be related to this that you have considered.
Additional context
Add any other context information about the feature request here.