Improve OpenSearch retry mechanism #2641

Closed
kkondaka opened this issue May 4, 2023 · 2 comments
Labels
bug Something isn't working

@kkondaka
Collaborator

kkondaka commented May 4, 2023

Is your feature request related to a problem? Please describe.
Currently, OpenSearch supports exponential backoff with an optional max-retries setting as its retry mechanism. This is not very useful because the wait time between retries becomes excessively long after a few retries. Here are the wait times (in milliseconds) before each retry:

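| Retry | Wait before retry (ms) |
|------:|-----------------------:|
| 1 | 50 |
| 2 | 60 |
| 3 | 80 |
| 4 | 150 |
| 5 | 280 |
| 6 | 580 |
| 7 | 1250 |
| 8 | 2740 |
| 9 | 6050 |
| 10 | 13430 |
| 11 | 29840 |
| 12 | 66380 |
| 13 | 147680 |
| 14 | 328630 |
| 15 | 731340 |
| 16 | 1627580 |
| 17 | 3622210 |
| 18 | 8061330 |
| 19 | 17940780 |
| 20 | 39927900 |
| 21 | 88861140 |
| 22 | 197764060 |
| 23 | 440131970 |
| 24 | 979531670 |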

At retry 21, the wait time of 88,861,140 ms (about 24.7 hours) is already more than one day, and retry 25 would cause an integer overflow in the wait time.
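The values above appear to follow `start + 10 * ((int) Math.exp(0.8 * k) - 1)` with `start` = 50 ms and `k` the zero-based retry index, which, as far as I can tell, is the delay formula of the exponential `BackoffPolicy` that OpenSearch inherited from Elasticsearch. The sketch below (the class name `ExponentialBackoffTable` is hypothetical) recomputes the wait times with `int` arithmetic, so it also shows retry 21 crossing one day and retry 25 wrapping around to a negative value.

```java
// Hypothetical class that recomputes the wait times listed above.
final class ExponentialBackoffTable {

    static final int START_MILLIS = 50;
    static final long ONE_DAY_MILLIS = 24L * 60 * 60 * 1000; // 86,400,000 ms

    // Assumed formula (see lead-in); k is the zero-based retry index.
    // int arithmetic is used on purpose so the overflow at large k is visible.
    static int waitMillis(int k) {
        return START_MILLIS + 10 * ((int) Math.exp(0.8d * k) - 1);
    }

    public static void main(String[] args) {
        for (int retry = 1; retry <= 25; retry++) {
            int wait = waitMillis(retry - 1);
            String note = wait < 0 ? "  <- int overflow"
                    : wait > ONE_DAY_MILLIS ? "  <- more than one day"
                    : "";
            System.out.printf("%2d %d%s%n", retry, wait, note);
        }
    }
}
```

Running `main` should print the same 24 values as the table, followed by a negative value for retry 25.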

Describe the solution you'd like
Provide a more reasonable wait scheme between retries: a hybrid of exponential backoff and constant backoff, with exponential backoff first, followed by constant backoff. It would have four parameters:

  1. Initial delay (enforce minimum and maximum values)
  2. Maximum wait time with exponential backoff (enforce a maximum below one hour, or even lower, such as 15 minutes)
  3. Constant wait time for constant backoff (for example, 15 minutes)
  4. Maximum wait time with constant backoff

- The first retry is done after the initial delay.
- The next "n" retries use exponential backoff until the maximum exponential backoff wait time is reached.
- The next "m" retries are done at the configured constant interval until the maximum constant backoff wait time is reached.
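A minimal sketch of the proposed hybrid scheme follows. It assumes that parameters 2 and 4 are cumulative wait budgets for each phase and that the exponential phase doubles the delay each time; the issue does not pin down either detail, and the names `HybridBackoff` and `nextDelay` are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical sketch of the proposed exponential-then-constant backoff.
final class HybridBackoff {
    private final Duration maxExponentialTotal; // assumed: total wait allowed in the exponential phase
    private final Duration constantDelay;
    private final Duration maxConstantTotal;    // assumed: total wait allowed in the constant phase

    private Duration nextExponentialDelay;      // starts at the initial delay, doubles after each use
    private Duration spentExponential = Duration.ZERO;
    private Duration spentConstant = Duration.ZERO;

    HybridBackoff(Duration initialDelay, Duration maxExponentialTotal,
                  Duration constantDelay, Duration maxConstantTotal) {
        if (initialDelay.isNegative() || initialDelay.isZero()
                || initialDelay.compareTo(maxExponentialTotal) > 0) {
            throw new IllegalArgumentException("initialDelay must be positive and within the exponential budget");
        }
        if (constantDelay.isNegative() || constantDelay.isZero()) {
            throw new IllegalArgumentException("constantDelay must be positive");
        }
        this.nextExponentialDelay = initialDelay;
        this.maxExponentialTotal = maxExponentialTotal;
        this.constantDelay = constantDelay;
        this.maxConstantTotal = maxConstantTotal;
    }

    /** Returns the delay before the next retry, or empty once both budgets are used up. */
    Optional<Duration> nextDelay() {
        // Exponential phase: use the current delay while it still fits in the budget.
        if (spentExponential.plus(nextExponentialDelay).compareTo(maxExponentialTotal) <= 0) {
            Duration delay = nextExponentialDelay;
            spentExponential = spentExponential.plus(delay);
            nextExponentialDelay = delay.multipliedBy(2);
            return Optional.of(delay);
        }
        // Constant phase: fixed interval until the constant budget is used up.
        if (spentConstant.plus(constantDelay).compareTo(maxConstantTotal) <= 0) {
            spentConstant = spentConstant.plus(constantDelay);
            return Optional.of(constantDelay);
        }
        return Optional.empty(); // stop retrying
    }

    public static void main(String[] args) {
        HybridBackoff backoff = new HybridBackoff(Duration.ofSeconds(1), Duration.ofSeconds(15),
                Duration.ofSeconds(5), Duration.ofSeconds(12));
        for (Optional<Duration> d = backoff.nextDelay(); d.isPresent(); d = backoff.nextDelay()) {
            System.out.println(d.get().toMillis() + " ms");
        }
    }
}
```

With a 1 s initial delay, a 15 s exponential budget, a 5 s constant delay, and a 12 s constant budget, the sketch yields 1 s, 2 s, 4 s, 8 s, then 5 s, 5 s, and then stops. Because each phase is bounded by a budget, the delay can never grow without limit or overflow.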


kkondaka self-assigned this May 4, 2023
@dlvenable
Member

Could we simplify the configuration to just an initial delay and a maximum backoff time?

We can keep the exponential backoff approach; I'm not sure pipeline authors need to configure it.

Also, we might be able to get a partial fix into 2.2.1 if we just limit the backoff time. We can't add new parameters in a patch release, so we could use a fixed, reasonable maximum (say, 15 minutes).

Then in 2.3 we can add the parameters for users to tune as they see fit.
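The simplified proposal (just an initial delay and a maximum backoff) could look roughly like the sketch below. It doubles the delay for readability instead of reusing the 0.8-exponent formula, `CappedExponentialBackoff` and `delayFor` are hypothetical names, and the 15-minute cap mirrors the value suggested for the patch release.

```java
import java.time.Duration;

// Hypothetical sketch: exponential backoff whose per-retry delay is capped.
final class CappedExponentialBackoff {

    // Delay before the zero-based retry `retryIndex`: initialDelay doubled per retry,
    // never more than maxBackoff. The loop stops once the cap is reached, so
    // large retry counts cannot overflow.
    static Duration delayFor(int retryIndex, Duration initialDelay, Duration maxBackoff) {
        Duration delay = initialDelay;
        for (int i = 0; i < retryIndex && delay.compareTo(maxBackoff) < 0; i++) {
            delay = delay.multipliedBy(2);
        }
        return delay.compareTo(maxBackoff) > 0 ? maxBackoff : delay;
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofMillis(50);
        Duration cap = Duration.ofMinutes(15); // example cap value
        for (int retry = 1; retry <= 20; retry++) {
            System.out.printf("%2d %d%n", retry, delayFor(retry - 1, initial, cap).toMillis());
        }
    }
}
```

With a 50 ms initial delay, the delay reaches the 15-minute cap (900,000 ms) at the 16th retry and stays there for every later retry.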

dlvenable added the bug (Something isn't working) label and removed the untriaged label May 5, 2023
dlvenable added this to the v2.3 milestone May 5, 2023
@dlvenable
Member

Completed by #2643.
