Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix retry logic in BrokerClient and flakey BrokerClientTest #16618

Merged
merged 2 commits into from
Jun 17, 2024

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Jun 17, 2024

BrokerClientTest was added in PR #14322. When running this test from the IDE (outside of Maven), testError() reliably fails. However, it passes in CI due to the Maven setting <surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount> in the pom.xml. This setting allows Maven to retry the test up to three times, and the test ultimately passes because of the retry logic in the code, but maven treats it as "flaky".

Changes

Looking into the error handling in BrokerClient, it was found that the retry logic doesn't effectively handle 5xx DruidException categories since the HTTP error codes are converted into DruidExceptions. So the retry logic didn't fully account transient errors and this patch fixes that. We could perhaps change the semantics of the method to just use the HTTP response, or write a service client that uses the built-in mechanisms, but it'll be a larger change.

Also, I removed the code that converts all non-200 error codes into retryable 5xx DruidException categories. For example, 4xx error codes shouldn't be retried. If there are additional 500 error codes we need to retry, we can add them explicitly similar to the ones that are currently handled.

This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

The testError() method reliably fails in the IDE. This is because the
the test runner has

<surefire.rerunFailingTestsCount>3</surefire.rerunFailingTestsCount> is set to 3, so maven
retries this "flaky test" multiple times and the test code returns a successful response
in the third attempt.

The exception handling in BrokerClientTest was broken:
- All non-2xx errors were being turned as 5xx errors. Remove that block of
code. If we need to handle retries of more specific 5xx error codes, that should be
hanlded explicitly. Or if there's a source of 4xx class error that needs to be 5xx,
fix that in the source of error.
Copy link
Contributor

@zachjsh zachjsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abhishekrb19 abhishekrb19 merged commit 51b2f6c into apache:master Jun 17, 2024
87 checks passed
@abhishekrb19 abhishekrb19 deleted the fix_flaky_broker_test branch June 17, 2024 19:48
@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants