Add wait in replica recovery for allocation id to propagate on source node #15558

Merged: 3 commits merged into opensearch-project:main on Sep 6, 2024

Conversation

gbbafna
Collaborator

@gbbafna gbbafna commented Sep 1, 2024

Description

When remote cluster state is enabled, cluster state propagation might be delayed. This causes replica recoveries to fail, with the source node complaining that it does not have the shard listed in its state as allocated on the node. This PR adds retry with backoff, which gives the cluster state some time to propagate and prevents shards from failing for this reason.
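
The retry-and-backoff idea described above can be illustrated with a minimal, standalone sketch. This is not the code from this PR: the class `AllocationIdWait`, the method `waitForAllocationId`, and its parameters are hypothetical names chosen for illustration, and the supplier simply stands in for "the allocation IDs the source node currently knows about for the shard". The sketch assumes a simple exponential backoff with a bounded number of attempts.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical illustration only; not the RecoverySourceHandler change itself.
final class AllocationIdWait {

    /**
     * Polls the source node's view of known allocation IDs until the target
     * allocation ID appears or the attempts are exhausted.
     *
     * @param knownAllocationIds supplier of the IDs currently visible on the source node (assumed)
     * @param targetAllocationId allocation ID of the recovering replica
     * @param maxAttempts        upper bound on the number of checks
     * @param initialDelayMillis delay before the second check; doubled after each miss
     * @return true if the ID became visible, false if attempts ran out or the thread was interrupted
     */
    static boolean waitForAllocationId(Supplier<Set<String>> knownAllocationIds,
                                       String targetAllocationId,
                                       int maxAttempts,
                                       long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (knownAllocationIds.get().contains(targetAllocationId)) {
                return true; // cluster state has propagated to the source node
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay); // back off before checking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
                delay *= 2; // exponential backoff
            }
        }
        return false; // caller would fail or delay the recovery at this point
    }
}
```

A caller would only fail the recovery after this method returns false, instead of failing on the first check while the cluster state is still in flight.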

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Sep 1, 2024

❌ Gradle check result for fe22335: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Sep 2, 2024

❌ Gradle check result for dc1afb5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Member

@ashking94 ashking94 left a comment

Left some comments; please address them.

Contributor

github-actions bot commented Sep 4, 2024

❌ Gradle check result for 221243c: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Sep 5, 2024

❌ Gradle check result for 1213150: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Contributor

github-actions bot commented Sep 5, 2024

✅ Gradle check result for 487fa64: SUCCESS

codecov bot commented Sep 5, 2024

Codecov Report

Attention: Patch coverage is 88.46154% with 3 lines in your changes missing coverage. Please review.

Project coverage is 71.94%. Comparing base (729e40d) to head (487fa64).
Report is 22 commits behind head on main.

Files with missing lines | Patch % | Lines
...search/indices/recovery/RecoverySourceHandler.java | 86.95% | 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15558      +/-   ##
============================================
- Coverage     71.95%   71.94%   -0.02%     
- Complexity    64192    64197       +5     
============================================
  Files          5270     5271       +1     
  Lines        300052   300181     +129     
  Branches      43368    43384      +16     
============================================
+ Hits         215917   215963      +46     
- Misses        66442    66493      +51     
- Partials      17693    17725      +32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ashking94 ashking94 merged commit 3c6019d into opensearch-project:main Sep 6, 2024
34 checks passed
@gbbafna gbbafna added the "backport 2.17" and "backport 2.x" (Backport to 2.x branch) labels Sep 6, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 6, 2024
… node (#15558)

* Add wait for target allocation id to appear

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* making waitForAssignment same

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* Add more test

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

---------

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
(cherry picked from commit 3c6019d)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gbbafna pushed a commit that referenced this pull request Sep 6, 2024
… node (#15558) (#15785)

(cherry picked from commit 3c6019d)

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
… node (opensearch-project#15558)

* Add wait for target allocation id to appear

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* making waitForAssignment same

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* Add more test

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

---------

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
2 participants