Prefer remote replicas for primary promotion during migration #13136

Merged
merged 3 commits into from
Apr 23, 2024

Conversation


@ltaragi ltaragi commented Apr 9, 2024

Description

  • During remote store migration (MIXED compatibility mode with the REMOTE_STORE migration direction), the aim is to keep making forward progress.
  • To ensure this, when a primary shard fails over:
    • If the primary and one or more replica copies had already migrated to remote nodes, the new primary is chosen at random from the migrated replicas.
    • If no migrated replica is found, selection falls back to the existing promotion priority.
  • This preference is implemented as follows:
    • metadata is used to identify the migration conditions.
    • activeReplicaOnRemoteNode uses the failedShard's id to obtain the migrated replica copies and returns one of them at random.
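
The selection rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code merged in this PR: `ShardCopy`, `choosePromotionCandidate`, `isMigratingToRemoteStore`, the `failedPrimaryOnRemoteNode` flag, and the pluggable `existingSelection` fallback are hypothetical stand-ins for the real routing types, and the enums only mirror the mode and direction names mentioned in the description.

```java
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.Function;
import java.util.stream.Collectors;

// Simplified, hypothetical stand-ins for the routing types; not the OpenSearch API.
class ReplicaPromotionSketch {

    enum CompatibilityMode { STRICT, MIXED }

    enum Direction { NONE, DOCREP, REMOTE_STORE }

    /** One copy of a shard, reduced to the fields this sketch needs. */
    static final class ShardCopy {
        final int shardId;
        final String nodeId;
        final boolean active;
        final boolean onRemoteNode;

        ShardCopy(int shardId, String nodeId, boolean active, boolean onRemoteNode) {
            this.shardId = shardId;
            this.nodeId = nodeId;
            this.active = active;
            this.onRemoteNode = onRemoteNode;
        }
    }

    /** True only while the cluster is migrating towards remote store. */
    static boolean isMigratingToRemoteStore(CompatibilityMode mode, Direction direction) {
        return mode == CompatibilityMode.MIXED && direction == Direction.REMOTE_STORE;
    }

    /** Returns a random active replica of the failed shard that sits on a remote node, if any. */
    static Optional<ShardCopy> activeReplicaOnRemoteNode(List<ShardCopy> replicas, int failedShardId, Random random) {
        List<ShardCopy> migrated = replicas.stream()
            .filter(r -> r.shardId == failedShardId && r.active && r.onRemoteNode)
            .collect(Collectors.toList());
        if (migrated.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(migrated.get(random.nextInt(migrated.size())));
    }

    /** Prefers a migrated replica during migration; otherwise defers to the supplied fallback. */
    static Optional<ShardCopy> choosePromotionCandidate(
        CompatibilityMode mode,
        Direction direction,
        boolean failedPrimaryOnRemoteNode,
        List<ShardCopy> replicas,
        int failedShardId,
        Random random,
        Function<List<ShardCopy>, Optional<ShardCopy>> existingSelection
    ) {
        if (isMigratingToRemoteStore(mode, direction) && failedPrimaryOnRemoteNode) {
            Optional<ShardCopy> migrated = activeReplicaOnRemoteNode(replicas, failedShardId, random);
            if (migrated.isPresent()) {
                return migrated;
            }
        }
        // No migrated replica (or not migrating): use whatever selection was in place before.
        return existingSelection.apply(replicas);
    }

    public static void main(String[] args) {
        List<ShardCopy> replicas = List.of(
            new ShardCopy(0, "docrep-node", true, false),
            new ShardCopy(0, "remote-node-a", true, true),
            new ShardCopy(0, "remote-node-b", true, true)
        );
        // Placeholder fallback for the demo: first active replica of shard 0.
        Function<List<ShardCopy>, Optional<ShardCopy>> fallback =
            rs -> rs.stream().filter(r -> r.shardId == 0 && r.active).findFirst();

        Optional<ShardCopy> chosen = choosePromotionCandidate(
            CompatibilityMode.MIXED, Direction.REMOTE_STORE, true, replicas, 0, new Random(), fallback);
        System.out.println("Promoted replica on node: " + chosen.map(c -> c.nodeId).orElse("none"));
    }
}
```

Passing the fallback in as a function keeps the sketch neutral about what the existing priority actually is; the only behaviour it commits to is the one stated in the description, namely that a migrated replica wins whenever one exists and migration is in progress.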

Related Issues

Resolves #13135

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions bot added the enhancement, Storage:Durability, and Storage:Remote labels Apr 9, 2024
@ltaragi added the skip-changelog label and removed the enhancement, Storage:Durability, and Storage:Remote labels Apr 9, 2024

github-actions bot commented Apr 9, 2024

Compatibility status:

Checks if related components are compatible with change f1ed40d

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]


github-actions bot commented Apr 9, 2024

✅ Gradle check result for ef77db6: SUCCESS


codecov bot commented Apr 9, 2024

Codecov Report

Attention: Patch coverage is 78.57143%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 71.54%. Comparing base (b15cb0c) to head (3a25b08).
Report is 199 commits behind head on main.

Files | Patch % | Lines
...a/org/opensearch/cluster/routing/RoutingNodes.java | 81.81% | 0 Missing and 2 partials ⚠️
...earch/node/remotestore/RemoteStoreNodeService.java | 66.66% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13136      +/-   ##
============================================
+ Coverage     71.42%   71.54%   +0.12%     
- Complexity    59978    60714     +736     
============================================
  Files          4985     5039      +54     
  Lines        282275   285446    +3171     
  Branches      40946    41342     +396     
============================================
+ Hits         201603   204234    +2631     
- Misses        63999    64297     +298     
- Partials      16673    16915     +242     


@ltaragi changed the title from "Remote replica preference" to "Prefer remote replicas for primary promotion during migration" Apr 10, 2024
@github-actions bot added the enhancement, Storage:Durability, and Storage:Remote labels Apr 10, 2024

❌ Gradle check result for de62d59: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


✅ Gradle check result for f1ed40d: SUCCESS


❕ Gradle check result for fa31864: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Signed-off-by: Lakshya Taragi <lakshya.taragi@gmail.com>
Signed-off-by: Lakshya Taragi <lakshya.taragi@gmail.com>
Signed-off-by: Lakshya Taragi <lakshya.taragi@gmail.com>

✅ Gradle check result for 3a25b08: SUCCESS

@ltaragi self-assigned this Apr 23, 2024
@gbbafna merged commit efa06fe into opensearch-project:main Apr 23, 2024
28 of 32 checks passed
@gbbafna added the backport 2.x label Apr 23, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Apr 23, 2024
Signed-off-by: Lakshya Taragi <lakshya.taragi@gmail.com>
(cherry picked from commit efa06fe)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ltaragi added a commit to ltaragi/OpenSearch that referenced this pull request Apr 25, 2024
…arch-project#13136)

Signed-off-by: Lakshya Taragi <lakshya.taragi@gmail.com>
(cherry picked from commit efa06fe)
gbbafna pushed a commit that referenced this pull request Apr 25, 2024
#13377)

Signed-off-by: Lakshya Taragi <lakshya.taragi@gmail.com>
(cherry picked from commit efa06fe)
Labels
backport 2.x, enhancement, skip-changelog, Storage:Durability, Storage:Remote
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[Remote Store] Prefer remote replicas for primary promotion during migration
4 participants