[Remote Store] Design - Dual Mode Replication during Remote Store migration #12413

Closed
shourya035 opened this issue Feb 21, 2024 · 1 comment · Fixed by #12821
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Storage:Durability Issues and PRs related to the durability framework Storage:Remote


shourya035 commented Feb 21, 2024

Introduction:

To support migration to RemoteStore-backed nodes, shards will be moved from DocRep-backed nodes to RemoteStore-backed, SegRep-enabled nodes. The migration proceeds in this order:

  • Primaries are moved to RemoteStore-based nodes first
  • Replicas follow after the primary relocation is completed
  • Remote store based index settings are applied

More details on the migration process are in #12246.

During this phase, there will be a window in which some shard copies in a replication group still reside on DocRep-engine-based nodes while the primary has already moved to a RemoteStore-enabled node. A mixed replication mode is needed to uphold the tenet that index and search traffic see no impact during the migration process.


Tenets:

  • No impact to search and indexing functionality
  • Data consistency should be intact
  • Snapshot creation/restore process would continue as is

Dual mode replication for _shrink and _split API invocations during the migration process will be handled separately and is not part of this enhancement story.


Proposed solution:

Today, we depend on the index metadata to determine whether an index is Remote/SegRep enabled or DocRep enabled. Since the index metadata update takes place only after all shard copies have moved over to the remote-enabled nodes, the source of truth will move from index metadata to node attributes.

With the new MIXED compatibility mode introduced in #11986, node attributes determine the replication mode and whether remote upload/download is enabled, provided the compatibility mode is set to MIXED and the migration direction is set.
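A minimal sketch of this decision, written as a plain Python model rather than OpenSearch code. The enum names, the `Node` class, the `uses_remote_store` function, and the `remote_store.` attribute prefix are all assumptions made for illustration; the issue does not spell out the actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class CompatibilityMode(Enum):
    # Hypothetical names mirroring the modes discussed in the issue.
    STRICT = "strict"
    MIXED = "mixed"


class MigrationDirection(Enum):
    # Hypothetical names; NONE means no migration direction is set.
    NONE = "none"
    REMOTE_STORE = "remote_store"
    DOCREP = "docrep"


@dataclass
class Node:
    name: str
    attributes: dict = field(default_factory=dict)

    def is_remote_store_node(self) -> bool:
        # Assumption: remote-enabled nodes carry attributes with this prefix.
        return any(key.startswith("remote_store.") for key in self.attributes)


def uses_remote_store(index_is_remote, node, mode, direction):
    """Decide whether a shard copy on `node` should behave as remote-store backed."""
    migrating = (
        mode is CompatibilityMode.MIXED
        and direction is not MigrationDirection.NONE
    )
    if migrating:
        # During migration the node attributes are the source of truth.
        return node.is_remote_store_node()
    # Otherwise fall back to what the index metadata says.
    return index_is_remote
```

With this model, a shard copy on a remote-enabled node is treated as remote-backed during migration even though the index metadata still describes a DocRep index.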

To ensure data consistency on failovers during this migration process, Peer Recovery Retention Lease (PRRL) publication would be kept unblocked during this time. This ensures that sequence number based recovery remains possible when a DocRep-enabled replica shard copy in the replication group is promoted to primary. Checks would be introduced to ensure that there are no missing sequence numbers during this failover process.
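The issue does not describe how the missing sequence number check would be implemented. As an illustration only, a gap check over the operations a promoted replica has processed could look like the sketch below; both function names are hypothetical.

```python
def missing_sequence_numbers(processed, max_seq_no):
    """Return the sequence numbers in [0, max_seq_no] that are absent from `processed`."""
    seen = set(processed)
    return [seq_no for seq_no in range(max_seq_no + 1) if seq_no not in seen]


def can_promote_without_gaps(processed, max_seq_no):
    """True when every sequence number up to max_seq_no has been processed."""
    return not missing_sequence_numbers(processed, max_seq_no)
```

For example, a replica that processed operations 0, 1, 3 and 4 has a gap at 2, so promoting it would require recovering that operation first.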

The following diagram explains the flow for a write request in this stage:

[Figure: Dual Mode Replication - New Flow]


The entire dual mode replication change set would be divided into the following four charters:

  • Handle replication action changes on primary in the write path
  • Handle replication action changes on replica in the write path
  • Handle GlobalCheckpointSyncAction and PublishCheckpointAction replication actions
  • Handle PRRLs during the migration process
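
To make the write path charters more concrete, here is a rough sketch of how a primary might choose a per-replica action in a mixed replication group. The rule it encodes is an assumption, not something stated in this issue: operations are replicated to replicas on DocRep nodes, while replicas on remote-enabled nodes are skipped on the assumption that they catch up through segment replication. `ReplicaAction` and `plan_write_fanout` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class ReplicaAction(Enum):
    # Hypothetical actions a primary could take for each replica.
    REPLICATE_OPERATION = "replicate_operation"
    SKIP_OPERATION = "skip_operation"


def plan_write_fanout(
    primary_on_remote: bool, replicas_on_remote: Dict[str, bool]
) -> Dict[str, ReplicaAction]:
    """Map each replica id to the action the primary takes for a write."""
    plan = {}
    for replica_id, on_remote in replicas_on_remote.items():
        if primary_on_remote and on_remote:
            # Assumption: a remote-backed replica does not need the operation replayed.
            plan[replica_id] = ReplicaAction.SKIP_OPERATION
        else:
            # A DocRep replica still needs the operation itself.
            plan[replica_id] = ReplicaAction.REPLICATE_OPERATION
    return plan
```

Under this assumption, a primary that has not yet moved to a remote-enabled node keeps replicating every operation, which matches plain document replication.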
@shourya035 shourya035 added enhancement Enhancement or improvement to existing feature or request untriaged labels Feb 21, 2024
@shourya035 shourya035 changed the title [Remote Store] Dual Mode Replication during Remote Store migration [Remote Store] Design - Dual Mode Replication during Remote Store migration Feb 21, 2024
@peternied peternied added RFC Issues requesting major changes Storage:Durability Issues and PRs related to the durability framework and removed untriaged labels Feb 21, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5]
@shourya035 Thanks for creating this issue, look forward to see where this lands

Projects
Status: ✅ Done