[RFC] Supporting existing indices migration to SegRep and Remote Store #7986

Open

gbbafna opened this issue Jun 9, 2023 · 2 comments

Labels: enhancement, Indexing, RFC, Roadmap:Cost/Performance/Scale, Storage

gbbafna commented Jun 9, 2023

This feature proposal is WIP. We will continue to add details to Sections that are marked with ToDo.

Goal

OpenSearch will be launching the remote store feature and has already GA'd Segment Replication. However, this replication method and durability enhancement are only available for newer indices. The next step is to support migrating existing indices to SegRep and enabling the Remote Store on them.

Requirements

Functional

  • Data integrity - No data loss due to the migration process itself.
  • High availability - The index should remain available for writes and reads during the migration.
  • High durability - We should not regress on the durability guarantees provided by the existing configuration.
  • Failovers & recovery - These should continue to work as normal.

Non-Functional

  • Minimal disruption to writes - As we reload engines in place, we will need to hold writes; we should strive to keep these pauses minimal.
  • No extra capacity requirement - We shouldn’t need extra nodes or disk space for the migration.
  • Minimal disruption to reads - Read capacity should see only minimal disruption.

Non-Requirements

  • We are assuming the end state to be SegRep and Remote Store enabled, not just SegRep enablement. This is just to reduce the number of modes we need to support to start with. We can provide migration to SegRep alone as an incremental feature, which would reuse most of the components designed here.
  • SegRep to DocRep migration - This is also not covered as part of this feature. However, it is an incremental feature that could be considered later on if needed.

Potential Approaches

[Recommended]

Rolling restarts of replica copies

Here we restart replicas one by one. The challenge here is to make the primary understand both SegRep and DocRep. We will also need to store each replica's property durably. The primary will send checkpoint updates to SegRep-based indices and documents to DocRep-based indices.

ToDo: Exploration of this approach is still ongoing.
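To make the dispatch idea above concrete, here is a minimal sketch, assuming a hypothetical per-replica mode map stored durably (for example in cluster state). None of the type or method names below are existing OpenSearch APIs; this is only an illustration of the routing decision a mixed-mode primary would have to make.

```java
import java.util.Map;

// Illustrative sketch only: how a mixed DocRep/SegRep primary might route replication
// per replica copy. Every type and method name here is hypothetical, not an existing
// OpenSearch API.
public class MixedModeReplicationSketch {

    enum ReplicationMode { DOCUMENT, SEGMENT }

    /** Placeholder for whatever the primary would ship out per replication round. */
    record ReplicationPayload(long latestCheckpoint, Object[] translogOperations) {}

    /** Per-replica mode, stored durably (e.g. in cluster state), keyed by allocation id. */
    private final Map<String, ReplicationMode> replicaModes;

    public MixedModeReplicationSketch(Map<String, ReplicationMode> replicaModes) {
        this.replicaModes = replicaModes;
    }

    /** Primary-side dispatch: checkpoint updates to SegRep copies, documents to DocRep copies. */
    void replicateTo(String replicaAllocationId, ReplicationPayload payload) {
        ReplicationMode mode = replicaModes.getOrDefault(replicaAllocationId, ReplicationMode.DOCUMENT);
        if (mode == ReplicationMode.SEGMENT) {
            publishCheckpoint(replicaAllocationId, payload.latestCheckpoint());
        } else {
            sendTranslogOperations(replicaAllocationId, payload.translogOperations());
        }
    }

    private void publishCheckpoint(String allocationId, long checkpoint) {
        // would hook into the existing segment replication checkpoint publish path
    }

    private void sendTranslogOperations(String allocationId, Object[] operations) {
        // would reuse the existing document replication path
    }
}
```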

Enabling Remote Store & Remote Translog followed by SegRep Enablement

We would support remote segment store and remote translog for DocRep indices. This gives us the ability to store data durably even while writing to a single copy of the data. The proposed migration steps would be executed in an FSM; a rough sketch of such an FSM is shown after the steps below. The proposed high-level details are as follows; more details will be covered in a separate issue.

  1. Enable Remote Store and Remote Translog for DocRep
    1. Seed the remote store first - this ensures the subsequent refresh doesn’t time out.
    2. Take all the permits on all primary shards.
    3. Enable remote store and remote translog on the primary.
    4. Release the permits to enable writes on all primary shards.
  2. Enable SegRep:
    1. Decrease the replica count to 0.
    2. Take all the permits on the primary.
    3. Reload the primary engine as an InternalEngine with SegRep and Remote Store integration enabled.
    4. Enable writes.
    5. Increase the replica count to its previous value.
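As a rough illustration of the FSM mentioned above, the sketch below drives the two phases with permit acquisition guarding each engine change. The state names and the MigrationContext hooks are assumptions made for this sketch, and the permit handling only mirrors the pattern of IndexShard's acquire-all-permits mechanism behind an AutoCloseable rather than calling it directly.

```java
// Illustrative FSM sketch for the flow above. State names and the MigrationContext hooks
// are assumptions made for this sketch; the permit handling only mirrors the pattern of
// IndexShard's acquire-all-permits mechanism and is abstracted behind an AutoCloseable.
public class DocRepToRemoteSegRepMigrationSketch {

    enum State {
        STARTED,
        REMOTE_STORE_SEEDED,     // 1a: upload existing segments so the next refresh doesn't time out
        REMOTE_STORE_ENABLED,    // 1b-1d: permits held, remote store + remote translog enabled, permits released
        REPLICAS_REMOVED,        // 2a: replica count set to 0
        SEGREP_ENGINE_RELOADED,  // 2b-2d: permits held, engine reloaded with SegRep + remote store, writes resumed
        REPLICAS_RESTORED,       // 2e: replica count restored
        COMPLETED
    }

    /** Hypothetical hooks the FSM would drive; each maps to one step in the proposal. */
    interface MigrationContext {
        void seedRemoteStore();
        AutoCloseable acquireAllPrimaryPermits() throws Exception; // blocks new writes while held
        void enableRemoteStoreAndRemoteTranslog();
        void setReplicaCount(int replicas);
        void reloadEngineWithSegRepAndRemoteStore();
        int originalReplicaCount();
    }

    private State state = State.STARTED;

    public void run(MigrationContext ctx) throws Exception {
        ctx.seedRemoteStore();
        state = State.REMOTE_STORE_SEEDED;

        try (AutoCloseable permits = ctx.acquireAllPrimaryPermits()) {
            ctx.enableRemoteStoreAndRemoteTranslog();
        }
        state = State.REMOTE_STORE_ENABLED;

        ctx.setReplicaCount(0);
        state = State.REPLICAS_REMOVED;

        try (AutoCloseable permits = ctx.acquireAllPrimaryPermits()) {
            ctx.reloadEngineWithSegRepAndRemoteStore();
        }
        state = State.SEGREP_ENGINE_RELOADED;

        ctx.setReplicaCount(ctx.originalReplicaCount());
        state = State.REPLICAS_RESTORED;
        state = State.COMPLETED;
    }

    public State state() {
        return state;
    }
}
```

Persisting the current State durably (for example in cluster state) would let the migration resume from the last completed step after a node or cluster manager failover.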

Alternative Approaches

Bringing new replica copies w/o remote store

The following steps could help migrate an index (a settings-update sketch for steps 1 and 5 follows this list):

  1. Set the replica count to 0.
  2. Take all the permits on the primary.
  3. Reload the primary engine to do SegRep.
  4. Release the permits to enable writes.
  5. Set the replica count back to its previous value.
  6. Enable the Remote Translog Store, followed by the Remote Segment Store.

The con of this approach is a regression in durability and availability guarantees: while the new replicas are coming up, shards are left with only one copy.
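For reference, steps 1 and 5 of this alternative are plain replica-count settings updates. A minimal sketch using the in-cluster Client API is below; the class and flow are placeholders, and the engine reload under permits (steps 2-4) is elided.

```java
import org.opensearch.client.Client;
import org.opensearch.common.settings.Settings;

// Illustrative only: steps 1 and 5 of the alternative above are plain replica-count
// settings updates. The surrounding flow is a placeholder; the engine reload under
// permits (steps 2-4) is elided.
public class ReplicaToggleSketch {

    static void dropReplicas(Client client, String index) {
        client.admin().indices().prepareUpdateSettings(index)
            .setSettings(Settings.builder().put("index.number_of_replicas", 0))
            .get();
    }

    static void restoreReplicas(Client client, String index, int previousReplicaCount) {
        client.admin().indices().prepareUpdateSettings(index)
            .setSettings(Settings.builder().put("index.number_of_replicas", previousReplicaCount))
            .get();
    }
}
```

From a user's point of view this is equivalent to a `PUT <index>/_settings` call updating `index.number_of_replicas`.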

Using Remote Store for Async durability

  1. Seed the remote store first - this ensures the subsequent refresh doesn’t time out.
  2. Enable Remote Store for DocRep:
    1. Take all the permits on all primary shards.
    2. Enable remote store on the primary by reloading the engine.
    3. Release the permits to enable writes on all primary shards.
    4. Wait for all shards to complete the steps above.
  3. Enable SegRep:
    1. Decrease the replica count to 0.
    2. Stop the writes.
    3. Reload the primary engine as an InternalEngine with SegRep enabled.
    4. Enable the writes.
    5. Increase the replica count to its previous value.
  4. Enable Remote Translog.

Using Remote Translog Store for durability

We can’t use the Remote Translog alone for durability; it needs to be supplemented with the Remote Segment Store. Hence this approach is not feasible.

Comparison

ToDo

Potential Issues

ToDo

Next Steps

  1. POC to check the feasibility of enabling Remote Translog and Remote Segment Store on DocRep-based indices.
  2. POC to create an FSM-based migration of indices while acquiring permits.
  3. Deeper exploration of rolling restarts of replica copies.
gbbafna added the enhancement, untriaged, and RFC labels and removed the untriaged label Jun 9, 2023
mch2 commented Jun 9, 2023

@gbbafna Thanks for writing this up! Couple thoughts:

We are assuming the end state to be SegRep and Remote Store enabled and not just SegRep enablement. This is just to reduce the modes we need to support to start with . We can provide migration to just SegRep as an incremental feature, which would reuse most of the components designed here.

It makes sense to start with node-node, but with the lower level components abstracting away the source of replication I think the complexity is mostly in configuration. How are you envisioning the conversion being initiated? We would likely need a new API here to go from DocRep -> SegRep w/ remote storage to properly update all settings.

SegRep to DocRep Migration

Until remote store + DocRep is supported as a standalone feature I think it's reasonable that conversion from SegRep with remote store back to DocRep would remove remote store capabilities? With that said, I think it would be wise to support this first. If a user switches to SegRep and wishes to revert for whatever reason, the only option would be a reindex. Also, complexity-wise I think this would actually be a fairly trivial engine swap on replicas.

The challenge here is to make primary understand both SegRep and DocRep. We will also need to store replica's property durably . Primary will send checkpoint update to SegRep based indices and documents to DocRep based indices.

Currently we are sending all docs to SegRep based indices for durability. Are you referring to remote translog case?

In general, for DocRep -> SegRep I think the approach of rolling restarts of replica engines is the right one. I'd imagine we would need a full recovery here so that the shard is not serving stale reads until it catches up. It would be great to do this without triggering any reallocation/failing the shard, but I don't think that is something that exists today. An alternative here is to fetch the required segments from the primary's latest checkpoint and write them to a separate directory, but this would likely not be feasible with disk constraints.

anasalkouz added the Indexing label Jun 9, 2023
gbbafna commented Jun 12, 2023

Thanks @mch2 for the review and feedback.

It makes sense to start with node-node, but with the lower level components abstracting away the source of replication I think the complexity is mostly in configuration. How are you envisioning the conversion being initiated? We would likely need a new API here to go from DocRep -> SegRep w/ remote storage to properly update all settings

Yes, the initial idea was an API which would trigger an FSM; it might need to store the details in the cluster state as well.
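For illustration only, such an API could look roughly like the sketch below; the route, action name, and parameters are hypothetical and do not exist in OpenSearch today.

```java
import java.util.List;

import org.opensearch.client.node.NodeClient;
import org.opensearch.rest.BaseRestHandler;
import org.opensearch.rest.RestRequest;

// Purely hypothetical sketch of what a migration-trigger API could look like; the route,
// action name, and parameters below do not exist in OpenSearch.
public class RestMigrateIndexActionSketch extends BaseRestHandler {

    @Override
    public String getName() {
        return "migrate_index_action_sketch";
    }

    @Override
    public List<Route> routes() {
        // e.g. POST /{index}/_migrate?target=segrep_remote_store  (hypothetical endpoint)
        return List.of(new Route(RestRequest.Method.POST, "/{index}/_migrate"));
    }

    @Override
    protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) {
        String index = request.param("index");
        String target = request.param("target", "segrep_remote_store");
        // Would submit a cluster state update recording the FSM state for this index
        // (so the migration survives cluster manager failover) and then kick off the
        // per-shard steps described in the proposal.
        return channel -> { /* dispatch to a (hypothetical) transport action here */ };
    }
}
```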

Until remote store + DocRep is supported as a standalone feature I think its reasonable that conversion from SegRep with remote store back to docRep would remove remote store capabilities?

Yes.

With that said, I think it would be wise to support this first. If a user switches to SegRep and wishes to revert for whatever reason the only option would be a reindex. Also, complexity wise I think this would actually be a fairly trivial engine swap on replicas.

Agreed. Once we have all the details hashed out and the POC done, we might do this in the first phase as well.

Currently we are sending all docs to SegRep based indices for durability. Are you referring to remote translog case?

I am referring to the case where we are hydrating the replica from the primary's segments. Since it is going to take a good amount of time, as it is a full recovery, the solution is not durable for indices with 1 replica.

An alternative here is to fetch the required segments from the primary's latest checkpoint and write them to a separate directory, but this would likely not be feasible with disk constraints.

This is what we explored as well, but due to disk constraints we didn't list it out here.

gbbafna changed the title from "[RFC] [Draft] Supporting existing indices migration to SegRep and Remote Store" to "[RFC] Supporting existing indices migration to SegRep and Remote Store" Jun 19, 2023
Bukhtawar added the Storage label Jul 27, 2023
sohami added the Roadmap:Cost/Performance/Scale label May 14, 2024