[Remote Store] API support to restore data from remote store #3145

Closed
sachinpkale opened this issue May 3, 2022 · 14 comments
Labels
distributed framework enhancement Enhancement or improvement to existing feature or request Storage:Durability Issues and PRs related to the durability framework

Comments

@sachinpkale
Member

Describe the solution you'd like
This will provide a manual way for users to restore data from the remote store for a given index and shard. It could be a new API or a modification of the existing snapshot restore API.

@sachinpkale sachinpkale added enhancement Enhancement or improvement to existing feature or request untriaged labels May 3, 2022
@kartg kartg added distributed framework Storage:Durability Issues and PRs related to the durability framework and removed untriaged labels May 3, 2022
@sachinpkale
Member Author

We can reuse the snapshot restore API to restore data from the remote segment and translog stores.
This may require an additional query parameter to differentiate between a snapshot restore and a restore from remote store.

@sachinpkale
Member Author

We have decided to go with a separate REST endpoint.
PR link for adding rest endpoint for restore: #3576

@dblock
Member

dblock commented Jun 14, 2022

We have decided to go with a separate REST endpoint. PR link for adding rest endpoint for restore: #3576

Is there a discussion somewhere?

For snapshots we do _snapshot/my-repository/2/_restore, so this is going to be _remotestore/_restore? Where can I see the complete API to manipulate _remotestore?

@sachinpkale
Member Author

Is there a discussion somewhere?

This issue is where we should discuss the restore API. We decided against reusing the snapshot restore API for the following reasons:

  • The snapshot restore API requires snapshot_id as one of its path parameters. Remote store has no snapshot IDs, so callers would have to pass a dummy snapshot ID, which is not user-friendly API behavior.
  • Snapshot restore restores only segment data. Remote store will hold committed data (segments) as well as uncommitted data (translog), so the restore implementation needs a separate flow that restores from the remote translog as well as the remote segment store.

For snapshots we do _snapshot/my-repository/2/_restore, so this is going to be _remotestore/_restore?

Yes, it will be _remotestore/_restore.

Where can I see the complete API to manipulate _remotestore?

In remote-store V1, we will have the following APIs on remote store:

  1. Restore
  2. Restore status
  3. Remote store sync status

As part of #3576, we started with restore API.
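
A minimal sketch of what a call to the proposed endpoint might look like. Only the path `_remotestore/_restore` comes from this thread; the HTTP method, the `indices` body field, and the helper name `build_restore_request` are assumptions made for illustration, and the actual contract is whatever #3576 defines.

```python
import json

# Path taken from the discussion above; everything else here is an assumption.
RESTORE_PATH = "/_remotestore/_restore"


def build_restore_request(indices):
    """Return (method, path, body) for a hypothetical remote store restore call."""
    if not indices:
        raise ValueError("at least one index name is required")
    # Assumption: the endpoint accepts POST with a JSON list of index names.
    body = json.dumps({"indices": list(indices)})
    return "POST", RESTORE_PATH, body


method, path, body = build_restore_request(["my-index-1", "my-index-2"])
print(method, path)
print(body)
```

The sketch only builds the request and does not send it, so it can be adapted to whichever HTTP client is in use.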

@dblock
Member

dblock commented Jun 15, 2022

Could I please have some more eyes on the API proposal, maybe @andrross? Just want to make sure it doesn't conflict with things like #2922, and I assume this is in line with #1968.

@andrross
Member

Where can I see the complete API to manipulate _remotestore?

In remote-store V1, we will have the following APIs on remote store:

Restore
Restore status
Remote store sync status

I think a little more detail here would be useful, even if not everything is implemented initially. The initial questions I have are: how is a user expected to use the restore API? A remote store-backed index can be created by defining the appropriate index settings, but then how does a user get to the point of restoring that backed-up data to a new index? (i.e. do they delete the index and then later decide to restore it? if so, how do they discover which indexes are restorable from the remote store?). Maybe I'm misunderstanding something, but sketching out the use case / lifecycle of a remote store-backed index would probably be helpful.

Specific to the restore API, will it have the same or similar semantics to snapshot restore? I think it's really helpful to write something like that linked documentation (doesn't have to be super polished) defining the behavior and a quick description of all parameters before implementing an API. Does something like that exist?

@sachinpkale
Member Author

Where can I see the complete API to manipulate _remotestore?

In remote-store V1, we will have the following APIs on remote store:
Restore
Restore status
Remote store sync status

I think a little more detail here would be useful, even if not everything is implemented initially. The initial questions I have are: how is a user expected to use the restore API? A remote store-backed index can be created by defining the appropriate index settings, but then how does a user get to the point of restoring that backed-up data to a new index? (i.e. do they delete the index and then later decide to restore it? if so, how do they discover which indexes are restorable from the remote store?). Maybe I'm misunderstanding something, but sketching out the use case / lifecycle of a remote store-backed index would probably be helpful.

Let me add a "Lifecycle of data in remote store" section to the design proposal. It will be built up incrementally, but it should cover the V1 details.

Specific to the restore API, will it have the same or similar semantics to snapshot restore? I think it's really helpful to write something like that linked documentation (doesn't have to be super polished) defining the behavior and a quick description of all parameters before implementing an API. Does something like that exist?

We will definitely have documentation for the _remotestore APIs. Let me create a tracking issue for it. The API spec added in PR #3576 also includes a link to the doc.

@sachinpkale
Member Author

sachinpkale commented Jun 23, 2022

@andrross

@andrross
Member

In case of data loss scenario, data for a given set of indices can be restored using following API

How does this scenario happen? Also, how does the user discover which indices can be recovered via the remote store?

@sachinpkale
Member Author

How does this scenario happen?

Here, data loss refers to a red index for which no valid shard copy exists. We also want to automate this process, with the option to enable or disable automated recovery at runtime (see #3145). I will update the section with these details as well.

Also, how does the user discover which indices can be recovered via the remote store?

In V1, we will not support a separate API for listing all indices with remote store enabled. However, because the setting is part of the index, users can still find such indices by fetching index settings.
I understand that having this API would be helpful. We can track it as part of V2.
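
As a rough sketch of the workaround described above, the snippet below filters an index settings response for indices that have remote store enabled. The response shape (index name, then `settings`, then `index`) and the setting key `remote_store` are assumptions for illustration; the real key should be taken from the remote store documentation.

```python
def remote_store_indices(settings_response, setting_key="remote_store"):
    """Return sorted names of indices whose settings enable remote store.

    settings_response: dict shaped like {index_name: {"settings": {"index": {...}}}}.
    setting_key: hypothetical name of the per-index remote store flag.
    """
    enabled = []
    for index_name, payload in settings_response.items():
        index_settings = payload.get("settings", {}).get("index", {})
        # Settings values may come back as strings, so normalize before comparing.
        if str(index_settings.get(setting_key, "false")).lower() == "true":
            enabled.append(index_name)
    return sorted(enabled)


# Hand-written sample data standing in for a settings response.
sample = {
    "logs-1": {"settings": {"index": {"remote_store": "true", "number_of_shards": "1"}}},
    "logs-2": {"settings": {"index": {"number_of_shards": "1"}}},
    "metrics": {"settings": {"index": {"remote_store": True}}},
}
print(remote_store_indices(sample))
```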

@andrross
Member

Here, data loss refers to a red index for which no valid shard copy exists.

Restoring a snapshot to an existing index requires closing or deleting the existing index first, right? Would that be required here as well, or will this API behave differently from snapshot restore?

@sachinpkale
Member Author

sachinpkale commented Jun 23, 2022

Here, data loss refers to a red index for which no valid shard copy exists.

Restoring a snapshot to an existing index requires closing or deleting the existing index first, right? Would that be required here as well, or will this API behave differently from snapshot restore?

That's correct. For this API as well, the existing index needs to be closed before restoring. In the case of a red index, the state changes to closed.

@andrross
Member

andrross commented Jun 23, 2022

Thanks for bearing with me! Here's my understanding of this API, and please correct me where I'm wrong:

Remote store restore allows restoring an existing index that was created with the remote_store=true setting. The existing index must be closed in order to restore it. The index can only be restored with the same name and settings it was created with. Multiple indexes can be specified in the restore request, but wildcards are not supported. Deleted indexes cannot be restored because they will be removed from remote storage upon deletion. The intended usage of this API is to restore indexes that are in the red status due to no longer having a local copy of one or more shards.

Assuming that is correct, then will this API have any utility once automated restore is implemented? If a shard copy exists in remote storage and is guaranteed to be fully up-to-date, then under what scenario would it not be the right thing to pull that copy down and restore the shard to health?

The question-behind-the-question here is whether introducing a limited restore API here is the right thing to do, versus making automated recovery the behavior when a local copy of the shard is lost. I don't want to expand the scope of V1, but we should be really careful about introducing APIs if they will soon be obsolete.
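
If the summary of the restore semantics above holds, the preconditions could be sketched as a small validation step. This is an illustration only: `IndexInfo`, `validate_restore`, and the state values are hypothetical names, not part of any existing API, and the sketch treats an unknown index the same as a deleted one.

```python
from dataclasses import dataclass


@dataclass
class IndexInfo:
    """Hypothetical, simplified view of an index for this sketch."""
    state: str          # "open" or "closed"
    remote_store: bool  # whether the index was created with remote store enabled


def validate_restore(requested, known_indices):
    """Return a list of problems; an empty list means the request passes these checks."""
    problems = []
    for name in requested:
        if "*" in name:
            problems.append(f"{name}: wildcards are not supported")
            continue
        info = known_indices.get(name)
        if info is None:
            problems.append(f"{name}: index does not exist")
        elif not info.remote_store:
            problems.append(f"{name}: remote store is not enabled")
        elif info.state != "closed":
            problems.append(f"{name}: index must be closed before restore")
    return problems


known = {
    "orders": IndexInfo(state="closed", remote_store=True),
    "logs": IndexInfo(state="open", remote_store=True),
    "tmp": IndexInfo(state="closed", remote_store=False),
}
for problem in validate_restore(["orders", "logs", "tmp", "log*", "gone"], known):
    print(problem)
```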

@sachinpkale
Member Author

Assuming that is correct, then will this API have any utility once automated restore is implemented? If a shard copy exists in remote storage and is guaranteed to be fully up-to-date, then under what scenario would it not be the right thing to pull that copy down and restore the shard to health?

The question-behind-the-question here is whether introducing a limited restore API here is the right thing to do, versus making automated recovery the behavior when a local copy of the shard is lost. I don't want to expand the scope of V1, but we should be really careful about introducing APIs if they will soon be obsolete.

  • I think even with the automated flow, we will need the API for the following reasons:
    • The API gives the user more control over what to restore and when. Out of X red indices, the user may want to restore a few of the important indices first.
    • We will be providing a few performance knobs to the user (like async upload to the remote translog) which will impact the durability guarantees. In such cases, we may not want to restore automatically.
  • Also, I have yet to look at the automated flow and its feasibility. We certainly don't want to restore unnecessarily. If we find too many edge cases in the automated flow, at least for V1, we can go with the API. Most of the implementation will be reused anyway.
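
To illustrate the first point, here is a hedged sketch of how a user-side script might order red indices by importance and restore them in batches. The priority map, the batch size, and the function name `plan_restore_batches` are all hypothetical; nothing here is part of the proposed API.

```python
def plan_restore_batches(red_indices, priority, batch_size=2):
    """Group red indices into restore batches, highest priority first.

    priority: dict mapping index name to an integer; missing names default to 0.
    Ties are broken alphabetically so the plan is deterministic.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(red_indices, key=lambda name: (-priority.get(name, 0), name))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


red = ["logs", "orders", "audit", "tmp"]
batches = plan_restore_batches(red, {"orders": 10, "audit": 5})
for batch in batches:
    # Each batch would become one restore request listing these indices.
    print(batch)
```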
