
Enable the Azure snapshot plugin to support taking snapshots into multiple storage accounts. #22709

Closed
JeffreyZZ wants to merge 2 commits

Conversation

JeffreyZZ

The default Elasticsearch Azure snapshot plugin writes the whole snapshot into a single Azure storage account. For a big Elasticsearch cluster with multiple TB of data, the snapshot can fail because of the storage account's throttling limit. Here is an example of the snapshot failure error:

    {
      "node_id": "cR_OjhxaRvW0ZmR14w8c6g",
      "index": "rawevents_v3.2016_06_29",
      "reason": "IndexShardSnapshotFailedException[[rawevents_v3.2016_06_29][1] Failed to perform snapshot (index files)]; nested: IOException; nested: StorageException[The server encountered an unknown failure: ]; nested: IOException[Error writing to server]; ",
      "shard_id": 1,
      "status": "INTERNAL_SERVER_ERROR"
    }

To address this problem, we extend the Elasticsearch Azure plugin with a feature that supports taking snapshots into, and restoring from, multiple storage accounts, to avoid overloading a single storage account. In other words, Elasticsearch can write its snapshot data into multiple storage accounts evenly and in parallel.

Here are the configuration and the commands to take and restore a snapshot:

elasticsearch.yml
cloud.azure.storage.my_account1.account: storageaccount1
cloud.azure.storage.my_account1.key: key1
cloud.azure.storage.my_account1.default: true
cloud.azure.storage.my_account2.account: storageaccount2
cloud.azure.storage.my_account2.key: key2
cloud.azure.storage.my_account3.account: storageaccount3
cloud.azure.storage.my_account3.key: key3

Commands
#1: define repository
PUT _snapshot/plugintest160921
{
  "type": "azure",
  "settings": {
    "account": "my_account1,my_account2,my_account3",
    "container": "plugintest160921"
  }
}

#2: take snapshot
PUT _snapshot/plugintest160921/backup0921?wait_for_completion=true
{
}

#3: restore
POST _snapshot/plugintest160921/backup0921/_restore?wait_for_completion=true
{
  "ignore_unavailable": "true",
  "include_global_state": false
}

#4: define repository for secondary
PUT _snapshot/plugintest160921
{
  "type": "azure",
  "settings": {
    "account": "my_account1,my_account2,my_account3",
    "container": "plugintest160921",
    "location_mode": "secondary_only"
  }
}
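
Conceptually, the even distribution can be sketched as follows. This is a minimal illustration, not the actual plugin code: each blob is bucketed into one of the configured accounts by hashing its name modulo the number of accounts, so snapshot and restore must use the same account list in the same order. The blob path in main() is made up for the example.

import java.util.Arrays;
import java.util.List;

// Illustrative sketch only (not the actual plugin code): choose a storage
// account for each blob by hashing the blob name modulo the number of
// configured accounts.
public class AccountSelector {

    private final List<String> accounts;

    public AccountSelector(List<String> accounts) {
        this.accounts = accounts;
    }

    public String accountFor(String blobName) {
        // Math.floorMod keeps the bucket non-negative even when hashCode() is negative.
        return accounts.get(Math.floorMod(blobName.hashCode(), accounts.size()));
    }

    public static void main(String[] args) {
        AccountSelector selector = new AccountSelector(
            Arrays.asList("my_account1", "my_account2", "my_account3"));
        // Hypothetical blob path, for illustration only.
        System.out.println(selector.accountFor("indices/rawevents_v3.2016_06_29/1/__0"));
    }
}

Because the bucket depends on the number of accounts, adding or removing an account changes which account an existing blob maps to.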

@clintongormley

Thanks for the PR. Please could I ask you to sign the CLA before we review it?
http://www.elasticsearch.org/contributor-agreement/

@rjernst
Member

rjernst commented Jan 20, 2017

Perhaps we need to be better about detecting throttling events and backing off instead of failing, and you also may need to expand your storage to increase the IOPS capacity, but I don't think we should make the (already complicated) repository settings here even more complicated. Introducing multiple accounts for a single repository raises all kinds of questions that I don't think we should be worrying about.

@abeyad

abeyad commented Jan 20, 2017

I'm not sure this will solve the problem either. What if all configured storage accounts become full, then what? We would have to rehash the blobs to different buckets once more storage accounts are added. I think it will become too difficult to maintain and will require a lot of logic to solve a problem that most don't have. The simple solution here would be to create a new repository located at a different storage account. It adds a bit of extra complexity for the user, who has to look into multiple repositories to find the snapshot they may be looking for, but in this case I believe it's the right tradeoff.
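
For example, a sketch of that workaround with hypothetical repository names, one repository per storage account:

PUT _snapshot/backups_account1
{
  "type": "azure",
  "settings": {
    "account": "my_account1",
    "container": "backups"
  }
}

PUT _snapshot/backups_account2
{
  "type": "azure",
  "settings": {
    "account": "my_account2",
    "container": "backups"
  }
}

Each repository maps to exactly one account, so snapshots can go to whichever repository still has headroom.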

@abeyad

abeyad commented Jan 20, 2017

Sorry, I misunderstood: this is about a throttling limit on the Azure storage accounts, not a size limit. In any case, I think the complexity argument still holds (adding or removing a storage account would require rehashing all blobs to different accounts).

Perhaps we need to be better about detecting throttling events and backing off instead of failing

++

@rjernst
Member

rjernst commented Jan 20, 2017

@JeffreyZZ Thank you for the PR, but we are going to decline it for now. I suggest looking into increasing your IOPS as I mentioned earlier, and I opened an issue to improve our behavior on throttling (so as not to fail a snapshot when throttled): #22728. You are of course welcome to work on a PR for that issue!

@rjernst rjernst closed this Jan 20, 2017
@JeffreyZZ
Author

Thanks for the quick response. I think the ability for Elasticsearch to write and read snapshots to/from multiple Azure storage accounts is very important for running big production clusters (with 50+ data nodes) on a public cloud such as Azure. I'd like to share more details of the investigation we did before I added this feature to the plugin for our production cluster running on Azure.

  1. Azure Storage accounts with higher IOPS.
    When we consistently ran into throttling with our production clusters on Azure while writing snapshot data into a single storage account, we engaged with the Azure Storage team, who confirmed that the issue was that we wrote too much data into the storage account over a period of time, exceeding the account's bandwidth limit and triggering throttling. As the easy fix, we first asked the Azure Storage product team for storage accounts with higher IOPS, or accounts in dedicated clusters with no noisy neighbors. However, the answer we got was that there are no accounts with higher IOPS, and the only alternative is to shard across multiple accounts. Since Elasticsearch is a distributed system, each data node writes its snapshot data to the storage account independently of the others, so it is a perfect fit to let it write potentially big snapshot data into different storage accounts, achieving higher IOPS by distributing the load across multiple accounts. I'm not familiar with AWS, but I think the same should apply to clusters running on AWS.

  2. Exception handling
    The current exception handling and its retry logic can already handle the exceptions thrown while writing data to an Azure storage account, including the throttling exception. In addition, we introduced ExponentialRetry to strengthen the retry logic and make sure there are enough retries when exceptions occur; see the code line below and the sketch after this list. Based on our investigation, however, this does not help in the throttling scenarios: when the snapshot is too big, such as 9TB in our case, throttling lasts longer than any reasonable retry policy can cover. Since the root cause is the bandwidth limit of a single storage account, I think enabling sharding of the snapshot across multiple storage accounts is the right way to go.
    client.getDefaultRequestOptions().setRetryPolicyFactory(new RetryExponentialRetry(1000 * 30, 7));

  3. Backward compatibility
    I agree that supporting multiple storage accounts makes the repository settings more complicated. But I think this complexity is reasonable and acceptable given the scenario it addresses. Furthermore, it is backward compatible: if you use one storage account, nothing changes for you with this feature added.
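
For reference, here is a minimal sketch of how that retry policy is wired up, assuming the legacy Azure Storage SDK for Java (com.microsoft.azure.storage) that the plugin uses, with a placeholder connection string:

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.RetryExponentialRetry;
import com.microsoft.azure.storage.blob.CloudBlobClient;

public class RetrySetup {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; substitute your real account name and key.
        CloudStorageAccount account = CloudStorageAccount.parse(
            "DefaultEndpointsProtocol=https;AccountName=storageaccount1;AccountKey=key1");
        CloudBlobClient client = account.createCloudBlobClient();
        // Exponential backoff with a 30-second delta backoff and at most 7 attempts,
        // matching the code line quoted above.
        client.getDefaultRequestOptions().setRetryPolicyFactory(
            new RetryExponentialRetry(1000 * 30, 7));
    }
}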

@clintongormley

Hi @JeffreyZZ

We've had a long internal discussion about this PR and the problem in general. We believe that this PR is not the right way to solve the problem because of the notion of bucketing blobs based on the number of accounts; if you change the number of accounts, you break everything (you can’t restore anymore), and your future snapshots will be sending blobs to different accounts. The design is fundamentally flawed.

Long term, we would like to rewrite snapshot-restore to use Lucene's recovery process instead of the BlobStore that we use today. With that rewrite in place, we could treat multiple repos as separate disks, and put one index on each "disk" (the same way we treat multiple local mount-points today). This would solve your issue in a much cleaner way.

Obviously, that is a major rewrite and will not be happening anytime soon.

In the meantime (and given the limitations of Azure) I'd suggest breaking your snapshots down by index (which could be sent to different accounts) so that you do not run into these issues with throttling.
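
For example (a sketch; the repository and index names here are hypothetical, assuming one repository per account as suggested above):

PUT _snapshot/backups_account1/snapshot_1?wait_for_completion=true
{
  "indices": "rawevents_v3.2016_06_29",
  "include_global_state": false
}

PUT _snapshot/backups_account2/snapshot_2?wait_for_completion=true
{
  "indices": "rawevents_v3.2016_06_30",
  "include_global_state": false
}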

@JeffreyZZ
Author

Hi @clintongormley,

Thanks for the team's effort in evaluating this PR and for sharing your thoughts. I think that might be the right way to solve the throttling problem in the long run. By the way, the PR contains another change that improves the retry behavior with exponential retry; see the code line below for your reference. This should help improve retry performance.

client.getDefaultRequestOptions().setRetryPolicyFactory(new RetryExponentialRetry(1000 * 30, 7));

Thanks, Jeffrey

@dadoonet
Member

dadoonet commented Feb 3, 2017

@JeffreyZZ I agree that it could be a nice separate PR to send. Wanna do it? I believe it should be available as a setting though.

@clintongormley clintongormley added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository Azure labels Feb 14, 2018