
Cleanup Concurrent RepositoryData Loading (#48329) #48835

Merged
merged 1 commit into elastic:7.5 from 48329-7.5 on Nov 2, 2019

Conversation

original-brownbear
Member

The loading of `RepositoryData` is not an atomic operation:
it uses a list call followed by a separate get call.
This led to accidentally returning empty repository data
for generations >= 0, which can never legitimately be missing unless the repository
is corrupted.
In the test #48122 (and other SLM tests) there was a low chance of
hitting this concurrent-modification scenario, with the repository
actually moving ahead by two index generations between listing the
`index-N` blobs and loading the latest one. Since we only keep
two `index-N` blobs around at a time, this led to snapshots unexpectedly
missing from the status APIs.
Fixing the behavior to be more resilient is non-trivial but in the works.
For now I think we should simply throw in this scenario. This will also
help prevent corruption in the unlikely but possible event of hitting this
issue during a snapshot create or delete operation on master failover, on a
repository like S3 which doesn't have the "no overwrites" protection when
writing a new `index-N`.

Fixes #48122

backport of #48329
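
To make the failure mode concrete, here is a minimal, self-contained Java sketch of the list + get pattern and of throwing instead of falling back to empty repository data. The class, the simulated blob map, and the method names are illustrative stand-ins, not the actual Elasticsearch `BlobStoreRepository` / `RepositoryData` code.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in only; not the real repository implementation.
class RepositoryDataLoadingSketch {

    // Simulated blob container: blob name ("index-N") -> serialized repository data.
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    /** Step 1 (list): enumerate the index-N blobs and pick the highest generation. */
    long latestIndexGeneration() {
        return blobs.keySet().stream()
            .filter(name -> name.startsWith("index-"))
            .map(name -> Long.parseLong(name.substring("index-".length())))
            .max(Comparator.naturalOrder())
            .orElse(-1L); // -1 == no repository data written yet
    }

    /**
     * Step 2 (get): load the blob for that generation. Because steps 1 and 2 are separate
     * calls and only the last two index-N blobs are retained, a concurrent writer may have
     * deleted the listed generation in between.
     */
    byte[] loadRepositoryData(long generation) {
        if (generation == -1L) {
            return new byte[0]; // genuinely empty repository: returning empty data is correct
        }
        byte[] data = blobs.get("index-" + generation);
        if (data == null) {
            // Previously this fell through to empty repository data. For generation >= 0 that
            // can only mean concurrent modification (or corruption), so fail loudly instead.
            throw new IllegalStateException("index-" + generation
                + " vanished between listing and loading; the repository was modified"
                + " concurrently or is corrupted");
        }
        return data;
    }
}
```

Throwing here trades availability for safety: a stale reader fails fast rather than reporting an empty (and wrong) snapshot list that a subsequent create or delete could then act on.
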

@original-brownbear added the :Distributed/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs) and backport labels Nov 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear original-brownbear merged commit d159e5d into elastic:7.5 Nov 2, 2019
@original-brownbear original-brownbear deleted the 48329-7.5 branch November 2, 2019 19:43