Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed index replica allocation #41784

Conversation

henningandersen
Copy link
Contributor

@henningandersen henningandersen commented May 3, 2019

When an index is closed, we expect primary and replicas to be identical.
This commit improves the gateway replica shard allocator to consider
shards with identical sequence numbers sync'ed for closed indices. This
ensures that we will pick a fast recovery regardless of whether synced
flush was performed prior to closing an index.

Fixed InternalTestCluster to allow doing operations inside onStopped()
when using restartXXXNode().

Relates #41400 and #33888

Please notice the todo on the explain API.

When an index is closed, we expect primary and replicas to be identical.
This commit improves the gateway replica shard allocator to consider
shards with identical sequence numbers sync'ed for closed indices. This
ensures that we will pick a fast recovery regardless of whether synced
flush was performed prior to closing an index.

Relates elastic#41400 and elastic#33888
@henningandersen henningandersen added >enhancement WIP :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.0.0 v7.2.0 labels May 3, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Added integration test validating that fast recovery is made for closed
indices when multiple shard copies can be chosen from.

Fixed InternalTestCluster to allow doing operations inside onStopped()
when using restartXXXNode().

Relates elastic#41400 and elastic#33888
@henningandersen
Copy link
Contributor Author

ci/1 failed with unrelated failure, reported here: #41794
@elasticmachine run elasticsearch-ci/1

@dnhatn
Copy link
Member

dnhatn commented May 3, 2019

@henningandersen I have merged #41400.

@henningandersen
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/1

@ywelsch
Copy link
Contributor

ywelsch commented May 6, 2019

It looks like TransportNodesListShardStoreMetaData is loading the last commit, not the safe commit. For the peer recovery, it’s the local checkpoint of the safe commit, however, which counts. This means that some of these replica allocation decisions by the master might be non-optimal (and require full file-based recoveries). We can of course argue that the likelihood of that is very small. The other issue is that this might work well for closed indices now (given the specialiation for closed indices), but the allocation code will not take frozen indices into account, which share many of the properties with closed replicated indices when it comes to recovery.

I’m mostly wondering if we should generalize the logic a bit more, and not rely on the max seq no / local checkpoint of the last commit, but explicitly enrich the TransportNodesListShardStoreMetaData response to contain additional info:

  • minimum sequence number from which this shard copy can offer operation-based recoveries (= local checkpoint of safe commit for now) (minProvRecoverySeq)
  • minimum sequence number of range which this shard copy requires for an operation-based recovery (= local checkpoint of safe commit + 1 for now) (minReqRecoverySeq)
  • maximum sequence number of the shard copy (maxSeq)

If we then have the condition that
primary.minProvRecoverySeq <= replica.minReqRecoverySeq == primary.maxSeq + 1 && primary.maxSeq == replica.maxSeq
we can assume that a recovery will be operation-based and not require sending any ops and therefore instanteneous (similar to synced flush). This condition won't require any qualification whether it is for closed / frozen or regular indices.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^

henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request May 24, 2019
This is a first step away from sync-ids. We now check if replica and
primary are identical using sequence numbers when determining where to
allocate a replica shard.

If an index is no longer indexed into, issuing a regular flush will now
be enough to ensure a no-op recovery is done.

This has the nice side-effect of ensuring that closed indices and frozen
indices choose existing shard copies with identical data over
file-overlap comparison, increasing the chance that we end up doing a
no-op recovery (only no-op and file-based recovery is supported by
closed indices).

Relates elastic#41400 and elastic#33888

Supersedes elastic#41784
@jakelandis jakelandis added v7.3.0 and removed v7.2.0 labels Jun 17, 2019
@dnhatn
Copy link
Member

dnhatn commented Jun 21, 2019

@henningandersen Should we close this PR?

@henningandersen
Copy link
Contributor Author

Thanks @dnhatn , yes this can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants