Closed index replica allocation #41784

henningandersen · 2019-05-03T09:20:06Z

When an index is closed, we expect primary and replicas to be identical.
This commit improves the gateway replica shard allocator so that, for
closed indices, shard copies with identical sequence numbers are treated
as synced. This ensures that we pick a fast recovery regardless of
whether a synced flush was performed before the index was closed.

Fixed InternalTestCluster to allow doing operations inside onStopped()
when using restartXXXNode().

Relates #41400 and #33888

Please note the todo on the explain API.

elasticmachine · 2019-05-03T09:20:08Z

Pinging @elastic/es-distributed

Added integration test validating that fast recovery is made for closed indices when multiple shard copies can be chosen from. Fixed InternalTestCluster to allow doing operations inside onStopped() when using restartXXXNode(). Relates elastic#41400 and elastic#33888

henningandersen · 2019-05-03T14:16:16Z

ci/1 failed with an unrelated failure, reported here: #41794
@elasticmachine run elasticsearch-ci/1

dnhatn · 2019-05-03T15:39:52Z

@henningandersen I have merged #41400.

henningandersen · 2019-05-04T15:59:26Z

@elasticmachine run elasticsearch-ci/1

ywelsch · 2019-05-06T09:48:58Z

It looks like TransportNodesListShardStoreMetaData is loading the last commit, not the safe commit. For peer recovery, however, it is the local checkpoint of the safe commit that counts. This means that some of the replica allocation decisions made by the master might be suboptimal (and require full file-based recoveries). We can of course argue that the likelihood of that is very small. The other issue is that this might work well for closed indices now (given the specialization for closed indices), but the allocation code will not take frozen indices into account, and those share many properties with closed replicated indices when it comes to recovery.

I’m mostly wondering if we should generalize the logic a bit more, and not rely on the max seq no / local checkpoint of the last commit, but explicitly enrich the TransportNodesListShardStoreMetaData response to contain additional info:

- minProvRecoverySeq: the minimum sequence number from which this shard copy can offer operation-based recoveries (= local checkpoint of the safe commit, for now)
- minReqRecoverySeq: the minimum sequence number of the range that this shard copy requires for an operation-based recovery (= local checkpoint of the safe commit + 1, for now)
- maxSeq: the maximum sequence number of the shard copy

If we then have the condition that
primary.minProvRecoverySeq <= replica.minReqRecoverySeq == primary.maxSeq + 1 && primary.maxSeq == replica.maxSeq
we can assume that a recovery will be operation-based, will not require sending any ops, and will therefore be instantaneous (similar to synced flush). This condition does not need any qualification as to whether the index is closed, frozen, or regular.
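
To make the proposed condition concrete, here is a minimal Java sketch of the check. It is only an illustration of the idea in this comment: the class `NoOpRecoveryCheck`, the holder `CopySeqInfo`, and the helper `fromSafeCommit` are hypothetical names and are not part of Elasticsearch. The three fields mirror minProvRecoverySeq, minReqRecoverySeq, and maxSeq as defined above, and the chained comparison is read as two separate comparisons joined by `&&`.

```java
// Hypothetical sketch: class, field, and method names are illustrative only
// and do not correspond to actual Elasticsearch classes.
final class NoOpRecoveryCheck {

    /** Sequence-number info that one shard copy would report to the master. */
    static final class CopySeqInfo {
        final long minProvRecoverySeq; // lowest seq no from which this copy can serve ops
        final long minReqRecoverySeq;  // lowest seq no this copy needs to receive
        final long maxSeq;             // highest seq no contained in this copy

        CopySeqInfo(long minProvRecoverySeq, long minReqRecoverySeq, long maxSeq) {
            this.minProvRecoverySeq = minProvRecoverySeq;
            this.minReqRecoverySeq = minReqRecoverySeq;
            this.maxSeq = maxSeq;
        }

        /** Applies the "for now" definitions based on the safe commit's local checkpoint. */
        static CopySeqInfo fromSafeCommit(long safeCommitLocalCheckpoint, long maxSeq) {
            return new CopySeqInfo(safeCommitLocalCheckpoint, safeCommitLocalCheckpoint + 1, maxSeq);
        }
    }

    private NoOpRecoveryCheck() {}

    /** True when recovering the replica from the primary should need no operations at all. */
    static boolean isNoOpRecovery(CopySeqInfo primary, CopySeqInfo replica) {
        return primary.minProvRecoverySeq <= replica.minReqRecoverySeq
            && replica.minReqRecoverySeq == primary.maxSeq + 1
            && primary.maxSeq == replica.maxSeq;
    }

    public static void main(String[] args) {
        CopySeqInfo primary = CopySeqInfo.fromSafeCommit(100, 100);
        CopySeqInfo flushedReplica = CopySeqInfo.fromSafeCommit(100, 100);
        CopySeqInfo staleReplica = CopySeqInfo.fromSafeCommit(90, 100);

        System.out.println(isNoOpRecovery(primary, flushedReplica)); // true
        System.out.println(isNoOpRecovery(primary, staleReplica));   // false
    }
}
```

In this sketch, a replica whose safe commit lags behind its own maxSeq fails the check even though it holds the same operations, which matches the concern above that the safe commit, not the last commit, is what matters for peer recovery.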

ywelsch

^^

This is a first step away from sync-ids. We now check if replica and primary are identical using sequence numbers when determining where to allocate a replica shard. If an index is no longer indexed into, issuing a regular flush will now be enough to ensure a no-op recovery is done. This has the nice side-effect of ensuring that closed indices and frozen indices choose existing shard copies with identical data over file-overlap comparison, increasing the chance that we end up doing a no-op recovery (only no-op and file-based recovery is supported by closed indices). Relates elastic#41400 and elastic#33888 Supersedes elastic#41784

dnhatn · 2019-06-21T00:01:34Z

@henningandersen Should we close this PR?

henningandersen · 2019-06-24T12:20:00Z

Thanks @dnhatn, yes, this can be closed now.

henningandersen added the >enhancement, WIP, :Distributed/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes), v8.0.0, and v7.2.0 labels May 3, 2019

henningandersen added 3 commits May 3, 2019 13:08

Use randomFrom()

1fba234

Added todo on the explain API output. Fixed ClusterAllocationExplainIT to assume closed indices are synced.

5a6a06c

Merge remote-tracking branch 'origin/master' into improve_closed_index_replica_allocation

9b7ee07

henningandersen added 3 commits May 3, 2019 18:18

Merge remote-tracking branch 'origin/master' into improve_closed_index_replica_allocation

7cf950d

Actually verify we do noop recovery in integration test.

a9d5369

Fixed test failure.

df4eaf0

GatewayIndexIT relies on getInstance returning closed node inside onStopped.

henningandersen requested review from ywelsch and dnhatn May 4, 2019 17:22

henningandersen removed the WIP label May 4, 2019

ywelsch suggested changes May 21, 2019

henningandersen mentioned this pull request May 24, 2019

Replica allocation consider no-op #42518

Closed

henningandersen added the WIP label Jun 12, 2019

jakelandis added v7.3.0 and removed v7.2.0 labels Jun 17, 2019

henningandersen closed this Jun 24, 2019

ywelsch removed v7.3.0 v8.0.0 labels Jun 24, 2019
