
Restoring from snapshot should force generation of a new history uuid #26694

Merged
bleskes merged 9 commits into elastic:master from history_uuid_by_recovery_source on Sep 19, 2017

Conversation

@bleskes (Contributor) commented Sep 18, 2017

Restoring a shard from snapshot throws the primary back in time, violating assumptions and bringing the validity of global checkpoints into question. To avoid problems, we should make sure that a restored shard is never the source of an ops-based recovery to a shard that existed before the restore. To this end we introduced the notion of `history_uuid` in #26577 and required that both source and target have the same history to allow ops-based recoveries. This PR makes sure that a shard gets a new uuid after restore.

As suggested by @ywelsch, I derived the creation of a `history_uuid` from the `RecoverySource` of the shard. Store recovery only generates a uuid if one doesn't already exist (we can make this stricter once we no longer need to deal with 5.x indices). Peer recovery follows the same logic (note that this is different from the approach in #26557; I went this way because it means that shards always have a history uuid after being recovered on a 6.x node, and it also means that a rolling restart is enough for old indices to step over to the new seq-no model). Local shards and snapshot recovery force the generation of a new history uuid.

Relates #10708
Closes #26544
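
For quick orientation, here is a minimal, self-contained sketch of the mapping described above, using a stand-in enum instead of the real `RecoverySource.Type` and with the `EMPTY_STORE` behavior reflecting the review discussion below; the actual change lives in `newEngineConfig` and builds a full `EngineConfig`.

```java
// Sketch only, not the actual Elasticsearch code: a stand-in enum and helper
// showing how the recovery source could map to forceNewHistoryUUID.
public class HistoryUuidDecisionSketch {

    enum RecoveryType { EMPTY_STORE, EXISTING_STORE, PEER, SNAPSHOT, LOCAL_SHARDS }

    static boolean forceNewHistoryUUID(RecoveryType type) {
        switch (type) {
            case EXISTING_STORE:
            case PEER:
                // keep an existing history uuid; one is only generated when missing
                // (needed while pre-6.0 indices without a uuid can still show up)
                return false;
            case EMPTY_STORE:   // fresh index or force-allocated empty shard: new history
            case SNAPSHOT:      // restore rewinds history, so start a new one
            case LOCAL_SHARDS:  // shrink-style imports start a new history too
                return true;
            default:
                throw new AssertionError("unknown recovery type: " + type);
        }
    }
}
```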

@ywelsch (Contributor) left a comment

Left one request for change and one question; otherwise looks good to me.

```java
@@ -2080,10 +2080,24 @@ private DocumentMapperForType docMapper(String type) {

    private EngineConfig newEngineConfig(EngineConfig.OpenMode openMode) {
        Sort indexSort = indexSortSupplier.get();
        final boolean forceNewHistoryUUID;
        switch (shardRouting.recoverySource().getType()) {
            case EMPTY_STORE:
```
@ywelsch (Contributor):
EMPTY_STORE should force a new history uuid (see also my comment here).
EMPTY_STORE is used for two situations:

  • creating a shard of a fresh index: Here, we trivially want to force a fresh UUID.
  • force allocating an empty shard: Here, we are creating a fresh history, so we want to force a fresh UUID too.

Note that this leaves the case of allocating a stale primary, which should also force a fresh history. At the moment, though, we have no way to detect that based on the recovery source object. For this, if the recovery source is EXISTING_STORE, we can compare the on-disk allocation id (ShardStateMetaData.FORMAT.load(...)) with the expected allocation id in the shard routing object and, if they mismatch, force a new history uuid. Loading the ShardStateMetaData can happen in the IndexShard constructor.

I'm ok if you do this in a follow-up, but I still think EMPTY_STORE should be correctly set here to force a new history uuid.
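
As a rough illustration of the stale-primary check suggested here (a hypothetical sketch: the names are assumptions, and the on-disk allocation id is passed in already loaded rather than via ShardStateMetaData.FORMAT.load):

```java
// Hypothetical sketch, not Elasticsearch code: compare the allocation id
// persisted in the on-disk shard state with the one in the routing entry.
public class StalePrimarySketch {

    static boolean stalePrimaryNeedsNewHistoryUuid(String recoverySourceType,
                                                   String onDiskAllocationId,
                                                   String routingAllocationId) {
        // only relevant when recovering from an existing on-disk copy
        if ("EXISTING_STORE".equals(recoverySourceType) == false) {
            return false;
        }
        // a mismatch means the primary was force-allocated from a stale copy,
        // which starts a new history and should therefore get a new history uuid
        return onDiskAllocationId != null
            && onDiskAllocationId.equals(routingAllocationId) == false;
    }
}
```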

@bleskes (Contributor, author):

EMPTY_STORE will always result in a new history uuid, as we create a new index writer and it will not have any user data. I'm fine with setting forceNewHistoryUUID to true to be explicit (and later on morphing this into just `generateHistoryUUID` once all existing indices are known to have one and we can be explicit about the moments where we generate new ones).

Re forcing stale allocations: good catch. As you say, that's a more complicated one indeed. I think we should do it as a follow-up and discuss possible approaches; I'm thinking about a new recovery source. This is an edge case we should be explicit about.

```java
        assert openMode == EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG : "translog uuid didn't change but open mode is " + openMode;
    }
    if (needsCommit) {
        commitIndexWriter(indexWriter, translog, openMode == EngineConfig.OpenMode.OPEN_INDEX_CREATE_TRANSLOG
```
@ywelsch (Contributor):
commitIndexWriter still has a check if (historyUUID != null). Is that one obsolete now?

@bleskes (Contributor, author):
good catch. fixing.
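
For context on that check, a sketch of how the history uuid ends up in the Lucene commit user data; the key names are written out literally here and the surrounding method is illustrative, not the engine's actual commitIndexWriter.

```java
import org.apache.lucene.index.IndexWriter;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class HistoryUuidCommitSketch {
    // Illustrative only: write the history uuid into the Lucene commit user data
    // alongside the translog metadata. Once every shard is guaranteed to carry a
    // history uuid, the "if (historyUUID != null)" guard can become an assertion.
    static void commitWithHistoryUuid(IndexWriter writer, String translogUuid,
                                      String translogGeneration, String historyUuid) throws IOException {
        assert historyUuid != null : "history uuid must be set before committing";
        Map<String, String> commitData = new HashMap<>();
        commitData.put("translog_uuid", translogUuid);
        commitData.put("translog_generation", translogGeneration);
        commitData.put("history_uuid", historyUuid);
        writer.setLiveCommitData(commitData.entrySet());
        writer.commit();
    }
}
```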

@ywelsch (Contributor) left a comment
LGTM

@bleskes merged commit 04385a9 into elastic:master Sep 19, 2017
@bleskes deleted the history_uuid_by_recovery_source branch September 19, 2017 13:59
bleskes added a commit that referenced this pull request Sep 19, 2017
…#26694)

bleskes added a commit that referenced this pull request Sep 19, 2017
…#26694)

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 20, 2017
* master: (67 commits)
  Restoring from snapshot should force generation of a new history uuid (elastic#26694)
  test: Use a single primary shard so that the exception can caught in the same way
  Move pre-6.0 node checkpoint to SequenceNumbers
  Invalid JSON request body caused endless loop (elastic#26680)
  added comment
  fix line length violation
  Moved the check to fetch phase. This basically means that we throw a better error message instead of an AOBE and not adding more restrictions.
  inner hits: Do not allow inner hits that use _source and have a non nested object field as parent
  Separate Painless Whitelist Loading from the Painless Definition (elastic#26540)
  convert more admin requests to writeable (elastic#26566)
  Handle release of 5.6.1
  Allow `InputStreamStreamInput` array size validation where applicable (elastic#26692)
  Update global checkpoint with permit after recovery
  Filter pre-6.0 nodes for checkpoint invariants
  Skip bad request REST test on pre-6.0
  Reenable BWC tests after disabling for backport
  Add global checkpoint tracking on the primary
  [Test] Fix reference/cat/allocation/line_8 test failure
  [Docs] improved description for fs.total.available_in_bytes (elastic#26657)
  Fix discovery-file plugin to use custom config path
  ...
bleskes added a commit that referenced this pull request Sep 21, 2017
…roys translog recovery info (#26734)

This is a bug introduced in #26694. The issue comes from the attempt to share the code that commits the new history uuid and/or a new translog uuid. This goes wrong for an existing 5.6 index that is recovered from store:

1) A new history uuid is generated, as one doesn't exist in the index.
2) The translog is opened and its uuid doesn't change.
3) Committing the new history uuid used the standard commitIndexWriter method.
4) The latter asks the translog for the oldest file generation required to recover from the local checkpoint + 1.
5) The local checkpoint on old indices is -1 (until something is indexed), so the translog is asked to recover from position 0. That excludes any operations in the translog that do not have a seq no assigned, causing the FullClusterRestart bwc tests to fail.

To fix this, the commit moves away from the attempt to share the committing code between a new translog uuid and a new history uuid. Instead we do what we did before and open the translog with a potential commit; afterwards we commit the history uuid if needed. This comes at the expense of opening up the engine's method for committing an index writer.
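
A sketch of the reordering this describes, with hypothetical method names standing in for the real engine API: the translog is opened first, and the history uuid is added afterwards in a commit that does not trim translog generations by local checkpoint.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical stand-in for the engine; none of these methods are the real API.
interface SketchEngine {
    void openTranslogAndMaybeCommitTranslogUuid() throws IOException;
    Map<String, String> commitUserData();
    void commitNewHistoryUuid() throws IOException;
}

class StoreRecoverySketch {
    void recoverFromStore(SketchEngine engine) throws IOException {
        // 1) open the translog as before, committing a new translog uuid only
        //    if the translog had to be (re)created
        engine.openTranslogAndMaybeCommitTranslogUuid();

        // 2) only afterwards, if the existing commit carries no history uuid
        //    (e.g. a 5.6 index), write one in a dedicated commit that keeps all
        //    translog generations, so operations without seq numbers survive
        if (engine.commitUserData().containsKey("history_uuid") == false) {
            engine.commitNewHistoryUuid();
        }
    }
}
```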
bleskes added a commit that referenced this pull request Sep 21, 2017
…roys translog recovery info (#26734)

@lcawl removed the v6.1.0 label Dec 12, 2017
@clintongormley added the :Distributed/Engine label and removed the :Sequence IDs label Feb 14, 2018
Labels
:Distributed/Engine, >enhancement, v6.0.0-rc1, v7.0.0-beta1