
Restoring from snapshot should force generation of a new history uuid #26694

Merged
bleskes merged 9 commits into elastic:master from history_uuid_by_recovery_source on Sep 19, 2017

Conversation

@bleskes (Contributor) commented Sep 18, 2017

Restoring a shard from snapshot throws the primary back in time, violating assumptions and bringing the validity of global checkpoints into question. To avoid problems, we should make sure that a restored shard is never the source of an ops-based recovery to a shard that existed before the restore. To this end we introduced the notion of `history_uuid` in #26577 and required that both source and target have the same history to allow ops-based recoveries. This PR makes sure that a shard gets a new uuid after restore.

As suggested by @ywelsch, I derived the creation of a `history_uuid` from the `RecoverySource` of the shard. Store recovery only generates a uuid if one doesn't already exist (we can make this stricter once we no longer need to deal with 5.x indices). Peer recovery follows the same logic (note that this is different from the approach in #26557; I went this way because it means that shards always have a history uuid after being recovered on a 6.x node, and it also means that a rolling restart is enough for old indices to step over to the new seq-no model). Local shards and snapshot recovery force the generation of a new history uuid.

Relates #10708
Closes #26544
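
For quick orientation, here is a minimal, self-contained sketch of the mapping described above, using a stand-in enum instead of the real `RecoverySource.Type` and with the `EMPTY_STORE` behavior reflecting the review discussion below; the actual change lives in `newEngineConfig` and builds a full `EngineConfig`.

```java
// Sketch only, not the actual Elasticsearch code: a stand-in enum and helper
// showing how the recovery source could map to forceNewHistoryUUID.
public class HistoryUuidDecisionSketch {

    enum RecoveryType { EMPTY_STORE, EXISTING_STORE, PEER, SNAPSHOT, LOCAL_SHARDS }

    static boolean forceNewHistoryUUID(RecoveryType type) {
        switch (type) {
            case EXISTING_STORE:
            case PEER:
                // keep an existing history uuid; one is only generated when missing
                // (needed while pre-6.0 indices without a uuid can still show up)
                return false;
            case EMPTY_STORE:   // fresh index or force-allocated empty shard: new history
            case SNAPSHOT:      // restore rewinds history, so start a new one
            case LOCAL_SHARDS:  // shrink-style imports start a new history too
                return true;
            default:
                throw new AssertionError("unknown recovery type: " + type);
        }
    }
}
```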

@ywelsch (Contributor) left a comment

Left one request for change and one question; otherwise looks good to me.

```java
@@ -2080,10 +2080,24 @@ private DocumentMapperForType docMapper(String type) {

    private EngineConfig newEngineConfig(EngineConfig.OpenMode openMode) {
        Sort indexSort = indexSortSupplier.get();
        final boolean forceNewHistoryUUID;
        switch (shardRouting.recoverySource().getType()) {
            case EMPTY_STORE:
```
@ywelsch (Contributor):
EMPTY_STORE should force a new history uuid (see also my comment here).
EMPTY_STORE is used for two situations:

  • creating a shard of a fresh index: Here, we trivially want to force a fresh UUID.
  • force allocating an empty shard: Here, we are creating a fresh history, so we want to force a fresh UUID too.

Note that this leaves the case of allocating a stale primary, which should also force a fresh history. At the moment, though, we have no way to detect that based on the recovery source object. For this, if the recovery source is EXISTING_STORE, we can compare the on-disk allocation id (ShardStateMetaData.FORMAT.load(...)) with the expected allocation id in the shard routing object and, if they mismatch, force a new history uuid. Loading the ShardStateMetaData can happen in the IndexShard constructor.

I'm ok if you do this in a follow-up, but I still think EMPTY_STORE should be correctly set here to force a new history uuid.
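
As a rough illustration of the stale-primary check suggested here (a hypothetical sketch: the names are assumptions, and the on-disk allocation id is passed in already loaded rather than via ShardStateMetaData.FORMAT.load):

```java
// Hypothetical sketch, not Elasticsearch code: compare the allocation id
// persisted in the on-disk shard state with the one in the routing entry.
public class StalePrimarySketch {

    static boolean stalePrimaryNeedsNewHistoryUuid(String recoverySourceType,
                                                   String onDiskAllocationId,
                                                   String routingAllocationId) {
        // only relevant when recovering from an existing on-disk copy
        if ("EXISTING_STORE".equals(recoverySourceType) == false) {
            return false;
        }
        // a mismatch means the primary was force-allocated from a stale copy,
        // which starts a new history and should therefore get a new history uuid
        return onDiskAllocationId != null
            && onDiskAllocationId.equals(routingAllocationId) == false;
    }
}
```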

@bleskes (Contributor, author):

EMPTY_STORE will always result in a new history uuid, as we create a new index writer and it will not have any user data. I'm fine with setting forceNewHistoryUUID to true to be explicit (and later on morphing this into just `generateHistoryUUID` once all existing indices are known to have one and we can be explicit about the moments where we generate new ones).

Re forcing stale allocations: good catch. As you say, that's a more complicated one indeed. I think we should do it as a follow-up and discuss possible approaches; I'm thinking about a new recovery source. This is an edge case we should be explicit about.

```java
        assert openMode == EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG : "translog uuid didn't change but open mode is " + openMode;
    }
    if (needsCommit) {
        commitIndexWriter(indexWriter, translog, openMode == EngineConfig.OpenMode.OPEN_INDEX_CREATE_TRANSLOG
```
@ywelsch (Contributor):
commitIndexWriter still has a check if (historyUUID != null). Is that one obsolete now?

@bleskes (Contributor, author):
good catch. fixing.
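
For context on that check, a sketch of how the history uuid ends up in the Lucene commit user data; the key names are written out literally here and the surrounding method is illustrative, not the engine's actual commitIndexWriter.

```java
import org.apache.lucene.index.IndexWriter;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

final class HistoryUuidCommitSketch {
    // Illustrative only: write the history uuid into the Lucene commit user data
    // alongside the translog metadata. Once every shard is guaranteed to carry a
    // history uuid, the "if (historyUUID != null)" guard can become an assertion.
    static void commitWithHistoryUuid(IndexWriter writer, String translogUuid,
                                      String translogGeneration, String historyUuid) throws IOException {
        assert historyUuid != null : "history uuid must be set before committing";
        Map<String, String> commitData = new HashMap<>();
        commitData.put("translog_uuid", translogUuid);
        commitData.put("translog_generation", translogGeneration);
        commitData.put("history_uuid", historyUuid);
        writer.setLiveCommitData(commitData.entrySet());
        writer.commit();
    }
}
```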

@ywelsch (Contributor) left a comment
LGTM

@bleskes merged commit 04385a9 into elastic:master Sep 19, 2017
@bleskes deleted the history_uuid_by_recovery_source branch September 19, 2017 13:59
bleskes added a commit that referenced this pull request Sep 19, 2017
…#26694)

bleskes added a commit that referenced this pull request Sep 19, 2017
…#26694)

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 20, 2017
* master: (67 commits)
  Restoring from snapshot should force generation of a new history uuid (elastic#26694)
  test: Use a single primary shard so that the exception can caught in the same way
  Move pre-6.0 node checkpoint to SequenceNumbers
  Invalid JSON request body caused endless loop (elastic#26680)
  added comment
  fix line length violation
  Moved the check to fetch phase. This basically means that we throw a better error message instead of an AOBE and not adding more restrictions.
  inner hits: Do not allow inner hits that use _source and have a non nested object field as parent
  Separate Painless Whitelist Loading from the Painless Definition (elastic#26540)
  convert more admin requests to writeable (elastic#26566)
  Handle release of 5.6.1
  Allow `InputStreamStreamInput` array size validation where applicable (elastic#26692)
  Update global checkpoint with permit after recovery
  Filter pre-6.0 nodes for checkpoint invariants
  Skip bad request REST test on pre-6.0
  Reenable BWC tests after disabling for backport
  Add global checkpoint tracking on the primary
  [Test] Fix reference/cat/allocation/line_8 test failure
  [Docs] improved description for fs.total.available_in_bytes (elastic#26657)
  Fix discovery-file plugin to use custom config path
  ...
bleskes added a commit that referenced this pull request Sep 21, 2017
…roys translog recovery info (#26734)

This is a bug introduced in #26694. The issue comes from the attempt to share the code that commits the new history uuid and/or a new translog uuid. This goes wrong for an existing 5.6 index that is recovered from store:

1) A new history uuid is generated, as one doesn't exist in the index.
2) The translog is opened and its uuid doesn't change.
3) Committing the new history uuid used the standard commitIndexWriter method.
4) The latter asks the translog for the oldest file generation required to recover from the local checkpoint + 1.
5) The local checkpoint on old indices is -1 (until something is indexed), so the translog is asked to recover from position 0. That excludes any operations in the translog that do not have a seq no assigned, causing the FullClusterRestart bwc tests to fail.

To fix this, the commit moves away from the attempt to share the committing code between a new translog uuid and a new history uuid. Instead we do what we did before and open the translog with a potential commit; afterwards we commit the history uuid if needed. This comes at the expense of opening up the engine's method for committing an index writer.
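
A sketch of the reordering this describes, with hypothetical method names standing in for the real engine API: the translog is opened first, and the history uuid is added afterwards in a commit that does not trim translog generations by local checkpoint.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical stand-in for the engine; none of these methods are the real API.
interface SketchEngine {
    void openTranslogAndMaybeCommitTranslogUuid() throws IOException;
    Map<String, String> commitUserData();
    void commitNewHistoryUuid() throws IOException;
}

class StoreRecoverySketch {
    void recoverFromStore(SketchEngine engine) throws IOException {
        // 1) open the translog as before, committing a new translog uuid only
        //    if the translog had to be (re)created
        engine.openTranslogAndMaybeCommitTranslogUuid();

        // 2) only afterwards, if the existing commit carries no history uuid
        //    (e.g. a 5.6 index), write one in a dedicated commit that keeps all
        //    translog generations, so operations without seq numbers survive
        if (engine.commitUserData().containsKey("history_uuid") == false) {
            engine.commitNewHistoryUuid();
        }
    }
}
```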
bleskes added a commit that referenced this pull request Sep 21, 2017
…roys translog recovery info (#26734)

@lcawl removed the v6.1.0 label Dec 12, 2017
@clintongormley added the :Distributed/Engine label and removed the :Sequence IDs label Feb 14, 2018
Labels
:Distributed/Engine, >enhancement, v6.0.0-rc1, v7.0.0-beta1