Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fill missing sequence IDs up to max sequence ID when recovering from store #24238

Merged
merged 5 commits into from
Apr 21, 2017

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Apr 21, 2017

Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.

Relates to #10708
I still work on a test for store recovery to ensure it's called but I think it's ready for review.

…store

Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I one minor request to the test.

final long maxSeqId = seqNoService.getMaxSeqNo();
int numNoOpsAdded = 0;
for (long i = localCheckpoint + 1; i <= maxSeqId;
// the local checkpoint might have been advanced so we are leap-frogging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the local checkpoint must have advanced by at least one. We can assert on that after the noop was indexed.

Engine.Index primaryResponse = indexForDoc(doc);
Engine.IndexResult indexResult = engine.index(primaryResponse);
if (randomBoolean()) {
doc.updateSeqID(indexResult.getSeqNo(), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? doesn't the engine take care of that?

assertEquals((maxSeqIDOnReplica+1) - numDocsOnReplica, recoveringEngine.fillSequenceNumberHistory(2));
assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getMaxSeqNo());
assertEquals(maxSeqIDOnReplica, recoveringEngine.seqNoService().getLocalCheckpoint());
if ((flushed = randomBoolean())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we snapshot the translog and assert that the noops have the right primary term?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I had that but remvoed it... good catch...

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM. Left a suggestion for the new test.

// start a replica shard and index the second doc
final IndexShard otherShard = newStartedShard(false);
test = otherShard.prepareIndexOnReplica(
SourceToParse.source(SourceToParse.Origin.PRIMARY, shard.shardId().getIndexName(), test.type(), test.id(), test.source(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Origin should REPLICA

@@ -896,6 +898,46 @@ public void testRecoverFromStore() throws IOException {
closeShards(newShard);
}

/* This test just verifies that we fill up local checkpoint up to max seen seqID on primary recovery */
public void testRecoverFromStoreWithNoOps() throws IOException {
final IndexShard shard = newStartedShard(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can introduce a variant of indexDoc called indexDocOnReplica which takes a seq# as a parameter. This will remove the need for the extra shard. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that in a sep PR

@s1monw s1monw merged commit 2ca7072 into elastic:master Apr 21, 2017
@s1monw s1monw deleted the fill_gaps_on_tlog_recovery branch April 21, 2017 18:28
@s1monw s1monw removed the review label Apr 21, 2017
asettouf pushed a commit to asettouf/elasticsearch that referenced this pull request Apr 23, 2017
…store (elastic#24238)

Today we might promote a primary and recover from store where after translog
recovery the local checkpoint is still behind the maximum sequence ID seen.
To fill the holes in the sequence ID history this PR adds a utility method
that fills up all missing sequence IDs up to the maximum seen sequence ID
with no-ops.

Relates to elastic#10708
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. and removed :Engine :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants