Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make _id terms optional in segment containing only noop #30409

Merged
merged 7 commits into from
May 8, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented May 5, 2018

Previously only index and delete operations are indexed into Lucene,
therefore every segment should have both _id and _version terms as these
operations contain both terms. However, this is no longer guaranteed
after noop is also indexed into Lucene. A segment which contains only
no-ops does not have neither _id or _version because a no-op does not
contain these terms.

This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.

Relates #30226

Previously only index and delete operations are indexed into Lucene,
therefore every segment should have both _id and _version terms as these
operations contains both terms. However, this is no longer guaranteed
after noop is also indexed into Lucene. A segment which contains only
no-ops does not have either _id or _version.

This change makes _id and _version terms optional in
PerThreadIDVersionAndSeqNoLookup.

Relates elastic#30226
@dnhatn dnhatn added >feature :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. labels May 5, 2018
@dnhatn dnhatn requested review from jpountz, s1monw and bleskes May 5, 2018 22:53
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Good catch. Left some nits.

engine.refresh("test");
}

public void testRandomOperations() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment as to why we have this test? something like "a simple test to check that random combination of operations can coexist in segments and be read. This is needed as some fields in lucene may not exist if a segment misses operation types and the code needs to check for that"

if (reader.getNumericDocValues(VersionFieldMapper.NAME) == null) {
throw new IllegalArgumentException("reader misses the [" + VersionFieldMapper.NAME + "] field");
// Uid is not found if a segment contains only NoOps; in this case, version should not exist either.
if (reader.getNumericDocValues(VersionFieldMapper.NAME) != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can make this more generic and check that the segment only has the noop fields?

@@ -111,7 +115,7 @@ public DocIdAndVersion lookupVersion(BytesRef id, LeafReaderContext context)
* {@link DocIdSetIterator#NO_MORE_DOCS} is returned if not found
* */
private int getDocID(BytesRef id, Bits liveDocs) throws IOException {
if (termsEnum.seekExact(id)) {
if (termsEnum != null && termsEnum.seekExact(id)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment as to when the id might not be there?

@@ -3781,6 +3781,51 @@ protected long doGenerateSeqNoForOperation(Operation operation) {
}
}

public void testSegmentContainsOnlyNoOps() throws Exception {
Engine.NoOpResult noOpResult = engine.noOp(new Engine.NoOp(1, primaryTerm.get(), REPLICA, randomNonNegativeLong(), "test"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

primaries can also have noops upon promotion/recovery. Randomize the engine type?

@jpountz
Copy link
Contributor

jpountz commented May 6, 2018

Change looks good. I'm wondering whether we should add a dummy _version to these noop documents, since the doc values impl in the current Lucene codec has better random-access with dense fields. cc @s1monw since we had related discussions in the past.

@s1monw
Copy link
Contributor

s1monw commented May 7, 2018

I was gonna say the same thing as @jpountz that we should make sure we index a dummy version to make the DV dense. I think this is essential for indexing perf.

@dnhatn dnhatn changed the title Make _id and _version terms optional in a segment Make _id terms optional in segment containing only noops May 7, 2018
@dnhatn dnhatn changed the title Make _id terms optional in segment containing only noops Make _id terms optional in segment containing only no-op May 7, 2018
@dnhatn dnhatn changed the title Make _id terms optional in segment containing only no-op Make _id terms optional in segment containing only noop May 7, 2018
@dnhatn
Copy link
Member Author

dnhatn commented May 7, 2018

@jpountz and @simonw Thanks for your suggestion. I've added a dummy version to no-ops. Can you please take another look?

@dnhatn
Copy link
Member Author

dnhatn commented May 8, 2018

@elasticmachine test this please

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -1347,6 +1347,9 @@ private NoOpResult innerNoOp(final NoOp noOp) throws IOException {
try {
final ParsedDocument tombstone = engineConfig.getTombstoneDocSupplier().newNoopTombstoneDoc();
tombstone.updateSeqID(noOp.seqNo(), noOp.primaryTerm());
// A noop tombstone does not require a _version but it's added to have a fully dense docvalues for the version field.
// 1L is selected to optimize the compression because it might probably be the most common value in version field.
tombstone.version().setLongValue(1L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also the version that append only docs has! 👍

@dnhatn
Copy link
Member Author

dnhatn commented May 8, 2018

Thanks @bleskes @jpountz and @simonw for reviewing!

@dnhatn dnhatn merged commit 75719ac into elastic:ccr May 8, 2018
@dnhatn dnhatn deleted the fix-segment-all-noops branch May 8, 2018 17:38
dnhatn added a commit that referenced this pull request May 10, 2018
Previously only index and delete operations are indexed into Lucene,
therefore every segment should have both _id and _version terms as these
operations contain both terms. However, this is no longer guaranteed
after noop is also indexed into Lucene. A segment which contains only
no-ops does not have neither _id or _version because a no-op does not
contain these terms.

This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.

Relates #30226
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants