-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make _id terms optional in segment containing only noop #30409
Conversation
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contains both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have either _id or _version. This change makes _id and _version terms optional in PerThreadIDVersionAndSeqNoLookup. Relates elastic#30226
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Good catch. Left some nits.
engine.refresh("test"); | ||
} | ||
|
||
public void testRandomOperations() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment as to why we have this test? something like "a simple test to check that random combination of operations can coexist in segments and be read. This is needed as some fields in lucene may not exist if a segment misses operation types and the code needs to check for that"
if (reader.getNumericDocValues(VersionFieldMapper.NAME) == null) { | ||
throw new IllegalArgumentException("reader misses the [" + VersionFieldMapper.NAME + "] field"); | ||
// Uid is not found if a segment contains only NoOps; in this case, version should not exist either. | ||
if (reader.getNumericDocValues(VersionFieldMapper.NAME) != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can make this more generic and check that the segment only has the noop fields?
@@ -111,7 +115,7 @@ public DocIdAndVersion lookupVersion(BytesRef id, LeafReaderContext context) | |||
* {@link DocIdSetIterator#NO_MORE_DOCS} is returned if not found | |||
* */ | |||
private int getDocID(BytesRef id, Bits liveDocs) throws IOException { | |||
if (termsEnum.seekExact(id)) { | |||
if (termsEnum != null && termsEnum.seekExact(id)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment as to when the id might not be there?
@@ -3781,6 +3781,51 @@ protected long doGenerateSeqNoForOperation(Operation operation) { | |||
} | |||
} | |||
|
|||
public void testSegmentContainsOnlyNoOps() throws Exception { | |||
Engine.NoOpResult noOpResult = engine.noOp(new Engine.NoOp(1, primaryTerm.get(), REPLICA, randomNonNegativeLong(), "test")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
primaries can also have noops upon promotion/recovery. Randomize the engine type?
Change looks good. I'm wondering whether we should add a dummy |
I was gonna say the same thing as @jpountz that we should make sure we index a dummy version to make the DV dense. I think this is essential for indexing perf. |
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -1347,6 +1347,9 @@ private NoOpResult innerNoOp(final NoOp noOp) throws IOException { | |||
try { | |||
final ParsedDocument tombstone = engineConfig.getTombstoneDocSupplier().newNoopTombstoneDoc(); | |||
tombstone.updateSeqID(noOp.seqNo(), noOp.primaryTerm()); | |||
// A noop tombstone does not require a _version but it's added to have a fully dense docvalues for the version field. | |||
// 1L is selected to optimize the compression because it might probably be the most common value in version field. | |||
tombstone.version().setLongValue(1L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is also the version that append only docs has! 👍
Previously only index and delete operations are indexed into Lucene, therefore every segment should have both _id and _version terms as these operations contain both terms. However, this is no longer guaranteed after noop is also indexed into Lucene. A segment which contains only no-ops does not have neither _id or _version because a no-op does not contain these terms. This change adds a dummy version to no-ops and makes _id terms optional in PerThreadIDVersionAndSeqNoLookup. Relates #30226
Previously only index and delete operations are indexed into Lucene,
therefore every segment should have both _id and _version terms as these
operations contain both terms. However, this is no longer guaranteed
after noop is also indexed into Lucene. A segment which contains only
no-ops does not have neither _id or _version because a no-op does not
contain these terms.
This change adds a dummy version to no-ops and makes _id terms optional
in PerThreadIDVersionAndSeqNoLookup.
Relates #30226