-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Post build index creation #11711
Post build index creation #11711
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11711 +/- ##
============================================
- Coverage 63.13% 63.09% -0.04%
+ Complexity 1118 1117 -1
============================================
Files 2342 2342
Lines 125883 125913 +30
Branches 19357 19359 +2
============================================
- Hits 79477 79446 -31
- Misses 40749 40811 +62
+ Partials 5657 5656 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 14 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
if (postSegCreationIndexes.size() > 0) { | ||
// Build other indexes | ||
Map<String, Object> props = new HashMap<>(); | ||
props.put(IndexLoadingConfig.READ_MODE_KEY, ReadMode.mmap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naive question: is mmap mode alway the best choice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say so.. This method will be called by any component when building a segment (eg by the server when commiting consuming segment, by the minions when building a segment, by spark job when building building and pushing data etc). For example the PinotRecordReader always uses mmap (used by minion jobs to read segment and build new segments out of it) https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/PinotSegmentRecordReader.java#L116.
Any thoughts on this? I guess a new config can be used to specify the load type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! I think it's a good starting point if PinotRecordReader always uses mmap. We can revisit this later.
I approved the PR by accident 😛 |
cc @gortiz |
enum IndexBuildLifecycle { | ||
DURING_SEGMENT_CREATION, | ||
POST_SEGMENT_CREATION, | ||
CUSTOM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a javadoc explaining the meaning of each literal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just call itBuildTime
as this enum is defined as part of IndexType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As said, I think it would be great to have a javadoc explaining the semantics of each lifecycle for each enum. Apart from that, I like the PR. Nice improvement!
enum IndexBuildLifecycle { | ||
DURING_SEGMENT_CREATION, | ||
POST_SEGMENT_CREATION, | ||
CUSTOM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just call itBuildTime
as this enum is defined as part of IndexType
@@ -313,6 +325,36 @@ private void handlePostCreation() | |||
buildStarTreeV2IfNecessary(segmentOutputDir); | |||
} | |||
|
|||
Set<IndexType> postSegCreationIndexes = IndexService.getInstance().getAllIndexes().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a bit more readability, wrap those new changes to a method like addIndexIfNecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
It may be desirable for some index types to be build post segment file has been created. This PR allows for that.
The behaviour can be achieved for a custom index being added (using index-spi) by overriding the
getIndexBuildLifecycle
method to returnBuildLifecycle.POST_SEGMENT_CREATION
Release notes
Adds support for building indexes post segment file creation, allowing indexes that may depend on a completed segment to be built as part of the segment creation process.