-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add compression configuration for aggregationConfigs to StartreeIndexConfigs #11744
Add compression configuration for aggregationConfigs to StartreeIndexConfigs #11744
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11744 +/- ##
============================================
+ Coverage 63.13% 63.18% +0.05%
- Complexity 1118 1141 +23
============================================
Files 2342 2344 +2
Lines 125883 126415 +532
Branches 19357 19450 +93
============================================
+ Hits 79477 79878 +401
- Misses 40749 40859 +110
- Partials 5657 5678 +21
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 146 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/StarTreeIndexConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/StarTreeIndexConfig.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add some tests to ensure the behavior is correct? We may consider using different codec for each aggregation in BaseStarTreeV2Test
or StarTreeClusterIntegrationTest
_dimensionsSplitOrder = dimensionsSplitOrder; | ||
_skipStarNodeCreationForDimensions = skipStarNodeCreationForDimensions; | ||
_functionColumnPairs = functionColumnPairs; | ||
_functionColumnPairs = functionColumnPairs != null ? functionColumnPairs : new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest treating empty as null
instead of treating null
as empty. In most of the cases, only one will be configured. We may also add this logic to _skipStarNodeCreationForDimensions
_functionColumnPairs = functionColumnPairs != null ? functionColumnPairs : new ArrayList<>(); | |
_skipStarNodeCreationForDimensions = CollectionUtils.isNotEmpty(skipStarNodeCreationForDimensions) ? skipStarNodeCreationForDimensions : null; | |
_functionColumnPairs = CollectionUtils.isNotEmpty(functionColumnPairs) ? functionColumnPairs : null; | |
_aggregationConfigs = CollectionUtils.isNotEmpty(aggregationConfigs) ? aggregationConfigs : null; | |
Preconditions.checkArgument(_functionColumnPairs != null || _aggregationConfigs != null, ...) |
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/StarTreeAggregationConfig.java
Outdated
Show resolved
Hide resolved
...egment-spi/src/main/java/org/apache/pinot/segment/spi/index/startree/StarTreeV2Metadata.java
Outdated
Show resolved
Hide resolved
...gment-spi/src/main/java/org/apache/pinot/segment/spi/index/startree/StarTreeV2Constants.java
Outdated
Show resolved
Hide resolved
|
||
// Adds current object to Configuration by reference | ||
public void addToConfiguration(Configuration configuration) { | ||
String configPrefix = StarTreeV2Constants.MetadataKey.AGGREGATION_CONFIG + "." + this.toColumnName(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not using column name as the prefix because in the future we might want to deprecate it. Using index might be good: aggregate.0.function
, aggregate.0.column
, aggregate.0.codec
Configuration functionColPairsConfig = | ||
metadataProperties.subset(MetadataKey.AGGREGATION_CONFIG + "." + functionColumnPair); | ||
if (!functionColPairsConfig.isEmpty()) { | ||
_functionColumnPairs.add(AggregationFunctionColumnPair.fromConfiguration(functionColPairsConfig)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(CRITICAL) This will cause the same aggregation being added twice if the codec is different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can modify the equals to account for this, ignoring the compression type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the data structure to a treeMap with functionColumnPair as a string key to avoid duplicates.
Thanks for the review. Adressing everything :) |
Have you pushed the changes? |
Not yet, I'll try to do it by eod |
I applied a commit to introduce the
Please take a look and let me know if you have concern about the change |
Looks good from my side, and much cleaner. Thanks for helping. :) |
@t0mpere Thanks for the contribution! Could you please also help contribute the documentation for the new config? It can be added to this repo: git@github.com:pinot-contrib/pinot-docs.git |
Sure, I'm OOO at the moment I will take care in 2 weeks as soon as I will get back to my laptop. |
For reference, |
Adding
aggregationConfigs
to enable configuring compression.See: #9694
This change was tested manually with multiple segment refresh on
DOUBLE
andBYTES
metrics columns.