-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for leveraging StarTree index in conjunction with filtered aggregations #11886
Adds support for leveraging StarTree index in conjunction with filtered aggregations #11886
Conversation
|
||
// Use star-tree to solve the query if possible | ||
List<StarTreeV2> starTrees = _indexSegment.getStarTrees(); | ||
if (starTrees != null && !_queryContext.isSkipStarTree()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this PR would introduce a behaviour change (bugfix?) where AggregationPlanNode
included an additional predicate to check if null handling is enabled before allowing use of StarTree index to solve the query. This PR consolidates this logic to one method within OperatorUtils.java, where the null handling predicate is kept such that both GroupByPlanNode and AggregationPlanNode will ensure null handling is disabled before allowing use of StarTree index.
if (starTrees != null && !_queryContext.isSkipStarTree() && !_queryContext.isNullHandlingEnabled()) {
Marking as ready-for-review to get test results, but this should effectively be considered a draft at this point |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #11886 +/- ##
=========================================
Coverage 61.62% 61.63%
- Complexity 1150 1152 +2
=========================================
Files 2385 2385
Lines 129275 129374 +99
Branches 20016 20030 +14
=========================================
+ Hits 79668 79740 +72
- Misses 43806 43821 +15
- Partials 5801 5813 +12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
90d91fe
to
e873508
Compare
…e even when nullhandling enabled to check integration tests
not sure if this is related: #11899 |
pinot-core/src/main/java/org/apache/pinot/core/operator/query/OperatorUtils.java
Outdated
Show resolved
Hide resolved
… startree even when nullhandling enabled to check integration tests" This reverts commit 444dd5e.
@xiangfu0 @Jackie-Jiang This can now be considered ready for review, as it's operationally functional. I'll add tests, but wanted to flag that this is ready for review in case that helps with velocity etc. See screen shots below for examples. I noticed that this feature also provides an additional enhancement: being able to abuse Take for example the airlineStats example data. The following query would utilize ST: select
AirlineID,
MAX(ArrDelay) as max_delay
FROM airlineStats
GROUP BY AirlineID However, this would not (because select
AirlineID,
MAX(ArrDelay) as max_delay,
MIN(ArrDelay)
FROM airlineStats
GROUP BY AirlineID After this change, one could rewrite the query like so: select
AirlineID,
MAX(ArrDelay) FILTER (WHERE AirlineID != -1) AS max_delay,
MIN(ArrDelay)
FROM airlineStats
GROUP BY AirlineID This would allow the |
…support-filtered-aggs
@Jackie-Jiang @xiangfu0 this is now complete with tests and ready for review. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see it fits to the current filtered aggregation architecture!
import org.apache.pinot.core.operator.BaseProjectOperator; | ||
|
||
|
||
public interface ProjectionPlanNode extends PlanNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not adding this interface. We can do cast if necessary
|
||
public CombinedFilteredAggregationContext(BaseFilterOperator baseFilterOperator, | ||
List<Pair<Predicate, PredicateEvaluator>> predicateEvaluators, @Nullable FilterContext mainFilterContext, | ||
@Nonnull FilterContext subFilterContext, List<AggregationFunction> aggregationFunctions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor, convention) We don't usually use @Nonnull
, but only annotate @Nullable
and assume everything else as non-null
// Otherwise, there is no way to tell whether the 1st reload on server side is finished, | ||
// which may hit the race condition that the 1st reload finishes after the 2nd reload is fully done. | ||
// 10 seconds are still better than hitting race condition which will time out after 10 minutes. | ||
Thread.sleep(10_000L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid this kind of wait because it will waste resource on fast machine but flaky on slow machine. We don't need to add test in star-tree triggering test because the intention for this test is to just verify star-tree is created. We should add more tests into the StarTreeClusterIntegrationTest
@@ -38,7 +38,9 @@ | |||
import org.testng.annotations.BeforeClass; | |||
import org.testng.annotations.Test; | |||
|
|||
import static org.junit.Assert.assertFalse; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong import
// Prevent instantiation, make checkstyle happy | ||
} | ||
|
||
public static ProjectionPlanNode maybeGetStartreeProjectionOperator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest moving this into the StarTreeUtils
as
@Nullable
public static BaseProjectOperator<?> createStarTreeBasedProjectOperator(...)
It returns null
when star-tree cannot be used. This way we don't need to worry about using instanceof
to identify whether it is star-tree based, and we don't need to create PlanNode
out of BaseFilterOperator
which should be avoided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case would you put the onus of creating the ProjectPlanNode
on the caller of createStarTreeBasedProjectOperator
if it returned null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Note that I'm suggesting directly creating operator instead of plan node, so the caller is responsible for creating the BaseProjectOperator
if star-tree cannot be used
import org.apache.pinot.core.query.aggregation.function.AggregationFunction; | ||
|
||
|
||
public class CombinedFilteredAggregationContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used as a inner helper wrapper class for AggregationFunctionUtils.buildFilteredAggregateProjectOperators()
. Suggest making it a private inner class, and we can directly access the fields within the class
… AggregationFunctionUtils
…support-filtered-aggs
@@ -82,6 +83,16 @@ private String generateAggregation(String metricColumn) { | |||
metricColumn); | |||
} | |||
|
|||
private String generateFilteredAggregation(String metricColumn) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang these tests seem to consistently fail when run using the multi-stage engine. Are filtered aggregations supported in multi-stage, or not yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attempted fix in 990bf5a
@@ -214,46 +214,69 @@ public static Object getConvertedFinalResult(DataTable dataTable, ColumnDataType | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better, I was not liking the nested Pair usage haha. Thanks!
This PR aims to add support for leveraging StarTree index from filtered aggregations (i.e.
SELECT COUNT(*) FILTER (WHERE foo='bar')
), including filtered group-by aggregations. For queries with a mixture of star-tree-solvable aggregations and aggs which cannot be solved by star-tree, those aggs solvable via star-tree will do so iff all aggs in the same filter swimlane are solvable via star-tree (where swimlane means sameFILTER WHERE
expression, including lack ofFILTER WHERE
).tags: [
feature
]