-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No dictionary group by perf #8195
No dictionary group by perf #8195
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8195 +/- ##
============================================
- Coverage 71.33% 70.07% -1.26%
- Complexity 4308 4314 +6
============================================
Files 1623 1624 +1
Lines 84365 84883 +518
Branches 12657 12794 +137
============================================
- Hits 60183 59485 -698
- Misses 20050 21286 +1236
+ Partials 4132 4112 -20
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
6db76af
to
36c68fc
Compare
...rg/apache/pinot/core/query/aggregation/groupby/NoDictionaryMultiColumnGroupKeyGenerator.java
Outdated
Show resolved
Hide resolved
...rg/apache/pinot/core/query/aggregation/groupby/NoDictionaryMultiColumnGroupKeyGenerator.java
Show resolved
Hide resolved
36c68fc
to
40b5a78
Compare
40b5a78
to
9936d55
Compare
Added another benchmark with a group by on a raw string column. The effect is mostly drowned out by excessive String allocation (no interning 😢), but this branch still offers a significant improvement. Note that the majority of the branches are pruned by tiered compilation and are entirely predictable. Before
After
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR definitely improves the performance by reducing the unnecessary allocations. My concern is about whether adding the branches can hurt the performance comparing to reusing the keys
(even though the overall performance is still much better because of the reduced allocations). If we are confident that JVM can optimize the branches so that the overhead is negligible, then it is good to go.
I'm not too concerned about the branches that don't get pruned because the they are predictable (they go the same way for the entire block) but I think we could have the best of both worlds with code generation - if we can generate a tuple and code to lazily materialise the tuple for each shape we see in queries (of which I doubt there would be more than 20 patterns in a realistic deployment) we can have both low memory footprint and no conditionals. |
* generalise benchmark, add raw sum and raw group by * memory efficiency in no dictionary multi group by
This change is motivated by slow queries at one of our customers which group by a raw column, where 30GB was seen to be allocated by
NoDictionaryMultiColumnGroupKeyGenerator.generateKeyForBlock
, which is also where most of the method samples were taken:This PR starts by generalising one of our pre-existing benchmarks which does a good job of exercising the entire query execution. It is parameterised so different queries can be added easily, and the generated data is parameterised too so that columns with different cardinalities can be created.
Then, the actual improvement is made in the second commit. It transposes the group key generation since the
BlockValSet
s will be cached byDataBlockCache
anyway, then accumulates keys into a flyweight, which only needs to be allocated to memoize the group key on its first occurrence. This roughly halves average time and reduces allocation by at least a factor of 4:before:
after: