Add UltraLogLog support #11835
UltraLogLog is a variant of HyperLogLog from Dynatrace, with an implementation available in hash4j under the Apache license. This adds support for using it in the ways you'd use HLL in Pinot. When using it against normal Java types, wyhash 4 is used as the default hashing algorithm; when bringing your own serialized sketches you can use any.

* supports `UltraLogLog` as a data type
  * the serialization format writes the P value as the first byte of the data stream, so the width is known in streams
* adds `DistinctCountULL` and `DistinctCountRawULL` for use in SQL
  * raw outputs Base64 encoded bytes that can be fed into `UltraLogLog.wrap` in other services
* adds startree support
* adds merge rollup support
* new transformation functions added
  * `toULL` allows turning data into a ULL
  * `fromULL` lets you import ULLs encoded as byte arrays outside Pinot
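For consumers of the raw output, here is a minimal sketch of how another service might unpack a Base64 value from `DistinctCountRawULL`, assuming the layout described above (P in the first byte, sketch state in the remaining bytes). `UllPayload` and `decode` are hypothetical names used only for illustration; they are not part of Pinot or hash4j. The length check assumes one byte per register, i.e. a state of 2^P bytes, which should be verified against the hash4j version in use.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper for illustration; not part of Pinot or hash4j. */
final class UllPayload {
    final int p;        // precision read from the first byte
    final byte[] state; // remaining bytes: the sketch state

    private UllPayload(int p, byte[] state) {
        this.p = p;
        this.state = state;
    }

    /** Splits a Base64 payload into the leading P byte and the sketch state. */
    static UllPayload decode(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        if (raw.length < 2) {
            throw new IllegalArgumentException("payload too short to hold P and a state");
        }
        int p = raw[0] & 0xFF;
        byte[] state = Arrays.copyOfRange(raw, 1, raw.length);
        // Assumption: one byte per register, so the state holds 2^p bytes.
        if (p > 30 || state.length != (1 << p)) {
            throw new IllegalArgumentException(
                "state length " + state.length + " does not match p=" + p);
        }
        return new UllPayload(p, state);
    }
}
```

With hash4j on the classpath, the `state` array is what would be handed to `UltraLogLog.wrap`, as the description suggests.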
Codecov Report
@@ Coverage Diff @@
## master #11835 +/- ##
============================================
- Coverage 62.87% 62.80% -0.07%
+ Complexity 1141 1140 -1
============================================
Files 2367 2373 +6
Lines 127888 128207 +319
Branches 19732 19787 +55
============================================
+ Hits 80414 80525 +111
- Misses 41752 41958 +206
- Partials 5722 5724 +2
... and 12 files with indirect coverage changes
Can you rebase to latest?
done, updated
lgtm, otherwise
Thanks for the contribution!
Release Notes
[...] (`distinctCountULL` and `distinctCountRawULL`)