Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. #15860

Merged
merged 2 commits into from
Feb 8, 2024

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Feb 7, 2024

The prior code from #15162 was reading only the low-order byte of an int representing the size of a coupon set. As a result, it would erroneously believe that a coupon set with a multiple of 256 elements was empty. At the default lgK (12) this is possible when the sketch has exactly 256 items in it. (At lgK = 12, before it reaches 512, it would be promoted from coupon set to full HLL.)

The prior code from apache#15162 was reading only the low-order byte of an int
representing the size of a coupon set. As a result, it would erroneously
believe that a coupon set with a multiple of 256 elements was empty.
return false;
}
// Based on org.apache.datasketches.hll.PreambleUtil.extractHashSetCount
int setCount = buf.get(position + 8); // get(HASH_SET_COUNT_INT)
// Endianness of buf doesn't matter, since we're checking for equality with zero.
int setCount = buf.getInt(position + 8); // getInt(HASH_SET_COUNT_INT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks right to me 👍

@imply-cheddar
Copy link
Contributor

The changes generally look good to me. We should probably go and ask the apache-datasketches folks if we can get an "isEffectivelyEmpty" method added to the Sketch so that we can remove all of this introspective logic to begin with. That should enable a much more future-proof way of dealing with this.

@abhishekrb19 abhishekrb19 added this to the 29.0.0 milestone Feb 8, 2024
@cryptoe cryptoe merged commit 21a97f4 into apache:master Feb 8, 2024
53 checks passed
@gianm gianm deleted the fix-hll-empty-detector branch February 8, 2024 03:03
adarshsanjeev pushed a commit to adarshsanjeev/druid that referenced this pull request Feb 8, 2024
…e#15860)

* Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch.

The prior code from apache#15162 was reading only the low-order byte of an int
representing the size of a coupon set. As a result, it would erroneously
believe that a coupon set with a multiple of 256 elements was empty.
LakshSingla pushed a commit to LakshSingla/druid that referenced this pull request Feb 8, 2024
…e#15860)

* Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch.

The prior code from apache#15162 was reading only the low-order byte of an int
representing the size of a coupon set. As a result, it would erroneously
believe that a coupon set with a multiple of 256 elements was empty.
LakshSingla pushed a commit that referenced this pull request Feb 8, 2024
… (#15861)

* Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch.

The prior code from #15162 was reading only the low-order byte of an int
representing the size of a coupon set. As a result, it would erroneously
believe that a coupon set with a multiple of 256 elements was empty.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants