-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for murmur3 as a partition function #12049
Add support for murmur3 as a partition function #12049
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12049 +/- ##
============================================
+ Coverage 61.63% 61.66% +0.03%
- Complexity 1151 1152 +1
============================================
Files 2389 2390 +1
Lines 129839 130036 +197
Branches 20085 20110 +25
============================================
+ Hits 80020 80181 +161
- Misses 43996 44023 +27
- Partials 5823 5832 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
c59fa5f
to
a604d31
Compare
...gment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public int getPartition(Object value) { | ||
return (Hashing.murmur3_32_fixed(_hashSeed).hashBytes(value.toString().getBytes(UTF_8)).asInt() & Integer.MAX_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks that murmur3 has murmur3_32 / murmur3_128
. Can we add TODO
comment to add 128 bit support when needed in the future?
* </ul> | ||
*/ | ||
@Test | ||
public void testMurmur3Partitioner() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add the test comparing against some hand computed values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
* Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash | ||
*/ | ||
public class Murmur3PartitionFunction implements PartitionFunction { | ||
private static final String NAME = "Murmur3"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interestingly we already have MURMUR3 in
* distributed with this work for additional information |
i wonder where the impl is and if we can reuse :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that we used the same implementation but for 128 bits here. So chose to use the same one, but the 32_x86 variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Also Guava already provides the murmur3 implementation (used in HashUtils
), do we need to copy it from another library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to add the support murmur3 that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many libraries including Guava and Apache. I think that @aishikbh had to copy x64_32
variant implementation from infinispan
that Debezium is directly depend upon
https://github.com/infinispan/infinispan/blob/main/commons/all/src/main/java/org/infinispan/commons/hash/MurmurHash3.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snleee summarised it perfectly.
Just to add : murmur3
has 3 variants as part of the original implementation : x86 32 bits
, x64 128 bits
and x86 128 bits
and Guava supports these implementations. Since we are generating partition numbers x86 32 bits
should cover most of the use cases.
The issue comes from the users of Debezium. Debezium uses a variant of the x64 128
version reduced to 32 bits. I wanted to add that as well, as some users might want to use Debezium + Murmur3
. I have kept the x86 32 bits
variant as the default as that is part of the original implementation and x64 32 bits
as the configurable one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
148323c
to
b350dc1
Compare
@aishikbh For release note, can you add some example configuration for hash function (including the seed) |
sure I will do that. I plan to add to the documentation as well. |
94b9464
to
4203546
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we follow up on Jackie/Rong's question?
LGTM otherwise.
@Override | ||
public int getPartition(Object value) { | ||
if (_variant.equals("x86_32")) { | ||
return (murmurHash332BitsX86(value.toString().getBytes(UTF_8), _hashSeed) & Integer.MAX_VALUE) % _numPartitions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
murmur3Hash32BitsX64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. This name is better for readability.
…tests to verify the as well.
968b3f8
to
8101c43
Compare
@walterddr @Jackie-Jiang Are we good with this PR? @aishikbh answered on your question on why not reusing the function from guava lib. If there's no further question, I will merge this. |
Currently pinot supports murmur2 as a partition function. This PR adds support for murmur3 as a partition function.
Release Notes
seed
andvariant
for the hash infunctionConfig
field ofcolumnPartitionMap
for Murmur3.seed
is 0.Murmur3
:x86_32
andx64_32
configurable using thevariant
field infunctionConfig
x86_32
is Guava's version of the original implementation of Murmur3 32 bit hash for x86 architecture.x64_32
was not part of the original implementation but is a variant which reduces the bits fromx64_128
variant of the implementation. I have taken this implementation fromInfinispan
which is also used byDebezium
.x86_32
variant as it was part of the original implementation.functionConfig
;Here there is no functionConfig configured, so the
seed
value will be0
and variant will bex86_32
.Here the
seed
is configured as9001
but as no variant is provided,x86_32
will be picked up.Here the
variant
is mentioned so Murmur3 will use thex64_32
variant with9001
as seed.Debezium
andMurmur3
as partitioning function :byte[]
,String
orlong[]
columns.variant
should be set asx64_32
andseed
should be set as9001
.