-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow integer packing and unpacking of partial chunks #3681
Allow integer packing and unpacking of partial chunks #3681
Conversation
3ea6f03
to
b8fb1c1
Compare
f532892
to
0fff194
Compare
Some database sizes before/after the change (sizes in KB unless stated otherwise):
|
For the ldbc-sf01 dataset, I noticed that the database size is non-deterministic (it fluctuates between runs), @benjaminwinger @ray6080 could one of you explain why this is the case? I do notice that the number of pages goes down though (below is the Comment table as an example) Before
After
|
Benchmark ResultMaster commit hash:
|
Looks like the non-determinism was due to parallel copies changing how the data is arranged, the numbers have been updated with single-threaded copying. |
// TODO(bmwinger): Now I'm a bit confused here. Why do we always need to unpack the chunk | ||
// first? | ||
// Can't we just unpack the chunk if we need to modify only partial of it? | ||
fastunpack(chunkStart, chunk, header.bitWidth); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should be able to address the comment here. You can do an unaligned pack first (like the one at the end) and then this loop can be modified to remove the unpack and just pack the 32-value chunks (which should be able to be simplified; the looping is a little complicated because it starts and ends at arbitrary points in the chunk, but if its only being used for full chunks the values can just be copied from the srcBuffer into the temporary chunk minus the offset, and then packed, before moving to the next 32-value chunk).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
3a5527b
to
ca96787
Compare
ca96787
to
80cca78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can u also report some numbers on COPY? Just want to know if extra paddings will introduce some regression over COPY. The benchmark server is not able to show diff of COPY on ldbc100 right now.
#include "common/utils.h" | ||
|
||
namespace kuzu::storage { | ||
namespace { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need L9 here mespace {
? Also L111.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made an anonymous namespace for the internal functions, I can remove if it's not part of our coding standards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I don't think we have that convention, but not against adding that either. @benjaminwinger what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it. It's similar to making functions static, but somewhat more capable since you can also use it for classes/structs which are local to the translation unit.
80cca78
to
c805e5a
Compare
Benchmark ResultMaster commit hash:
|
7cb1820
to
44f0d81
Compare
Quick performance benchmark on a dataset with 100M nodes:
|
44f0d81
to
4245526
Compare
Benchmark ResultMaster commit hash:
|
c2ceb5f
to
c45cc9a
Compare
Benchmark ResultMaster commit hash:
|
c45cc9a
to
811a815
Compare
Benchmark ResultMaster commit hash:
|
Description
#2083
Reworked single-value bitpacking/unpacking for int128 to work for all integral types.
Used the above to allow packing of a number of values per page that isn't a multiple of 32 (by packing individual values at the end of each page).
Contributor agreement