Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store nulls as densely packed bitfields #1862

Merged
merged 1 commit into from
Aug 9, 2023
Merged

Store nulls as densely packed bitfields #1862

merged 1 commit into from
Aug 9, 2023

Conversation

benjaminwinger
Copy link
Collaborator

Instead of one value per byte.

I also modified NullMask so that modification is only possible through its functions, not via direct access to the underlying data, so that the mayContainNulls is always set if a non-null bit is copied into the data with NullMask::copyNullMask (which had caused issues since I had missed that requirement initially).

The possibility of values with sizes smaller than one byte breaks a lot of assumptions in the storage logic (a lot of parts rely on the idea of the number of bytes per value), but for this at least the fallout of that isn't too large. In NullNodeColumn I was able to just adjust the numValuesPerPage variable after it has been initially set in NodeColumn to an incorrect value based on the size of each value in bytes, but in ColumnChunk that wasn't possible, since the size is immediately used to create a buffer. Instead I adjusted the value size logic to use bits instead of bytes. It's a little messy right now, but while I'd prefer to keep the null-related specifics inside NullColumnChunk, I imagine that an overhaul of that logic will be necessary anyway once we introduce more complex compression schemes.

A NULL_ logical/physical type was introduced since nulls are now stored differently from bools, but that can be removed again when bools are stored in the same way.

@benjaminwinger benjaminwinger force-pushed the bitpacking branch 2 times, most recently from 84236ea to cfddbc5 Compare July 26, 2023 19:24
@ray6080 ray6080 force-pushed the node-group branch 2 times, most recently from df1bb2c to b7352e4 Compare August 1, 2023 07:05
@ray6080 ray6080 force-pushed the node-group branch 3 times, most recently from 00eacad to d03b06a Compare August 2, 2023 08:23
Base automatically changed from node-group to master August 2, 2023 11:18
@benjaminwinger benjaminwinger force-pushed the bitpacking branch 2 times, most recently from f7f6de2 to 661c923 Compare August 2, 2023 20:22
@codecov
Copy link

codecov bot commented Aug 2, 2023

Codecov Report

Patch coverage: 93.60% and project coverage change: -0.04% ⚠️

Comparison is base (924d22e) 89.78% compared to head (2414fe3) 89.75%.
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1862      +/-   ##
==========================================
- Coverage   89.78%   89.75%   -0.04%     
==========================================
  Files         869      869              
  Lines       31050    31155     +105     
==========================================
+ Hits        27878    27962      +84     
- Misses       3172     3193      +21     
Files Changed Coverage Δ
src/binder/bind/bind_updating_clause.cpp 96.12% <ø> (ø)
src/include/common/types/types.h 100.00% <ø> (ø)
src/include/common/vector/value_vector.h 100.00% <ø> (ø)
...rc/include/storage/storage_structure/lists/lists.h 97.18% <ø> (ø)
src/storage/store/node_column.cpp 78.41% <78.57%> (-0.44%) ⬇️
src/common/null_mask.cpp 94.20% <84.61%> (-5.80%) ⬇️
src/storage/copier/column_chunk.cpp 77.04% <95.83%> (-0.11%) ⬇️
src/common/types/types.cpp 92.92% <100.00%> (+2.75%) ⬆️
src/common/vector/value_vector.cpp 99.47% <100.00%> (+<0.01%) ⬆️
src/include/common/null_mask.h 100.00% <100.00%> (ø)
... and 14 more

... and 16 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have another quick go-over after your changes.

src/include/common/null_mask.h Outdated Show resolved Hide resolved
src/include/storage/copier/column_chunk.h Outdated Show resolved Hide resolved
src/storage/store/node_column.cpp Outdated Show resolved Hide resolved
src/storage/store/node_column.cpp Outdated Show resolved Hide resolved
src/include/storage/copier/column_chunk.h Outdated Show resolved Hide resolved
src/storage/copier/column_chunk.cpp Outdated Show resolved Hide resolved
src/storage/copier/column_chunk.cpp Outdated Show resolved Hide resolved
src/include/storage/copier/column_chunk.h Outdated Show resolved Hide resolved
src/include/storage/copier/column_chunk.h Show resolved Hide resolved
@benjaminwinger benjaminwinger force-pushed the bitpacking branch 2 times, most recently from eef1893 to 985f5e2 Compare August 8, 2023 22:23
instead of one value per byte
@ray6080 ray6080 merged commit 95d3ba3 into master Aug 9, 2023
10 checks passed
@ray6080 ray6080 deleted the bitpacking branch August 9, 2023 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants