-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store nulls as densely packed bitfields #1862
Conversation
84236ea
to
cfddbc5
Compare
df1bb2c
to
b7352e4
Compare
cfddbc5
to
c66c8ba
Compare
00eacad
to
d03b06a
Compare
f7f6de2
to
661c923
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1862 +/- ##
==========================================
- Coverage 89.78% 89.75% -0.04%
==========================================
Files 869 869
Lines 31050 31155 +105
==========================================
+ Hits 27878 27962 +84
- Misses 3172 3193 +21
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have another quick go-over after your changes.
661c923
to
744146b
Compare
eef1893
to
985f5e2
Compare
instead of one value per byte
985f5e2
to
2414fe3
Compare
Instead of one value per byte.
I also modified NullMask so that modification is only possible through its functions, not via direct access to the underlying data, so that the mayContainNulls is always set if a non-null bit is copied into the data with
NullMask::copyNullMask
(which had caused issues since I had missed that requirement initially).The possibility of values with sizes smaller than one byte breaks a lot of assumptions in the storage logic (a lot of parts rely on the idea of the number of bytes per value), but for this at least the fallout of that isn't too large. In NullNodeColumn I was able to just adjust the numValuesPerPage variable after it has been initially set in NodeColumn to an incorrect value based on the size of each value in bytes, but in ColumnChunk that wasn't possible, since the size is immediately used to create a buffer. Instead I adjusted the value size logic to use bits instead of bytes. It's a little messy right now, but while I'd prefer to keep the null-related specifics inside NullColumnChunk, I imagine that an overhaul of that logic will be necessary anyway once we introduce more complex compression schemes.
A
NULL_
logical/physical type was introduced since nulls are now stored differently from bools, but that can be removed again when bools are stored in the same way.