Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constant compression #2516

Merged
merged 2 commits into from
Dec 5, 2023
Merged

Constant compression #2516

merged 2 commits into from
Dec 5, 2023

Conversation

benjaminwinger
Copy link
Collaborator

Constant compressed columns use INVALID_PAGE_IDX as their page index in the metadata, and reading from them uses the normal functions, but there's a special case for when we try to read from an INVALID_PAGE_IDX.

To get it working for nulls I had to set up out of place commits for the null column. This always happens when the parent column does an out of place commit (unnecessary, but it would take a lot of work to de-couple that I think), but we may now need to do an out-of-place commit for the null column when doing an in-place commit on its parent column. As such, I've removed in-place updates to the null column when we do in-place updates to the parent column, and have it otherwise doing an out-of-place update on the null column only if the main column does an in-place update to avoid duplicating work.

I'd added some bounds checking, and more fine-grained control over the compression choices at the ColumChunk level when debugging, which I've left in, but could be removed since I was able to get it working for every fixed-sized type.

Copy link

codecov bot commented Nov 29, 2023

Codecov Report

Attention: 26 lines in your changes are missing coverage. Please review.

Comparison is base (ad5a79f) 92.83% compared to head (9147b85) 92.91%.

Files Patch % Lines
src/storage/store/column.cpp 89.79% 10 Missing ⚠️
src/storage/compression/compression.cpp 86.53% 7 Missing ⚠️
src/include/storage/compression/compression.h 58.33% 5 Missing ⚠️
src/storage/store/column_chunk.cpp 84.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2516      +/-   ##
==========================================
+ Coverage   92.83%   92.91%   +0.07%     
==========================================
  Files        1025     1025              
  Lines       38324    38401      +77     
==========================================
+ Hits        35579    35679     +100     
+ Misses       2745     2722      -23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benjaminwinger
Copy link
Collaborator Author

Benchmark actually failed, despite the job status. The storage is not compatible with the old version since I'd changed the compression ID numbers to turn them into a bitfield. Maybe that should be changed anyway, since what's stored in the file doesn't need to be a bitfield, and that limits the total number of values. We could have a separate bitfield for configuration, but since it's not being used outside of the ColumnChunk, maybe it's a better idea to revert that part for now and leave more fine-grained compression settings to a later time.

@benjaminwinger benjaminwinger force-pushed the constant-compression branch 3 times, most recently from 5108229 to 4a15a0f Compare December 1, 2023 20:48
src/include/storage/compression/compression.h Outdated Show resolved Hide resolved
src/include/storage/compression/compression.h Outdated Show resolved Hide resolved
src/storage/compression/compression.cpp Outdated Show resolved Hide resolved
src/storage/store/struct_column.cpp Show resolved Hide resolved
src/storage/store/string_column_chunk.cpp Outdated Show resolved Hide resolved
src/storage/compression/compression.cpp Outdated Show resolved Hide resolved
@benjaminwinger benjaminwinger force-pushed the constant-compression branch 2 times, most recently from 1570f46 to de4f5ff Compare December 5, 2023 16:11
Removed various dedicated functions in place of using the compression interfaces
Disabled for nulls and the Rel table CSR offset column
@benjaminwinger benjaminwinger merged commit 10dc1ed into master Dec 5, 2023
14 checks passed
@benjaminwinger benjaminwinger deleted the constant-compression branch December 5, 2023 23:04
@benjaminwinger benjaminwinger mentioned this pull request Dec 18, 2023
@benjaminwinger benjaminwinger mentioned this pull request Jan 15, 2024
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants