Rework VAR_LIST storage layout #3060
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #3060 +/- ##
==========================================
- Coverage 92.66% 92.63% -0.04%
==========================================
Files 1157 1157
Lines 43074 43182 +108
==========================================
+ Hits 39916 40003 +87
- Misses 3158 3179 +21
std::unique_ptr<Column> dataColumn;
std::unique_ptr<VarListDataColumnChunk> tmpDataColumnChunk;
Why do you need a temp column chunk as a class field? Should remove it.
Can you put some benchmark numbers here?
See #3093.
Previously, the VarList storage layout stored only the end offset of each list.
As an example, consider four lists of INT64: [4,7,8,12], null, [2,3], []. We store them as follows:
The offset column must stay in ascending order. So when we update [4,7,8,12] to [4,5,6,7,8], we need to rewrite the whole varlist column chunk, both the offset and the data column chunks.
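To make the end-offset-only layout concrete, here is a minimal sketch of how the four example lists could be packed and read back. The names `EndOffsetChunk`, `buildEndOffsetChunk` and `readList` are hypothetical and are not the classes touched by this PR. The sketch also assumes that a null list occupies no data slots and is tracked by a separate null mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified model of the old layout (not the PR's real classes).
using List = std::optional<std::vector<int64_t>>;

struct EndOffsetChunk {
    std::vector<uint64_t> endOffsets; // endOffsets[i]: one past the last element of list i
    std::vector<bool> isNull;         // assumed separate null mask
    std::vector<int64_t> data;        // all list elements, stored back to back
};

// Pack the lists contiguously; a null list adds no data (assumption).
EndOffsetChunk buildEndOffsetChunk(const std::vector<List>& lists) {
    EndOffsetChunk chunk;
    for (const auto& list : lists) {
        chunk.isNull.push_back(!list.has_value());
        if (list) {
            chunk.data.insert(chunk.data.end(), list->begin(), list->end());
        }
        chunk.endOffsets.push_back(chunk.data.size());
    }
    return chunk;
}

// List i lives in data[endOffsets[i-1], endOffsets[i]), which is why the
// offsets have to stay in ascending order in this layout.
std::vector<int64_t> readList(const EndOffsetChunk& chunk, std::size_t i) {
    assert(i < chunk.endOffsets.size());
    auto begin = static_cast<std::ptrdiff_t>(i == 0 ? 0 : chunk.endOffsets[i - 1]);
    auto end = static_cast<std::ptrdiff_t>(chunk.endOffsets[i]);
    return std::vector<int64_t>(chunk.data.begin() + begin, chunk.data.begin() + end);
}
```

Under these assumptions the example lists give end offsets 4, 4, 6, 6 over the data 4, 7, 8, 12, 2, 3. Growing list 0 from four to five elements would shift every later element and every later end offset, which is the full rewrite described above.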
Rewriting the column chunk takes time. To avoid it, we additionally store the length of each VarList. The data layout is then as follows.
On update, we just append the new data at the back of the data column chunk and update only the offset and size column chunks. The varlist column chunk will be
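The append-on-update idea can be sketched as follows. This is only an illustration under stated assumptions: `OffsetSizeChunk`, `appendList`, `updateList` and `readList` are hypothetical names, the offset column is assumed to keep holding end offsets, and null handling is left out (a null list is appended as an empty list here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the offset + size layout.
struct OffsetSizeChunk {
    std::vector<uint64_t> endOffsets; // assumed: one past the last element of list i
    std::vector<uint64_t> sizes;      // length of list i
    std::vector<int64_t> data;        // list elements; may contain stale regions
};

// Initial load: lists are appended back to back.
void appendList(OffsetSizeChunk& chunk, const std::vector<int64_t>& list) {
    chunk.data.insert(chunk.data.end(), list.begin(), list.end());
    chunk.endOffsets.push_back(chunk.data.size());
    chunk.sizes.push_back(list.size());
}

// Update: write the new elements at the back and repoint entry i.
// No other entry is touched; the old elements of list i become dead space.
void updateList(OffsetSizeChunk& chunk, std::size_t i, const std::vector<int64_t>& newList) {
    assert(i < chunk.sizes.size());
    chunk.data.insert(chunk.data.end(), newList.begin(), newList.end());
    chunk.endOffsets[i] = chunk.data.size();
    chunk.sizes[i] = newList.size();
}

// With an explicit size, list i is data[endOffsets[i] - sizes[i], endOffsets[i]),
// so the offsets no longer need to be ascending.
std::vector<int64_t> readList(const OffsetSizeChunk& chunk, std::size_t i) {
    assert(i < chunk.sizes.size());
    auto end = static_cast<std::ptrdiff_t>(chunk.endOffsets[i]);
    auto begin = end - static_cast<std::ptrdiff_t>(chunk.sizes[i]);
    return std::vector<int64_t>(chunk.data.begin() + begin, chunk.data.begin() + end);
}
```

In this sketch, updating list 0 of the example to [4,5,6,7,8] leaves the entries of the other three lists untouched. The trade-off is that the four replaced elements stay in the data chunk as unused space until something compacts it.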