Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework VAR_LIST storage layout #3060

Closed
wants to merge 1 commit into from
Closed

Conversation

hououou
Copy link
Collaborator

@hououou hououou commented Mar 14, 2024

Previously, we only stored the end offset in the VarList storage layout.
For an example layout for four lists of INT64: [4,7,8,12], null, [2, 3], []. We store as follows:

Offset column: [4, 4, 6, 6]
data column: [4, 7, 8, 12, 2, 3]

We need to ensure that the offset column must be in ascending order. So when we update [4,7,8,12] to [4,5,6,7,8], we need to rewrite the whole varlist column chunk, both offset and data column chunk.

Offset column: [5, 5, 7, 7]
data column: [4, 5, 6, 7, 8, 2, 3]

It will take time to rewrite the column chunk. To solve it, we additionally store the length of VarList. So the data layout is just as follows.

Offset column: [4, 4, 6, 6]
Size column: [4, 0, 2, 0]
data column: [4, 7, 8, 12, 2, 3]

When we update, we just append data in the back of the data column chunk and only update the offset and size column chunk. The varlist column chunk will be

Offset column: [10, 4, 6, 6]
Size column: [5, 0, 2, 0]
data column: [4, 7, 8, 12, 2, 3, 4, 5, 6, 7, 8]

Copy link

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 86.89320% with 27 lines in your changes are missing coverage. Please review.

Project coverage is 92.63%. Comparing base (c3decc2) to head (e4571a8).
Report is 4 commits behind head on master.

Files Patch % Lines
src/storage/store/var_list_column_chunk.cpp 72.36% 21 Missing ⚠️
src/storage/store/var_list_column.cpp 94.28% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3060      +/-   ##
==========================================
- Coverage   92.66%   92.63%   -0.04%     
==========================================
  Files        1157     1157              
  Lines       43074    43182     +108     
==========================================
+ Hits        39916    40003      +87     
- Misses       3158     3179      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hououou hououou requested a review from ray6080 March 14, 2024 19:13
@hououou hououou force-pushed the remove_var_list_index_column branch from 4a6d872 to f7523d7 Compare March 14, 2024 21:08
src/include/storage/store/var_list_column_chunk.h Outdated Show resolved Hide resolved
src/storage/store/var_list_column.cpp Outdated Show resolved Hide resolved
std::unique_ptr<Column> dataColumn;
std::unique_ptr<VarListDataColumnChunk> tmpDataColumnChunk;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need a temp column chunk as a class field? Should remove it.

src/storage/stats/table_statistics_collection.cpp Outdated Show resolved Hide resolved
src/storage/store/var_list_column.cpp Outdated Show resolved Hide resolved
src/storage/store/var_list_column_chunk.cpp Outdated Show resolved Hide resolved
src/storage/store/var_list_column_chunk.cpp Outdated Show resolved Hide resolved
src/storage/store/var_list_column_chunk.cpp Show resolved Hide resolved
src/storage/store/var_list_column_chunk.cpp Outdated Show resolved Hide resolved
src/storage/store/var_list_column_chunk.cpp Outdated Show resolved Hide resolved
@ray6080
Copy link
Contributor

ray6080 commented Mar 15, 2024

Can you put some benchmark numbers here?

  1. COPY node/rel with a var list property. You can populate some csv files of 10M tuples.
  2. See if you can also benchmark inserting 1000 of nodes/rels after COPY.

@hououou hououou force-pushed the remove_var_list_index_column branch 2 times, most recently from 2a87361 to d93062f Compare March 19, 2024 16:37
@hououou hououou force-pushed the remove_var_list_index_column branch from d93062f to e4571a8 Compare March 19, 2024 18:47
@hououou hououou changed the title Remove var list index column Rework VAR_LIST storage layout Mar 20, 2024
@ray6080
Copy link
Contributor

ray6080 commented Mar 23, 2024

See #3093.

@ray6080 ray6080 closed this Mar 23, 2024
@hououou hououou deleted the remove_var_list_index_column branch March 26, 2024 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants