NodeGroup list storage refactor #1885

Merged
merged 1 commit into from
Aug 3, 2023

@acquamarin (Collaborator) commented Aug 2, 2023

This PR refactors the list storage structure using a nested dataColumn design.
Each list column consists of two columns, an offsetColumn and a dataColumn, plus a nullColumn that marks null lists:
OffsetColumn: stores, for each list, its end offset (exclusive) in the dataColumn.
DataColumn: stores the actual elements of all lists contiguously, indexed by the offsetColumn.
For example, to store the four lists [1,3,null,4], null, [5,7], []:
OffsetColumn: [4, 4, 6, 6]
NullColumn: [false, true, false, false]
DataColumn: [1, 3, null, 4, 5, 7]
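The write side of this layout can be sketched in C++ as follows. This is a minimal, hypothetical model that assumes plain `std::vector` stand-ins for the three columns and `int64` list elements; `ListColumnSketch` and `appendList` are illustrative names, not classes from this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified model of the list column layout described above.
// Not the project's actual classes: plain vectors stand in for the columns.
using Element = std::optional<std::int64_t>; // a list element may itself be null
using List = std::vector<Element>;

struct ListColumnSketch {
    std::vector<std::uint64_t> offsetColumn; // end offset (exclusive) of each list
    std::vector<bool> nullColumn;            // true if the whole list is null
    std::vector<Element> dataColumn;         // elements of all lists, back to back

    // Appends one list; std::nullopt stands for a null list.
    void appendList(const std::optional<List>& list) {
        assert(offsetColumn.size() == nullColumn.size());
        if (list.has_value()) {
            dataColumn.insert(dataColumn.end(), list->begin(), list->end());
        }
        // A null (or empty) list adds no data, so its end offset equals the previous one.
        offsetColumn.push_back(dataColumn.size());
        nullColumn.push_back(!list.has_value());
    }
};
```

Because a null list appends nothing to the dataColumn, its end offset simply repeats the previous one, which is why the 2nd entry of the offsetColumn in the example is 4 again.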
To read lists:
Note: we always read two offsets for a list, the current offset and the previous offset. The current offset is the end offset (exclusive) of the list in the dataColumn, and the previous offset is its start offset (inclusive). The first list has no previous offset, so it starts at 0.
For example, to read the 3rd list we read two offsets:
offset at the 2nd position, offsetColumn[1]: 4
offset at the 3rd position, offsetColumn[2]: 6
So the 3rd list's data starts at offset 4 (inclusive) and ends at offset 6 (exclusive) in the dataColumn.
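The two-offset read can be sketched as below. These are hypothetical helpers over `std::vector` stand-ins for the columns, not the PR's actual reader; `listRange` and `readList` are illustrative names. Treating the start of the first list as 0 follows the example, where the 1st list begins at position 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Element = std::optional<std::int64_t>; // a list element may be null
using List = std::vector<Element>;

// Hypothetical helper: [start, end) range of list i in the dataColumn.
// start is the previous list's end offset; the first list starts at 0.
std::pair<std::uint64_t, std::uint64_t> listRange(
    const std::vector<std::uint64_t>& offsetColumn, std::size_t i) {
    assert(i < offsetColumn.size());
    std::uint64_t start = (i == 0) ? 0 : offsetColumn[i - 1]; // prev offset, inclusive
    std::uint64_t end = offsetColumn[i];                      // current offset, exclusive
    return {start, end};
}

// Hypothetical reader: returns list i, or std::nullopt if the nullColumn marks it null.
std::optional<List> readList(const std::vector<std::uint64_t>& offsetColumn,
                             const std::vector<bool>& nullColumn,
                             const List& dataColumn, std::size_t i) {
    if (nullColumn[i]) {
        return std::nullopt;
    }
    auto [start, end] = listRange(offsetColumn, i);
    return List(dataColumn.begin() + static_cast<std::ptrdiff_t>(start),
                dataColumn.begin() + static_cast<std::ptrdiff_t>(end));
}
```

With the example columns, `listRange(offsetColumn, 2)` returns `{4, 6}` and `readList` for index 2 yields `[5, 7]`.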

1st list: offset 4 means the first list ends at offset 4 (exclusive) in the dataColumn, and as the first list it starts at offset 0. So the 1st list occupies positions 0 to 3, i.e. [1,3,null,4].
2nd list: offset 4 means the second list ends at offset 4 (exclusive) in the dataColumn; however, the nullColumn marks it as null, so the 2nd list is null.
3rd list: offset 6 means the third list ends at offset 6 (exclusive), and the previous offset 4 means it starts at offset 4 (inclusive). So the 3rd list is [5,7].
4th list: offset 6 means the fourth list ends at offset 6 (exclusive), and the previous offset 6 means it also starts at offset 6 (inclusive). Since its start equals its end, the 4th list is an empty list.
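To check this walkthrough end to end, the sketch below decodes every list and renders it as text. It is a hypothetical `decodeAll` helper, assuming `std::vector` stand-ins for the offset, null, and data columns and `int64` elements; it is not code from this PR. It carries each list's end offset forward as the next list's start, and relies on the nullColumn alone to tell the null 2nd list apart from the empty 4th list, since the offsets of both describe zero elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical decoder: renders every list in the column as text.
std::vector<std::string> decodeAll(const std::vector<std::uint64_t>& offsetColumn,
                                   const std::vector<bool>& nullColumn,
                                   const std::vector<std::optional<std::int64_t>>& dataColumn) {
    assert(offsetColumn.size() == nullColumn.size());
    std::vector<std::string> out;
    std::uint64_t start = 0; // the first list starts at offset 0
    for (std::size_t i = 0; i < offsetColumn.size(); ++i) {
        std::uint64_t end = offsetColumn[i]; // exclusive end of list i
        if (nullColumn[i]) {
            out.push_back("null"); // the nullColumn, not the offsets, decides null-ness
        } else {
            std::string s = "[";
            for (std::uint64_t j = start; j < end; ++j) {
                if (j > start) s += ", ";
                s += dataColumn[j].has_value() ? std::to_string(*dataColumn[j]) : "null";
            }
            s += "]";
            out.push_back(s);
        }
        start = end; // this list's end offset is the next list's start offset
    }
    return out;
}
```

On the example columns it returns `"[1, 3, null, 4]"`, `"null"`, `"[5, 7]"`, and `"[]"`, matching the four cases above.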

@acquamarin changed the title from "Node group list" to "NodeGroup list storage refactor" on Aug 2, 2023

codecov bot commented Aug 2, 2023

Codecov Report

Patch coverage: 74.53% and project coverage change: -0.24% ⚠️

Comparison is base (bb4f187) 89.50% compared to head (cbc66db) 89.26%.

❗ Current head cbc66db differs from pull request most recent head 2925928. Consider uploading reports for the commit 2925928 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1885      +/-   ##
==========================================
- Coverage   89.50%   89.26%   -0.24%     
==========================================
  Files         834      831       -3     
  Lines       30482    30491       +9     
==========================================
- Hits        27283    27219      -64     
- Misses       3199     3272      +73     
Files Changed Coverage Δ
src/include/common/vector/auxiliary_buffer.h 92.30% <ø> (ø)
src/include/storage/copier/column_chunk.h 95.00% <ø> (+0.26%) ⬆️
src/include/storage/copier/struct_column_chunk.h 100.00% <ø> (ø)
...rc/include/storage/copier/var_sized_column_chunk.h 50.00% <ø> (-50.00%) ⬇️
...essor/operator/recursive_extend/recursive_join.cpp 94.55% <ø> (ø)
src/storage/store/struct_node_column.cpp 77.27% <0.00%> (-22.73%) ⬇️
src/storage/copier/struct_column_chunk.cpp 61.83% <16.66%> (-2.46%) ⬇️
src/storage/copier/column_chunk.cpp 63.83% <19.56%> (-11.86%) ⬇️
src/storage/copier/list_column_chunk.cpp 61.81% <61.81%> (ø)
src/storage/store/var_sized_node_column.cpp 98.11% <93.75%> (+4.28%) ⬆️
... and 9 more

... and 46 files with indirect coverage changes


@acquamarin force-pushed the node-group-list branch 2 times, most recently from c44cd94 to 072b545 on August 3, 2023 01:51
@acquamarin requested a review from ray6080 on August 3, 2023 02:13
src/common/vector/auxiliary_buffer.cpp (resolved)
src/storage/copier/string_column_chunk.cpp (outdated, resolved)
src/include/storage/copier/list_column_chunk.h (outdated, resolved)
src/include/storage/copier/list_column_chunk.h (outdated, resolved)
src/include/storage/store/node_column.h (outdated, resolved)
src/storage/copier/table_copy_utils.cpp (resolved)
src/storage/copier/table_copy_utils.cpp (outdated, resolved)
src/storage/store/list_node_column.cpp (outdated, resolved)
src/storage/store/list_node_column.cpp (outdated, resolved)
src/storage/store/list_node_column.cpp (outdated, resolved)
@acquamarin merged commit 8a49c40 into master on Aug 3, 2023
10 checks passed
@acquamarin deleted the node-group-list branch on August 3, 2023 20:35
Labels: None yet
Projects: None yet
2 participants