Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework rel copy task scheduling and improve copy performance #1621

Merged
merged 1 commit into from
Jun 5, 2023

Conversation

ray6080
Copy link
Contributor

@ray6080 ray6080 commented Jun 4, 2023

Major changes:

  1. Change rel copier to follow morsel-driven task scheduling as node copier does.
  2. Improve rel copy performance by getting rid of unnecessary arrow toString calls.
  3. Separate NodeCopyExecutor and RelCopyExecutor. These two no longer extend from a base class. TableCopyExecutor is turned into a utils class TableCopyUtils.
  4. Move hash index initialization logic inside NodeCopier, which further simplifies NodeCopyExecutor.
  5. Overflow sorting logic is removed from rel copier, as I don't think we will need that when we move to the node-group based storage design. (This is debatable. If the reviewer thinks better not to remove it for now, I can add it back)

On my local laptop (M1 macbook air, 16GB RAM, 8 threads, default setting), the performance boost on copying ldbc10 knows and likesComment compared to v0.0.3 are:

likesComment knows
v0.0.3 56,811.2 6,071.1
current 12,686.0 (4.5x) 2,110.0 (2.9x)

@ray6080 ray6080 changed the title Rework rel copy task scheduling and data copy Rework rel copy task scheduling and improve copy performance Jun 4, 2023
@codecov
Copy link

codecov bot commented Jun 4, 2023

Codecov Report

Patch coverage: 76.72% and project coverage change: -9.82 ⚠️

Comparison is base (2369fa8) 92.00% compared to head (5720d06) 82.18%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1621      +/-   ##
==========================================
- Coverage   92.00%   82.18%   -9.82%     
==========================================
  Files         715      716       +1     
  Lines       26075    26042      -33     
==========================================
- Hits        23989    21402    -2587     
- Misses       2086     4640    +2554     
Impacted Files Coverage Δ
src/include/common/types/types.h 100.00% <ø> (ø)
src/include/processor/result/factorized_table.h 80.64% <ø> (-16.13%) ⬇️
src/include/transaction/transaction.h 100.00% <ø> (ø)
src/include/storage/copier/node_copier.h 66.66% <64.28%> (-26.89%) ⬇️
src/include/storage/copier/rel_copier.h 65.00% <65.00%> (ø)
...e/in_mem_storage_structure/in_mem_column_chunk.cpp 70.31% <65.07%> (-9.96%) ⬇️
.../storage/in_mem_storage_structure/in_mem_lists.cpp 69.56% <69.84%> (-2.60%) ⬇️
src/storage/copier/rel_copier.cpp 77.33% <77.33%> (ø)
...de/storage/in_mem_storage_structure/in_mem_lists.h 80.76% <80.00%> (-4.42%) ⬇️
src/storage/copier/rel_copy_executor.cpp 80.55% <81.00%> (-13.32%) ⬇️
... and 11 more

... and 181 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/include/transaction/transaction.h Show resolved Hide resolved
src/include/storage/copier/node_copier.h Show resolved Hide resolved
src/include/storage/copier/rel_copier.h Outdated Show resolved Hide resolved
src/include/storage/copier/rel_copier.h Outdated Show resolved Hide resolved
src/include/storage/copier/rel_copier.h Outdated Show resolved Hide resolved
src/include/storage/copier/rel_copier.h Outdated Show resolved Hide resolved
src/include/storage/copier/node_copier.h Outdated Show resolved Hide resolved
src/storage/copier/table_copy_utils.cpp Outdated Show resolved Hide resolved
: copyDescription{copyDescription}, outputDirectory{std::move(outputDirectory)},
taskScheduler{taskScheduler}, catalog{catalog},
tableSchema{catalog.getReadOnlyVersion()->getTableSchema(table->getTableID())},
table{table}, nodesStatisticsAndDeletedIDs{nodesStatisticsAndDeletedIDs}, numRows{0} {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unify the term usage of row and tuple. I would prefer tuple.

@ray6080 ray6080 merged commit 7b46a07 into master Jun 5, 2023
7 of 8 checks passed
@ray6080 ray6080 deleted the copy-rework branch June 5, 2023 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants