Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework node copy task #1475

Merged
merged 1 commit into from
Apr 20, 2023
Merged

Rework node copy task #1475

merged 1 commit into from
Apr 20, 2023

Conversation

ray6080
Copy link
Contributor

@ray6080 ray6080 commented Apr 20, 2023

This PR reworks node copy task to be consistent with the logic of processor task. Also, unifies csv, parquet and npy to go through the same task system.

@codecov
Copy link

codecov bot commented Apr 20, 2023

Codecov Report

Patch coverage: 93.84% and project coverage change: -0.37 ⚠️

Comparison is base (6def875) 92.16% compared to head (5dd5ef2) 91.79%.

❗ Current head 5dd5ef2 differs from pull request most recent head cb3fc1f. Consider uploading reports for the commit cb3fc1f to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1475      +/-   ##
==========================================
- Coverage   92.16%   91.79%   -0.37%     
==========================================
  Files         669      661       -8     
  Lines       23676    23163     -513     
==========================================
- Hits        21821    21263     -558     
- Misses       1855     1900      +45     
Impacted Files Coverage Δ
src/include/common/copier_config/copier_config.h 100.00% <ø> (ø)
src/include/common/types/types.h 75.00% <ø> (ø)
src/include/parser/copy_csv/copy_csv.h 100.00% <ø> (ø)
src/include/processor/operator/copy/copy.h 57.14% <ø> (ø)
...age/in_mem_storage_structure/in_mem_column_chunk.h 100.00% <ø> (ø)
...ge/in_mem_storage_structure/in_mem_node_column.cpp 100.00% <ø> (ø)
src/storage/copier/npy_reader.cpp 88.76% <80.55%> (+0.70%) ⬆️
src/storage/copier/node_copy_executor.cpp 90.16% <90.16%> (ø)
src/include/storage/copier/node_copier.h 91.83% <91.30%> (-8.17%) ⬇️
src/storage/copier/node_copier.cpp 93.38% <96.66%> (+3.68%) ⬆️
... and 13 more

... and 23 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/include/parser/copy_csv/copy_csv.h Outdated Show resolved Hide resolved
logger->info("Done initializing in memory columns.");
}

void NodeCopyExecutor::populateColumnsAndLists(processor::ExecutionContext* executionContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to go back and change these functions once a similar refactor is applied to RelCopier. This primary index logic should be specific to NodeCopyExecutor. Leave a TODO or open an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is specific to NodeCopyExecutor. I think #971 is the issue.

src/storage/copier/node_copy_executor.cpp Outdated Show resolved Hide resolved
src/storage/copier/node_copy_executor.cpp Outdated Show resolved Hide resolved
src/storage/copier/node_copy_executor.cpp Outdated Show resolved Hide resolved
auto nodeCopier = std::make_unique<NPYNodeCopier<T>>(std::move(sharedState),
pkIndex.get(), copyDescription, columns, i,
i == propertyIDToColumnIDMap[primaryKey.propertyID] ? 0 : INVALID_COLUMN_ID);
tasks.push_back(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably pull task.push_back out of the case switch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, npy is specially handled right now, as each file is assigned a task.

src/include/storage/copier/node_copier.h Outdated Show resolved Hide resolved

inline std::unique_ptr<NodeCopier<T>> clone() const override {
return std::make_unique<CSVNodeCopier<T>>(
this->sharedState, this->pkIndex, this->copyDesc, this->columns, this->pkColumnID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this-> and for other copiers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to access base class fields here.

common::CSVReaderConfig* csvReaderConfig;
catalog::TableSchema* tableSchema;
std::shared_ptr<arrow::csv::StreamingReader> reader;
std::string currentFilePath;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

track fileIdx instead of path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Actually I don' think I need to track this. Removed now.

@ray6080 ray6080 marked this pull request as ready for review April 20, 2023 14:48
@ray6080 ray6080 merged commit eca8fc9 into master Apr 20, 2023
@ray6080 ray6080 deleted the copy-rework branch April 20, 2023 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants