-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework node copy task #1475
Rework node copy task #1475
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1475 +/- ##
==========================================
- Coverage 92.16% 91.79% -0.37%
==========================================
Files 669 661 -8
Lines 23676 23163 -513
==========================================
- Hits 21821 21263 -558
- Misses 1855 1900 +45
... and 23 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
logger->info("Done initializing in memory columns."); | ||
} | ||
|
||
void NodeCopyExecutor::populateColumnsAndLists(processor::ExecutionContext* executionContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to go back and change these functions once a similar refactor is applied to RelCopier. This primary index logic should be specific to NodeCopyExecutor. Leave a TODO or open an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is specific to NodeCopyExecutor. I think #971 is the issue.
auto nodeCopier = std::make_unique<NPYNodeCopier<T>>(std::move(sharedState), | ||
pkIndex.get(), copyDescription, columns, i, | ||
i == propertyIDToColumnIDMap[primaryKey.propertyID] ? 0 : INVALID_COLUMN_ID); | ||
tasks.push_back( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can probably pull task.push_back out of the case switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, npy is specially handled right now, as each file is assigned a task.
|
||
inline std::unique_ptr<NodeCopier<T>> clone() const override { | ||
return std::make_unique<CSVNodeCopier<T>>( | ||
this->sharedState, this->pkIndex, this->copyDesc, this->columns, this->pkColumnID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this->
and for other copiers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
is to access base class fields here.
common::CSVReaderConfig* csvReaderConfig; | ||
catalog::TableSchema* tableSchema; | ||
std::shared_ptr<arrow::csv::StreamingReader> reader; | ||
std::string currentFilePath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
track fileIdx instead of path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Actually I don' think I need to track this. Removed now.
This PR reworks node copy task to be consistent with the logic of processor task. Also, unifies csv, parquet and npy to go through the same task system.