-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor copy node #1590
Refactor copy node #1590
Conversation
acquamarin
commented
May 29, 2023
•
edited
Loading
edited
- Refactors the copy node pipeline, so we can reuse the existing processor task pipeline to execute copy node task.
- CopyNode queries will be compiled to two pipelines where the first pipeline does the actual copy and the second pipeline returns the copy message.
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1590 +/- ##
==========================================
+ Coverage 91.66% 91.76% +0.10%
==========================================
Files 716 714 -2
Lines 26042 25907 -135
==========================================
- Hits 23871 23774 -97
+ Misses 2171 2133 -38
☔ View full report in Codecov by Sentry. |
src/include/planner/logical_plan/logical_operator/logical_copy.h
Outdated
Show resolved
Hide resolved
src/include/planner/logical_plan/logical_operator/logical_copy.h
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think maybe we should separate changes on ValueVector (arrow aux) and Copy operators. Both require more works as I can see. We can get the ValueVector part done first.
src/common/types/types.cpp
Outdated
@@ -249,6 +249,9 @@ void LogicalType::setPhysicalType() { | |||
case LogicalTypeID::STRUCT: { | |||
physicalType = PhysicalTypeID::STRUCT; | |||
} break; | |||
case LogicalTypeID::ARROW_DATA: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't like a good way to differentiate arrow array from others. Can we introduce something like a AuxiliaryDataType
?
@@ -87,6 +87,9 @@ uint32_t ValueVector::getDataTypeSize(const LogicalType& type) { | |||
case LogicalTypeID::VAR_LIST: { | |||
return sizeof(list_entry_t); | |||
} | |||
case LogicalTypeID::ARROW_COLUMN: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't think this is a good idea. It doesn't make sense to me to have ARROW column/array as a logical type.
@@ -1,5 +1,6 @@ | |||
#pragma once | |||
|
|||
#include "arrow/array.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think of a way to get rid of this include. I hope we can go through common/arrow
. One way is to have a zero-copy conversion in copier from arrow's arrow_array to common's arrow_array.
@@ -43,6 +44,14 @@ class StructAuxiliaryBuffer : public AuxiliaryBuffer { | |||
std::vector<std::shared_ptr<ValueVector>> childrenVectors; | |||
}; | |||
|
|||
class ArrowColumnAuxiliaryBuffer : public AuxiliaryBuffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best way to implement the arrow auxiliary is that we store the arrow array in the aux to keep its lifetime, and point to the aux array when possible from ValueVector. Let's discuss this a bit more offline.