Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy node tables from npy files #1396

Merged
merged 1 commit into from
Mar 28, 2023
Merged

Copy node tables from npy files #1396

merged 1 commit into from
Mar 28, 2023

Conversation

mewim
Copy link
Collaborator

@mewim mewim commented Mar 21, 2023

No description provided.

@mewim mewim force-pushed the copy-npy-1 branch 2 times, most recently from 46ebc4b to f5a2d4c Compare March 21, 2023 17:50
@mewim mewim changed the title Copy node tables from .npy files Copy node tables from npy files Mar 21, 2023
src/binder/bind/bind_copy.cpp Outdated Show resolved Hide resolved
src/binder/bind/bind_copy.cpp Outdated Show resolved Hide resolved
src/include/storage/copier/copy_node_npy.h Show resolved Hide resolved
src/include/storage/copier/npy_reader.h Outdated Show resolved Hide resolved
src/parser/transformer.cpp Outdated Show resolved Hide resolved
src/storage/copier/copy_node_npy.cpp Outdated Show resolved Hide resolved
src/storage/copier/npy_reader.cpp Show resolved Hide resolved
test/copy/copy_npy_test.cpp Outdated Show resolved Hide resolved
@ray6080 ray6080 self-requested a review March 23, 2023 01:48
@mewim mewim force-pushed the copy-npy-1 branch 3 times, most recently from 803d7c2 to b34614f Compare March 27, 2023 03:35
@codecov
Copy link

codecov bot commented Mar 27, 2023

Codecov Report

Patch coverage: 93.07% and no project coverage change.

Comparison is base (3f574a4) 92.71% compared to head (d740b29) 92.71%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #1396    +/-   ##
========================================
  Coverage   92.71%   92.71%            
========================================
  Files         642      646     +4     
  Lines       22102    22293   +191     
========================================
+ Hits        20491    20669   +178     
- Misses       1611     1624    +13     
Impacted Files Coverage Δ
src/include/binder/binder.h 100.00% <ø> (ø)
src/include/common/copier_config/copier_config.h 100.00% <ø> (ø)
src/include/common/utils.h 100.00% <ø> (ø)
src/include/parser/transformer.h 100.00% <ø> (ø)
...processor/operator/order_by/order_by_key_encoder.h 100.00% <ø> (ø)
src/include/storage/copier/copy_rel_arrow.h 100.00% <ø> (ø)
src/include/storage/copier/copy_structures_arrow.h 100.00% <ø> (ø)
src/include/storage/copier/copy_task.h 100.00% <ø> (ø)
src/processor/operator/copy/copy_rel.cpp 100.00% <ø> (ø)
...ocessor/operator/order_by/order_by_key_encoder.cpp 94.82% <ø> (-0.03%) ⬇️
... and 13 more

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@ray6080 ray6080 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an end-to-end test for a npy node file?

src/common/copier_config/copier_config.cpp Outdated Show resolved Hide resolved
src/include/storage/copier/copy_node_arrow.h Outdated Show resolved Hide resolved
src/include/storage/copier/copy_node_arrow.h Outdated Show resolved Hide resolved
src/include/storage/copier/copy_node_npy.h Show resolved Hide resolved
src/processor/operator/copy/copy_node.cpp Outdated Show resolved Hide resolved
src/processor/operator/copy/copy_node.cpp Outdated Show resolved Hide resolved
src/storage/copier/copy_node_arrow.cpp Outdated Show resolved Hide resolved
third_party/pyparse/pyparse.h Show resolved Hide resolved
third_party/pyparse/pyparse.h Show resolved Hide resolved
src/include/storage/copier/copy_node_arrow.h Outdated Show resolved Hide resolved
@@ -17,7 +17,10 @@ oC_Cypher
: SP ? oC_AnyCypherOption? SP? ( oC_Statement | kU_DDL | kU_CopyCSV ) ( SP? ';' )? SP? EOF ;

kU_CopyCSV
: COPY SP oC_SchemaName SP FROM SP kU_FilePaths ( SP? '(' SP? kU_ParsingOptions SP? ')' )? ;
: COPY SP oC_SchemaName kU_CopyCSV_Properties? SP FROM SP kU_FilePaths (SP? ',' SP? kU_FilePaths)* ( SP? '(' SP? kU_ParsingOptions SP? ')' )? ;
Copy link
Contributor

@andyfengHKU andyfengHKU Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a different rule for COPY NPY since it has a different interpretation of multiple files (aligned on column), so let's have

kU_CopyCSV (btw open an issue about renaming all CopyCSV to Copy because we are copying different file formats now) as it is and

kU_CopyNPY whose grammar is

COPY SP oC_SchemaName SP FROM NPY SP kU_FilePaths

We don't specify columns any more, instead we expect file paths to match all columns as defined in DDL (so that we have the same constraint as copy)

: COPY SP oC_SchemaName SP FROM SP kU_FilePaths ( SP? '(' SP? kU_ParsingOptions SP? ')' )? ;
: COPY SP oC_SchemaName kU_CopyCSV_Properties? SP FROM SP kU_FilePaths (SP? ',' SP? kU_FilePaths)* ( SP? '(' SP? kU_ParsingOptions SP? ')' )? ;

kU_CopyCSV_Properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

inline std::string getTableName() const { return tableName; }
inline std::unordered_map<std::string, std::unique_ptr<ParsedExpression>> const*
getParsingOptions() const {
return &parsingOptions;
}

private:
std::vector<std::string> filePaths;
std::vector<std::vector<std::string>> filePaths;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file can be reverted.

std::move(tableName), std::move(parsingOptions));
}

std::vector<std::vector<std::string>> Transformer::transformFilePaths(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this. And you will have a transformCopyNpy instead.

}
return boundFilePaths;
}

std::unordered_map<common::property_id_t, std::string> Binder::bindPropertyToNpyMap(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is up to u. But I think you can simply bind to vector so that it aligns with the order in schema. Also we don't need to support multiple npy files to 1 column as an experimental feature.

@mewim mewim force-pushed the copy-npy-1 branch 6 times, most recently from 54c57db to d41588d Compare March 28, 2023 00:45
@mewim mewim merged commit 443269c into master Mar 28, 2023
@mewim mewim deleted the copy-npy-1 branch March 28, 2023 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants