Integrate NPY copy into the existing COPY pipeline #1670

Closed
ray6080 opened this issue Jun 13, 2023 · 0 comments

ray6080 commented Jun 13, 2023

We support npy as a file format for COPY into node tables, specifically into fixed-size array columns.
Currently, npy copy goes through NodeCopyExecutor, which is separate from our query processor, whereas copying of CSV and Parquet files goes through the query processor (the "COPY pipeline" in the issue title).
To simplify the code, npy copy should go through the query processor too.
Specifically, we need to remove NodeCopyExecutor and move the logic for copying npy files into the CopyNode and ReadFile operators.

Two main issues need to be addressed for this change:

  1. Change NpyReader to read one DataChunk at a time so that it fits into the ReadFile operator. The use of mmap can be removed, since we read sequentially; this should be fine for IO performance.
  2. NPY copy follows the COPY BY COLUMN syntax, so one file corresponds to one column. This differs from CSV and Parquet, where a single file covers all columns. CopyNode therefore needs to be aware that an npy copy writes a DataChunk into one column at a time rather than into all columns.
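
To make the two points above concrete, here is a minimal sketch in Python (not the project's C++ code). It reads an npy file sequentially in fixed-size chunks with plain `read` calls instead of mmap, and then fills one column per file. The names `read_npy_header`, `read_chunks`, `copy_by_column`, and `rows_per_chunk` are hypothetical and do not reflect the actual NpyReader, ReadFile, or CopyNode interfaces. The sketch also assumes C-ordered arrays with at least one dimension and only a handful of little-endian dtypes.

```python
import ast
import struct
from typing import BinaryIO, Iterator, List, Tuple

MAGIC = b"\x93NUMPY"
# Assumption: only these little-endian dtypes are handled in this sketch.
STRUCT_CODES = {"<f8": "d", "<f4": "f", "<i8": "q", "<i4": "i"}


def read_npy_header(f: BinaryIO) -> Tuple[str, Tuple[int, ...]]:
    """Parse the NPY header and leave `f` positioned at the first data byte."""
    if f.read(6) != MAGIC:
        raise ValueError("not an NPY file")
    major, _minor = f.read(2)
    # Format version 1.0 stores the header length in 2 bytes, later versions in 4.
    if major == 1:
        (header_len,) = struct.unpack("<H", f.read(2))
    else:
        (header_len,) = struct.unpack("<I", f.read(4))
    header = ast.literal_eval(f.read(header_len).decode("latin1").strip())
    if header["fortran_order"]:
        raise ValueError("only C-ordered arrays are handled in this sketch")
    return header["descr"], tuple(header["shape"])


def read_chunks(path: str, rows_per_chunk: int) -> Iterator[List[Tuple]]:
    """Yield lists of rows, reading the file front to back without mmap."""
    with open(path, "rb") as f:
        descr, shape = read_npy_header(f)
        num_rows = shape[0]
        row_width = 1
        for dim in shape[1:]:
            row_width *= dim
        # One row is a fixed-size array of `row_width` values.
        row_struct = struct.Struct("<" + STRUCT_CODES[descr] * row_width)
        for start in range(0, num_rows, rows_per_chunk):
            n = min(rows_per_chunk, num_rows - start)
            buf = f.read(n * row_struct.size)
            if len(buf) != n * row_struct.size:
                raise ValueError("file is shorter than its header claims")
            yield [row_struct.unpack_from(buf, i * row_struct.size) for i in range(n)]


def copy_by_column(paths: List[str], rows_per_chunk: int) -> List[List[Tuple]]:
    """Hypothetical sink: each file fills exactly one column of the table."""
    columns: List[List[Tuple]] = []
    for path in paths:
        column: List[Tuple] = []
        for chunk in read_chunks(path, rows_per_chunk):
            column.extend(chunk)  # a chunk only ever touches this one column
        columns.append(column)
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all column files must have the same number of rows")
    return columns
```

The point of the second function is that the sink receives a chunk for a single column, whereas a CSV or Parquet chunk would carry values for every column of the row.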