Parquet writer #2177

Merged
merged 1 commit into from
Oct 13, 2023

@acquamarin (Collaborator) commented Oct 10, 2023

Performance numbers on the LDBC100 Comment table:

kuzu> copy (match (c:Comment) return c.ID, c.locationIP, c.browserUsed, c.content, c.length) to '/tmp/33.parquet';

-
(0 tuples)
Time: 10.81ms (compiling), 52615.40ms (executing)
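
To repeat this measurement from a script rather than the shell, a small wall-clock timing helper is one option. This is a minimal sketch, not part of the PR: `time_query_ms` is a hypothetical helper, and `conn` is assumed to be any object with an `execute(query)` method, such as a connection from the Kùzu Python bindings. Wall-clock time covers compiling and executing together, so it will not split the two the way the shell output does.

```python
import time

# The COPY TO statement from the comment above, kept as a constant.
COPY_QUERY = (
    "COPY (MATCH (c:Comment) "
    "RETURN c.ID, c.locationIP, c.browserUsed, c.content, c.length) "
    "TO '/tmp/33.parquet';"
)


def time_query_ms(conn, query, repeats=1):
    """Run `query` `repeats` times and return wall-clock timings in ms.

    `conn` is assumed to expose an `execute(query)` method.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# Hypothetical usage, assuming the Kùzu Python package is installed and
# an LDBC100 database already exists at the (made-up) path below:
#   import kuzu
#   conn = kuzu.Connection(kuzu.Database("./ldbc100"))
#   print(time_query_ms(conn, COPY_QUERY, repeats=3))
```

Running the query several times and reporting each timing separately makes it easier to tell a cold-cache run from a warm one.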

codecov bot commented Oct 11, 2023

Codecov Report

Attention: 138 lines in your changes are missing coverage. Please review.

Comparison is base (abca5e6) 89.92% compared to head (d634e8d) 89.40%.
Report is 17 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2177      +/-   ##
==========================================
- Coverage   89.92%   89.40%   -0.52%     
==========================================
  Files         989     1007      +18     
  Lines       35637    36241     +604     
==========================================
+ Hits        32048    32403     +355     
- Misses       3589     3838     +249     
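
The percentages in the diff above can be rechecked from the hit and line counts. The sketch below is an illustration only: `coverage_percent` is a hypothetical helper, and truncating (rather than rounding) to two decimals is an inference from the reported numbers, not documented Codecov behaviour. With rounding, the base would come out as 89.93% and the head as 89.41%.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return (hits * 10000 // lines) / 100


# Counts copied from the coverage diff (base = master, head = this PR).
base_hits, base_misses, base_lines = 32048, 3589, 35637
head_hits, head_misses, head_lines = 32403, 3838, 36241

# Hits and misses should add up to the total line count on each side.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base = coverage_percent(base_hits, base_lines)
head = coverage_percent(head_hits, head_lines)
delta = round(head - base, 2)

print(f"base {base:.2f}%  head {head:.2f}%  delta {delta:+.2f}%")
# prints: base 89.92%  head 89.40%  delta -0.52%
```

The 604 added lines split into 355 new hits and 249 new misses, which is why overall coverage drops even though the listed new writer headers are fully covered.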
| Files | Coverage Δ |
|---|---|
| ...rc/include/processor/operator/persistent/copy_to.h | 100.00% <100.00%> (+6.66%) ⬆️ |
| ...de/processor/operator/persistent/copy_to_parquet.h | 100.00% <100.00%> (ø) |
| ...nclude/processor/operator/persistent/file_writer.h | 100.00% <100.00%> (ø) |
| .../operator/persistent/reader/parquet/thrift_tools.h | 59.70% <ø> (ø) |
| ...persistent/writer/parquet/standard_column_writer.h | 100.00% <100.00%> (ø) |
| ...r/persistent/writer/parquet/struct_column_writer.h | 100.00% <100.00%> (ø) |
| ...persistent/writer/parquet/var_list_column_writer.h | 100.00% <100.00%> (ø) |
| src/include/processor/operator/physical_operator.h | 100.00% <ø> (ø) |
| src/include/processor/result/factorized_table.h | 96.55% <ø> (ø) |
| src/optimizer/factorization_rewriter.cpp | 100.00% <100.00%> (ø) |
... and 18 more

... and 147 files with indirect coverage changes

@acquamarin marked this pull request as ready for review October 11, 2023 22:40

@ray6080 (Contributor) left a comment

  1. In this PR, can we add some benchmark numbers for exporting to a Parquet file, perhaps on the LDBC100 Comment table?
  2. In follow-up PRs, we should rewrite our tests that convert CSV to Parquet to use COPY TO Parquet, which should also increase test coverage for the Parquet writer.

@acquamarin merged commit f18499f into master Oct 13, 2023
11 checks passed
@acquamarin deleted the parquet-writer branch October 13, 2023 01:12