Rework CSV_TO_PARQUET testing feature #2993

Merged 1 commit on Mar 6, 2024

@manh9203 manh9203 commented Mar 4, 2024

This pull request reworks the CSV_TO_PARQUET() tests in the testing framework so that they use Kuzu's own Parquet file writer.
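For illustration, here is a minimal sketch of how a node table could be handed to the Parquet writer with a `COPY ... TO` statement, which is what the linked issue title (COPY TO PARQUET) suggests. The helper name `makeNodeExportQuery` is hypothetical, and the exact statement form is an assumption that should be checked against the Kuzu version in use; relationship tables would need a different MATCH pattern and are not covered here.

```cpp
#include <string>

// Hypothetical helper, not taken from the PR: builds a statement that exports
// every property of one node table to a Parquet file.
// Assumption: Kuzu accepts COPY (<query>) TO '<file>.parquet'.
std::string makeNodeExportQuery(const std::string& tableName, const std::string& parquetPath) {
    return "COPY (MATCH (n:" + tableName + ") RETURN n.*) TO '" + parquetPath + "';";
}
```

The resulting string would be run against the temporary database after the .csv files have been loaded.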

Testing workflow

  1. Create a directory dataset/parquet_temp_dataset-name.
  2. Copy schema.cypher to the new directory.
  3. Read and parse copy.cypher, extracting the .csv file names and the table names.
  4. Write a new copy.cypher in the new directory, with COPY commands that point to the .parquet paths.
  5. Create a temporary database, load the .csv files into it, then export each table to a .parquet file in the new directory.
  6. Set the dataset path to the parquet temp directory.
  7. Remove the parquet temp directory after all tests have run.

Disabled tests

  • TinySnbParquet.TinySnbParquet is disabled because exporting fixed-list columns to Parquet is not supported yet.
  • LDBCTest.LDBCInteractiveShortParquet and LSQBTest.LSQBTestParquet are disabled because they take too long to run on the CI. I ran them on my laptop and both pass.
  • ExtensionTest is temporarily disabled. I'll re-enable it after this PR is merged and the extension is rebuilt.

@manh9203 manh9203 requested a review from ray6080 March 4, 2024 19:34
@manh9203 manh9203 linked an issue Mar 4, 2024 that may be closed by this pull request

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 60.00000%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 93.30%. Comparing base (e76c6ea) to head (a321444).
Report is 18 commits behind head on master.

Files                                          Patch %   Lines
src/common/file_system/file_system.cpp         0.00%     1 Missing ⚠️
src/common/file_system/local_file_system.cpp   75.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2993      +/-   ##
==========================================
- Coverage   93.52%   93.30%   -0.23%     
==========================================
  Files        1119     1124       +5     
  Lines       42991    42913      -78     
==========================================
- Hits        40209    40041     -168     
- Misses       2782     2872      +90     


test/test_files/extension/extension.test (outdated, resolved)
test/test_files/tinysnb/parquet/tinysnb_parquet.test (outdated, resolved)
test/test_runner/csv_to_parquet_converter.cpp (outdated, resolved)
test/test_runner/csv_to_parquet_converter.cpp (resolved)
test/test_runner/csv_to_parquet_converter.cpp (resolved)
test/test_runner/csv_to_parquet_converter.cpp (outdated, resolved)

@ray6080 ray6080 left a comment

Can you fix the failing CI first? Also, don't forget to squash and rebase before merging.

@manh9203 manh9203 force-pushed the rework-csv-to-parquet-on-tests branch from fc2de17 to a321444 Compare March 6, 2024 18:45
@manh9203 manh9203 merged commit 9e23995 into master Mar 6, 2024
14 of 15 checks passed
@manh9203 manh9203 deleted the rework-csv-to-parquet-on-tests branch March 6, 2024 19:41
Successfully merging this pull request may close these issues.

Rework CSV_TO_PARQUET testing feature to use COPY TO PARQUET