Adding parquet transcoding example #15420
Thank you @mhaseeb123 for writing this example! It has made it easy for me to do a ton of testing with Kaggle data 🤩 There are a few benchmarking items that we might want to address in the example, either with a comment or with a code change; up to you.
Thank you @GregoryKimball for the feedback. I am thinking of not timing the first read, to avoid both RMM pool growth and the nvcomp and cuFile pitfalls. I will soon add a CLI option to clear the OS cache before the second and subsequent reads, plus another CLI option to make the size metadata optional.
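The timing approach proposed here can be sketched in standalone C++: run the operation once untimed so that one-time costs (such as RMM pool growth or nvcomp and cuFile loading) stay out of the measurements, then time the remaining iterations. This is a minimal sketch, not the example's code: `time_after_warmup` and its `before_each` hook are hypothetical names. The hook is where a step like clearing the OS cache would go, so that it runs before each timed read but outside the timed region.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` once untimed (warm-up), then times
// `iterations` further runs and returns each duration in milliseconds.
// `before_each` (may be empty) runs before every timed run, outside the
// measured interval, e.g. to drop cached file pages so each read starts cold.
std::vector<double> time_after_warmup(std::function<void()> const& work,
                                      std::function<void()> const& before_each,
                                      int iterations)
{
  assert(iterations >= 0);

  work();  // warm-up run: absorbs one-time setup costs and is not timed

  std::vector<double> elapsed_ms;
  elapsed_ms.reserve(iterations);
  for (int i = 0; i < iterations; ++i) {
    if (before_each) { before_each(); }  // excluded from the measurement
    auto const start = std::chrono::steady_clock::now();
    work();
    auto const stop = std::chrono::steady_clock::now();
    elapsed_ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
  }
  return elapsed_ms;
}
```

If `work` launches asynchronous GPU work, it has to synchronize before returning; otherwise the wall-clock interval only measures the kernel launches, not the read itself.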
Thank you @mhaseeb123 for discussing the example. I think we will address items 1, 2, and 3 above by not timing the first read, and then resolve item 4 by adding a writer option that sets the size metadata to OFF.
/ok to test
/ok to test
/ok to test
A few more suggestions; looks good otherwise 👍
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
/ok to test
Approved; just one optional suggestion.
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
/ok to test
/ok to test
/merge
Description
This PR adds a new example, parquet_io, to libcudf/cpp/examples. The example instruments reading and writing Parquet files with different column encodings (currently the same encoding for all columns) and compressions, and closes #15344. The example may be elaborated on or evolved as needed. #15348 should be merged before this PR, since it contains the CMake updates needed to build and run this example.
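As a rough sketch of how an example like this might map command-line strings to an encoding and a compression, and then apply the single chosen encoding to every column, consider the following. Everything here is an assumption for illustration: the `encoding` and `compression` enums and all function names are hypothetical. The enumerator names follow Parquet's encodings and common codecs, but libcudf defines its own types (for instance `cudf::io::compression_type`), which the real example would use instead.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical enums, for illustration only (not libcudf's types).
enum class encoding {
  PLAIN,
  DICTIONARY,
  DELTA_BINARY_PACKED,
  DELTA_LENGTH_BYTE_ARRAY,
  DELTA_BYTE_ARRAY,
  BYTE_STREAM_SPLIT
};
enum class compression { NONE, SNAPPY, ZSTD };

// Upper-cases a CLI argument so that "zstd" and "ZSTD" are treated the same.
std::string to_upper(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::toupper(c));
  });
  return s;
}

// Linear search of a (name, value) table; returns nullopt for unknown names.
template <typename Enum, std::size_t N>
std::optional<Enum> lookup(std::array<std::pair<char const*, Enum>, N> const& table,
                           std::string const& name)
{
  auto const key = to_upper(name);
  for (auto const& [label, value] : table) {
    if (key == label) { return value; }
  }
  return std::nullopt;
}

std::optional<encoding> parse_encoding(std::string const& name)
{
  static std::array<std::pair<char const*, encoding>, 6> const table{{
    {"PLAIN", encoding::PLAIN},
    {"DICTIONARY", encoding::DICTIONARY},
    {"DELTA_BINARY_PACKED", encoding::DELTA_BINARY_PACKED},
    {"DELTA_LENGTH_BYTE_ARRAY", encoding::DELTA_LENGTH_BYTE_ARRAY},
    {"DELTA_BYTE_ARRAY", encoding::DELTA_BYTE_ARRAY},
    {"BYTE_STREAM_SPLIT", encoding::BYTE_STREAM_SPLIT},
  }};
  return lookup(table, name);
}

std::optional<compression> parse_compression(std::string const& name)
{
  static std::array<std::pair<char const*, compression>, 3> const table{{
    {"NONE", compression::NONE},
    {"SNAPPY", compression::SNAPPY},
    {"ZSTD", compression::ZSTD},
  }};
  return lookup(table, name);
}

// One encoding for every column, matching the "same for all columns" behavior.
std::vector<encoding> same_encoding_for_all_columns(std::size_t num_columns, encoding e)
{
  return std::vector<encoding>(num_columns, e);
}
```

Returning std::optional lets the caller print a usage message for an unrecognized name instead of silently falling back to a default.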