Remove protobuf and use parsed ORC statistics from libcudf #15564

Merged: 12 commits, Apr 19, 2024

@bdice (Contributor) commented Apr 18, 2024

Description

This PR removes the cuDF Python dependencies on protobuf and protoc-wheel. Closes #15511.

The only use case for the protobuf dependency was reading ORC file/stripe statistics. However, we have code in libcudf that can do this without requiring protobuf.

In this PR, we expose the C++ code for parsing ORC statistics from libcudf to Cython and remove all references to protobuf.
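
As a rough illustration of the idea (not the code in this PR), the Python side only has to turn already-parsed, typed statistics into plain dictionaries keyed by column name. The sketch below is pure Python with no cuDF dependency. The classes `IntegerStats`, `StringStats`, and `ColumnStats`, the helper functions, and the dictionary keys are all hypothetical stand-ins chosen for illustration; the actual libcudf structs and Cython bindings may differ.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for type-specific statistics parsed on the C++ side.
# Every field is optional because a writer may omit it.
@dataclass
class IntegerStats:
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    sum: Optional[int] = None


@dataclass
class StringStats:
    minimum: Optional[str] = None
    maximum: Optional[str] = None
    sum: Optional[int] = None  # assumed here to be the total string length


@dataclass
class ColumnStats:
    number_of_values: Optional[int] = None
    has_null: Optional[bool] = None
    # Plays the role of a C++ variant: one of several typed structs, or nothing.
    type_specific: Union[IntegerStats, StringStats, None] = None


def column_stats_to_dict(stats):
    """Flatten one column's statistics into a dict, dropping unset fields."""
    out = {
        "number_of_values": stats.number_of_values,
        "has_null": stats.has_null,
    }
    if stats.type_specific is not None:
        out.update(vars(stats.type_specific))
    return {key: value for key, value in out.items() if value is not None}


def stats_by_column(column_names, stats_list):
    """Pair each column name with its flattened statistics."""
    return {
        name: column_stats_to_dict(stats)
        for name, stats in zip(column_names, stats_list)
    }


names = ["id", "name"]
file_stats = [
    ColumnStats(3, False, IntegerStats(1, 3, 6)),
    ColumnStats(3, True, StringStats("a", "c", 3)),
]
result = stats_by_column(names, file_stats)
```

Because the statistics arrive already parsed, nothing in this flow needs a protobuf runtime or generated message classes on the Python side.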

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions bot added the libcudf, Python, CMake, and conda labels Apr 18, 2024
@bdice added the improvement and breaking labels Apr 18, 2024
@bdice bdice self-assigned this Apr 18, 2024
@bdice (Contributor Author) left a comment

A few TODO notes for myself on what I see as the weakest points of this PR. @vyasr You may have feedback on these TODOs since they're Cython-related. I proposed some next steps; let me know if they sound reasonable.

python/cudf/cudf/_lib/orc.pyx (outdated, resolved)
python/cudf/cudf/_lib/orc.pyx (outdated, resolved)
python/cudf/cudf/_lib/orc.pyx (resolved)
@bdice bdice added this to the ORC continuous improvement milestone Apr 18, 2024
@bdice bdice marked this pull request as ready for review April 18, 2024 17:52
@bdice bdice requested review from a team as code owners April 18, 2024 17:52
@KyleFromNVIDIA (Contributor) left a comment

Approving as CMake codeowner (only CMake change is deletion of a file)

@davidwendt (Contributor) left a comment

Approving C++ changes

@vyasr (Contributor) left a comment

Thanks for tackling this, Bradley! I have a couple of small suggestions/questions.

python/cudf/cudf/_lib/variant.pxd (outdated, resolved)
python/cudf/cudf/_lib/variant.pxd (outdated, resolved)
python/cudf/cudf/_lib/cpp/io/orc_metadata.pxd (outdated, resolved)
python/cudf/cudf/_lib/orc.pyx (resolved)
python/cudf/cudf/_lib/orc.pyx (outdated, resolved)
python/cudf/cudf/_lib/orc.pyx (resolved)
python/cudf/cudf/_lib/orc.pyx (outdated, resolved)
python/cudf/cudf/_lib/orc.pyx (outdated, resolved)
@vyasr (Contributor) left a comment

I'm heading out for a bit and I know that you want to get this in before you leave for vacation, so I'm going to go ahead and approve. It looks like all of my suggestions have been addressed other than the list comprehensions, which is just a cleanup task. Please do it if you can, but I trust that whatever version you end up with will be good enough. Thanks for this PR! Removing protobuf like this will be great.

@bdice (Contributor Author) commented Apr 19, 2024

/merge

@rapids-bot rapids-bot bot merged commit d37636d into rapidsai:branch-24.06 Apr 19, 2024
74 checks passed
Labels
breaking, CMake, improvement, libcudf, Python
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[FEA] Improve protobuf compatibility
6 participants