Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize patching to support vendoring #38

Merged
merged 16 commits into from
Apr 4, 2024

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Apr 4, 2024

This PR makes a number of significant changes to the patching infrastructure:

  1. This PR reorganizes the patching logic to be based on a more principled approach. Rather than maintaining lists of patch functions that are each responsible for filtering modules to apply themselves to, patches are organized in the patches directory in a tree structure matching dask itself. Patches are found and run by importing the same relative paths within the patches directory corresponding to a particular dask or distributed module.
  2. It adds proper support for patching submodules. Previously the loader was being disabled whenever a real dask module was being imported, but this is problematic because if some dask modules import others they will pre-populate sys.modules with the real modules and therefore the loader will never be used for loading a patched version of the submodule.
  3. Patches are no longer just functions applied to modules, they are arbitrary functions executed when a module is imported. As a result, a wider range of modifications is possible than was previously allowed. In particular:
  4. The more general functions allow the entire module import to be hijacked and redirected to other modules.
  5. The new framework is used to vendor a patched version of the accessor.py module in dask, which resolves the issues observed in Fix dask.dataframe import error for Python 3.11.9 dask/dask#11035.

@vyasr vyasr added feature request New feature or request non-breaking Introduces a non-breaking change labels Apr 4, 2024
@vyasr vyasr self-assigned this Apr 4, 2024
@vyasr vyasr requested a review from a team as a code owner April 4, 2024 01:18
Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow - This is impressive. I'm a bit scared of the complexity here, but I doubt there is a much simpler way to achieve what we need.

rapids_dask_dependency/dask_loader.py Show resolved Hide resolved
@rjzamora
Copy link
Member

rjzamora commented Apr 4, 2024

I wonder if the following sequence of events make sense?

  1. Port/merge Check test exit codes. #37 to 24.06
  2. Test/review this PR a bit further (simple things are working for me locally)
  3. (optional?) Remove the __rdd_patch_accessor.py file from this PR, since it only applies to 24.04 (Edit: Lawrences suggestion accounts for this)
  4. Backport all patching/testing to 24.04
  5. (optional?) Add __rdd_patch_accessor.py file to 24.04

Copy link

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this and have only one minor suggestion to restrict the scope of patching the vendored file.

rapids_dask_dependency/patches/dask/dataframe/accessor.py Outdated Show resolved Hide resolved
rapids_dask_dependency/dask_loader.py Show resolved Hide resolved
@rjzamora rjzamora mentioned this pull request Apr 4, 2024
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me, now that the patch is applied only to the affected Python/Dask versions. (I have not verified that this passes locally.)

…ssor.py

Co-authored-by: Richard (Rick) Zamora <rzamora217@gmail.com>
Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is heroic work @vyasr - Thanks for saving the day!

@rjzamora
Copy link
Member

rjzamora commented Apr 4, 2024

\merge

@rjzamora rjzamora merged commit c1a5fb8 into rapidsai:branch-24.06 Apr 4, 2024
5 checks passed
@raydouglass
Copy link
Member

This PR was reverted via force push so that it can be merged via the bot instead (in #39).

@vyasr vyasr deleted the feat/generalized_patching branch April 4, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants