Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast_slow_proxy: Don't import assert_eq at top-level #16063

Merged
merged 8 commits into from
Jun 25, 2024

Conversation

wence-
Copy link
Contributor

@wence- wence- commented Jun 19, 2024

Description

The testing._utils module imports pytest, which is not advertised as a default run dependency of cudf, so we must avoid importing it in the proxy wrappers at top-level.

Since what we need in the proxy wrappers for pandas debugging is the assert_eq function (which does not need pytest), move it to testing.testing (where it more naturally fits with the other assertion functions anyway). This removes the need for pytest when running the fast-slow-proxy wrappers.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested a review from a team as a code owner June 19, 2024 15:28
@wence- wence- requested review from bdice and Matt711 June 19, 2024 15:28
@github-actions github-actions bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas labels Jun 19, 2024
@bdice
Copy link
Contributor

bdice commented Jun 20, 2024

The assert_eq function does not use pytest. I think a better fix here is to rework the structure of testing/_utils.py and move assert_eq to a separate file that doesn't require pytest (or move the bits of testing/_utils.py that require pytest into another location).

@wence-
Copy link
Contributor Author

wence- commented Jun 24, 2024

The assert_eq function does not use pytest. I think a better fix here is to rework the structure of testing/_utils.py and move assert_eq to a separate file that doesn't require pytest (or move the bits of testing/_utils.py that require pytest into another location).

I have moved assert_eq into cudf/testing/testing.py (which doesn't need pytest), this aligns with the other assert_XXX_equal functions that were already there. cc2fbf7 then migrates all of the test files to use this new import location.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a better solution, I think. The one change worth noting is that cudf.testing.assert_eq is now a public API. (It’s not in the private _utils.py.) Do we need to add it to the API docs?

python/cudf/cudf/pandas/fast_slow_proxy.py Outdated Show resolved Hide resolved
@wence-
Copy link
Contributor Author

wence- commented Jun 24, 2024

This is a better solution, I think. The one change worth noting is that cudf.testing.assert_eq is now a public API. (It’s not in the private _utils.py.) Do we need to add it to the API docs?

I have added (and formatted/extended the docstring).

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wence-!

@wence-
Copy link
Contributor Author

wence- commented Jun 24, 2024

/merge

@wence- wence- requested a review from a team as a code owner June 24, 2024 15:30
@wence- wence- added bug Something isn't working non-breaking Non-breaking change labels Jun 24, 2024
@vyasr
Copy link
Contributor

vyasr commented Jun 24, 2024

We moved assert_eq it to _utils to avoid it appearing public because other libraries (like cugraph) were relying on its behavior in ways that we didn't want to guarantee stability for (see e.g. rapidsai/cugraph#1693). Do we think that has changed? Is it an API that we actually want to support other libraries using?

If so, we may still want to consider using a different module so that cudf.testing remains a reasonable mirror of pandas.testing instead of including an API for which pandas has no counterpart.

(this isn't important enough to me to be worth blocking this PR though)

@bdice
Copy link
Contributor

bdice commented Jun 24, 2024

We moved assert_eq it to _utils to avoid it appearing public because other libraries (like cugraph) were relying on its behavior in ways that we didn't want to guarantee stability for (see e.g. rapidsai/cugraph#1693). Do we think that has changed? Is it an API that we actually want to support other libraries using?

I did not recall that history, thanks for the reminder.

I am slightly in favor of having a function like assert_eq as a public API. It seems like a crucial utility for libraries consuming cudf, especially when writing tests. I do not want for libraries like cuml or cugraph to have to reinvent such functionality just to be able to test their own libraries when consuming/producing cudf DataFrame or Series objects. Perhaps there is some middle ground where we have some public testing utilities in a module like cudf.testing.utils (to help separate it from being an exact mirror of pandas.testing) and some private testing utilities in cudf.testing._utils (meant only for cudf's own tests)? I don't know. This PR feels like it is moving in the right general direction so I'm happy to merge it in the current state.

@rapids-bot rapids-bot bot merged commit 1bc1f45 into rapidsai:branch-24.08 Jun 25, 2024
75 checks passed
@vyasr
Copy link
Contributor

vyasr commented Jun 25, 2024

The question is what APIs other libraries need. There are already APIs for comparing DataFrame and Series objects (like assert_frame_equal), and we have to_pandas methods to generate pandas objects for comparison. assert_eq is really an internal convenience because we have historically been inconsistent about argument orders and what types are fed in and many APIs only partially supported proper metadata so things like dtypes could get lost. assert_eq will even do things like trying to automatically handle numpy arrays. I'd prefer not to export that baggage.

@wence- wence- deleted the wence/fix/16062 branch June 25, 2024 19:42
pentschev added a commit to pentschev/dask-cuda that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import
location of `assert_eq` to the public `cudf.testing.assert_eq`, this
change updates imports accordingly.
pentschev added a commit to pentschev/ucx-py that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import
location of `assert_eq` to the public `cudf.testing.assert_eq`, this
change updates imports accordingly.
pentschev added a commit to pentschev/ucx-py that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import
location of `assert_eq` to the public `cudf.testing.assert_eq`, this
change updates imports accordingly.
pentschev added a commit to pentschev/ucxx that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import
location of `assert_eq` to the public `cudf.testing.assert_eq`, this
change updates imports accordingly.
@wence-
Copy link
Contributor Author

wence- commented Jun 26, 2024

We moved assert_eq it to _utils to avoid it appearing public because other libraries (like cugraph) were relying on its behavior in ways that we didn't want to guarantee stability for (see e.g. rapidsai/cugraph#1693). Do we think that has changed? Is it an API that we actually want to support other libraries using?

Turns out this wasn't enough of a deterrence:

@bdice
Copy link
Contributor

bdice commented Jun 26, 2024

If we want other libraries to adopt a certain pattern (like not using assert_eq), we should first adopt that pattern in cudf’s own tests. Many other libraries learn how to write tests by reading cudf’s tests, so we might as well make this a public utility.

rapids-bot bot pushed a commit to rapidsai/ucx-py that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import location of `assert_eq` to the public `cudf.testing.assert_eq`, this change updates imports accordingly.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1050
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import location of `assert_eq` to the public `cudf.testing.assert_eq`, this change updates imports accordingly.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #1353
rapids-bot bot pushed a commit to rapidsai/ucxx that referenced this pull request Jun 26, 2024
rapidsai/cudf#16063 has updated the import location of `assert_eq` to the public `cudf.testing.assert_eq`, this change updates imports accordingly.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Mads R. B. Kristensen (https://github.com/madsbk)

URL: #239
@vyasr
Copy link
Contributor

vyasr commented Jun 26, 2024

There's nothing wrong with having utilities that are only usable within cudf's tests and not elsewhere. That's not happening here in large part because the utility is actually part of the cudf package. If we instead implemented in conftest or a utils module within the tests directory itself, both of which are fairly common patterns, then that would fit the expectations of external users looking at cudf's tests for guidance.

rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Jul 8, 2024
An upstream change in cudf uncovered a bug in `_GeoSeriesUtility._from_data` where `False` was being passed to `cudf.Series(index=)` which is an invalid value. The only valid `index` values are an actual `cudf.Index` or `None`, so setting to `None` as a more appropriate default value

Also updates the location of `assert_eq` which moved in rapidsai/cudf#16063

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - Paul Taylor (https://github.com/trxcllnt)
  - Mark Harris (https://github.com/harrism)
  - Bradley Dice (https://github.com/bdice)

URL: #1400
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cudf.pandas Issues specific to cudf.pandas non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] cuDF Pandas now requires pytest at runtime
4 participants