
COMPAT: Utilize copy keyword in __array__ #57739

Open
mroeschke opened this issue Mar 5, 2024 · 8 comments · May be fixed by #60046
@mroeschke (Member)

#57172 added a copy keyword that will be passed to __array__ in NumPy 2.0. Currently it is unused, but it would be useful to respect that option when passed.
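
A minimal sketch of what respecting the keyword could look like. The `Wrapper` class below is hypothetical (it is not pandas' actual implementation); it only illustrates the NumPy 2.0 protocol, where `np.array`/`np.asarray` forward `copy=True`/`False`/`None` to `__array__` when the signature accepts it:

```python
import numpy as np

class Wrapper:
    """Toy 1-D container backed by an ndarray (hypothetical, not pandas)."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy 2.0 forwards the copy argument of np.array/np.asarray here.
        if copy is False and dtype is not None and np.dtype(dtype) != self._values.dtype:
            # A dtype change forces a copy, so copy=False cannot be honored.
            raise ValueError("Unable to avoid copy while casting to the requested dtype")
        result = self._values if dtype is None else self._values.astype(dtype, copy=False)
        if copy is True:
            result = result.copy()
        return result

arr = np.array(Wrapper([1, 2, 3]), copy=True)
arr[0] = 99  # must not touch the Wrapper's internal buffer
```

On NumPy 1.x the `copy` argument is simply never passed in, and `np.array(..., copy=True)` copies the returned array itself, so the defaulted signature stays compatible with both major versions.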

@mroeschke mroeschke added the Compat (pandas objects compatibility with NumPy or Python functions) label Mar 5, 2024
@jorisvandenbossche jorisvandenbossche added this to the 3.0 milestone Mar 6, 2024
@jorisvandenbossche (Member)

Linking the numpy issue I opened to discuss the expected semantics for this: numpy/numpy#25941 (in several places we now also ignore the dtype keyword, but ignoring copy is something we shouldn't do)

@rhshadrach (Member)

I believe this came up again in a new issue. Should this be a blocker for 3.0?

@mroeschke (Member, Author) commented Oct 1, 2024

IMO this "bug" shouldn't block a release, but it would be very nice to fix for 3.0

@mg3146 commented Oct 2, 2024

Hi - this has caused fairly large blocking issues in a production environment for us. If there were a way to roll out a fix, that would be great...

@seberg (Contributor) commented Oct 3, 2024

This is a very simple thing to hot-fix, @mg3146; you could probably do a PR yourself even. I agree it is a relatively large issue (because it can lead to silently modified DataFrames in existing code).
Making a defensive copy may not always be ideal, but it is totally fine. Only copy=False might be tricky, but copy=False is a new feature, so hopefully not a big deal in practice.

If you can't (or do not want to) handle copy=False correctly, I think it is fine to just reject it always. That might be a small argument for fixing it in 3.0: it avoids breaking code that thinks copy=False works when you later turn it into an error.
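
The fallback described above (always make a defensive copy, and reject copy=False outright) could be sketched like this; `SafeWrapper` is a hypothetical stand-in, not pandas code:

```python
import numpy as np

class SafeWrapper:
    """Toy container (hypothetical): defensive copy, copy=False rejected."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # Per the suggestion above: if we cannot guarantee a zero-copy
            # view, refusing copy=False is safer than silently copying.
            raise ValueError("copy=False is not supported")
        # Defensive copy: callers can never alias our internal buffer.
        return np.array(self._values, dtype=dtype, copy=True)
```

This trades a possibly unnecessary copy for safety: callers can never mutate the container through the returned array, and code relying on copy=False fails loudly instead of getting a silent copy.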

@bashtage (Contributor) commented Oct 8, 2024

This bug is affecting statsmodels. Internally we mostly use ndarrays for predictability, and then wrap results as needed for pandas inputs. To do this, we often call np.array(list | DataFrame | Series | ndarray, copy=True) when we need to own a copy of the data, e.g. when sorting in place. Since pandas does not honor the copy request, we end up modifying the original data, which is highly undesirable.
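
For contrast, this is the contract callers expect, shown with a plain ndarray where copy=True is guaranteed to produce an independent buffer (the pandas failure mode is only described in comments, since it depends on the pandas/NumPy versions involved):

```python
import numpy as np

# With a plain ndarray, copy=True guarantees an independent buffer.
base = np.array([1.0, 2.0, 3.0])
owned = np.array(base, copy=True)
owned.sort()       # in-place mutation, as in the statsmodels use case
owned[0] = 99.0
assert base[0] == 1.0  # base is untouched

# The reported bug: on affected versions, np.array(series, copy=True)
# for a pandas Series could return a view of the Series' buffer, so the
# mutations above would silently modify the original Series instead.
```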

@mg3146 commented Oct 11, 2024

> This is a very simple thing to hot-fix, @mg3146; you could probably do a PR yourself even. I agree it is a relatively large issue (because it can lead to silently modified DataFrames in existing code). Making a defensive copy may not always be ideal, but it is totally fine. Only copy=False might be tricky, but copy=False is a new feature, so hopefully not a big deal in practice.
>
> If you can't (or do not want to) handle copy=False correctly, I think it is fine to just reject it always. That might be a small argument for fixing it in 3.0: it avoids breaking code that thinks copy=False works when you later turn it into an error.

sorry - I honestly don't know what half this means...

@ajfriend

This bug is affecting matplotlib code: computing a colormap on a Series object has the unexpected side effect of modifying the Series.
