BUG/API: make setitem-inplace preserve dtype when possible with PandasArray, IntegerArray, FloatingArray #39044

Closed
jbrockmendel wants to merge 23 commits from the bug-38896 branch


jbrockmendel
Member

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

xref #38896 (doesn't close). In general, df.loc[:, "A"] = foo tries to operate in-place before falling back to casting. This PR makes sure we also do that when foo is a PandasArray, IntegerArray, or FloatingArray.

I guess we could/should do the same for BooleanArray.
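
As a rough illustration of the "try in-place, fall back to casting" idea described above, here is a minimal NumPy sketch. The helper name `setitem_inplace_or_cast` and its round-trip check are assumptions made for illustration only; this is not the code in this PR, nor pandas' internal logic.

```python
import numpy as np


def setitem_inplace_or_cast(target: np.ndarray, new_values) -> np.ndarray:
    """Hypothetical helper: overwrite `target` in place when the new values
    fit its dtype without loss; otherwise return a new array in the new dtype.
    """
    new_values = np.asarray(new_values)
    if new_values.shape != target.shape:
        raise ValueError("new values must have the same shape as the target")

    try:
        # Round-trip check: cast to the existing dtype and compare with the input.
        with np.errstate(invalid="ignore"):
            casted = new_values.astype(target.dtype)
        lossless = np.array_equal(casted, new_values)
    except (TypeError, ValueError):
        lossless = False

    if lossless:
        target[...] = casted  # in-place write, dtype of `target` is preserved
        return target
    return new_values.copy()  # fallback: the result takes the new dtype


column = np.array([1, 2, 3], dtype="int64")
same = setitem_inplace_or_cast(column, np.array([3.0, 4.0, 5.0]))
other = setitem_inplace_or_cast(column, np.array([0.5, 1.5, 2.5]))
```

Here `same` is the original array, still `int64`, because the float values are whole numbers; `other` is a new `float64` array because the values cannot be held as integers.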

Contributor

@jreback jreback left a comment


Will really need to look at this closely; it adds non-trivial code.

@jreback jreback added the labels Dtype Conversions (Unexpected or buggy dtype conversions), ExtensionArray (Extending pandas with custom dtypes or arrays), and Indexing (Related to indexing on series/frames, not to indexes themselves) Jan 16, 2021
@@ -28,6 +32,31 @@ def dtype(request):
    return PandasDtype(np.dtype(request.param))


orig_setitem = pd.core.internals.Block.setitem
Contributor


can you use monkeypatch instead?

Member Author


This does use monkeypatch; the monkeypatched method calls the original method.
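
The pattern being described (keep a reference to the original method, then patch in a wrapper that delegates to it) can be sketched roughly as follows. `ToyBlock` and `recording_setitem` are hypothetical names, and the standard library's `unittest.mock.patch.object` is used here as a stand-in for pytest's `monkeypatch.setattr` so the snippet runs without a test runner; it is not the test code from this PR.

```python
from unittest import mock


class ToyBlock:
    """Hypothetical stand-in for the class whose method gets patched."""

    def __init__(self, values):
        self.values = list(values)

    def setitem(self, indexer, value):
        self.values[indexer] = value
        return self


# Keep a module-level reference to the unpatched method.
orig_setitem = ToyBlock.setitem
calls = []


def recording_setitem(self, indexer, value):
    # Record the call, then delegate to the original implementation.
    calls.append((indexer, value))
    return orig_setitem(self, indexer, value)


# The patch is active only inside the `with` block and is undone on exit.
with mock.patch.object(ToyBlock, "setitem", recording_setitem):
    block = ToyBlock([1, 2, 3])
    block.setitem(0, 10)
```

Because the wrapper calls `orig_setitem`, the patched method keeps the original behaviour while letting a test inspect or assert on each call.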

@jbrockmendel
Member Author

rebased + green

@jreback jreback added this to the 1.3 milestone Jan 28, 2021
@jbrockmendel
Member Author

> ok, maybe if you can add a docstring to the function it will be more obvious that we are trying to match the len here

docstring updated + green

Member

@jorisvandenbossche jorisvandenbossche left a comment


Can you summarize which behaviours are changed?
E.g., from the description I would suppose that

df = pd.DataFrame({'a': [1, 2, 3]})
df.loc[:, 'a'] = pd.array([3, 4, 5])

changed behaviour to preserve the original dtype, but I don't see that directly tested?
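
To see what this example does on a given installation, a small script along these lines can report the column dtype before and after the assignment. The result depends on the pandas version (and on whether a change like this one is present), so no expected output is shown; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
dtype_before = df["a"].dtype

# pd.array infers a nullable IntegerArray for a list of Python ints.
df.loc[:, "a"] = pd.array([3, 4, 5])
dtype_after = df["a"].dtype

# True if the assignment kept the column's original dtype.
preserved = dtype_after == dtype_before
print(f"{dtype_before} -> {dtype_after} (preserved: {preserved})")
```

A dedicated test for the behaviour would assert on `dtype_after` directly.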

Does this need a whatsnew?

@@ -193,7 +197,20 @@ class TestGetitem(base.BaseGetitemTests):


 class TestSetitem(base.BaseSetitemTests):
-    pass
+    def test_setitem_series(self, data, full_indexer):
Member


Can you indicate here why it is overriding the base class?

Member Author


comment added

        if not data._mask.any():
            # GH#38896 like we do with ndarray, we set the values inplace
            # but cast to the new numpy dtype
            expected = pd.Series(data.to_numpy(data.dtype.numpy_dtype), name="data")
Member


Why are we converting to the numpy dtype here? That's also not the original dtype?
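
For context on the conversion being questioned, the following sketch shows what `dtype.numpy_dtype` and `to_numpy` do for a nullable integer array, and why the test only takes this path when no values are masked. It uses the public `pd.array` constructor; the variable names are illustrative and this is not the PR's test code.

```python
import pandas as pd

# A nullable integer array without missing values.
arr = pd.array([1, 2, 3], dtype="Int64")
np_dtype = arr.dtype.numpy_dtype   # the NumPy dtype backing the extension dtype
as_numpy = arr.to_numpy(np_dtype)  # plain ndarray; the "Int64" dtype is gone

# With a missing value, converting to a NumPy integer dtype is not possible.
with_na = pd.array([1, None, 3], dtype="Int64")
try:
    with_na.to_numpy(with_na.dtype.numpy_dtype)
    na_conversion_failed = False
except ValueError:
    na_conversion_failed = True
```

The result of `to_numpy` is a NumPy `int64` array, which is neither the nullable dtype of the value being set nor necessarily the dtype the target held before.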

@jreback
Contributor

jreback commented Feb 7, 2021

am ok with this. @jorisvandenbossche, any additional comments?

@jorisvandenbossche
Member

There are some questions in my last review above for which I was still waiting on an answer

@jbrockmendel
Member Author

let's stick a pin in this until #39163 goes in; it will be easier to address outstanding questions/comments at that point

@jbrockmendel
Member Author

mothballing until after #39163

@jbrockmendel jbrockmendel added the Mothballed label (Temporarily-closed PR the author plans to return to) Feb 18, 2021
@jbrockmendel jbrockmendel removed the Mothballed label (Temporarily-closed PR the author plans to return to) Jul 16, 2021
@jbrockmendel jbrockmendel deleted the bug-38896 branch July 16, 2021 16:08

3 participants