Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST/API: test column indexing copy/view semantics #35906

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Aug 26, 2020

This is adding some tests for current behaviour related to the discussions in #33457 and #35417 (and also adds the tests from #35266 which got never merged). I think it is good to have some more explicit tests of current behaviour, while we are considering changing things.

Note: some semantics I am testing are certainly debatable, but I was experimenting a bit with what is happening on current master.

)
original_arr = df.EA.array
subset = df[["int", "EA"]]
assert subset.EA.array is original_arr
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On pandas 1.0 and older, this was apparently taking a copy of the EA (we noticed this in a geopandas test where we were testing against multiple pandas versions)

col = df["int"]
with pd.option_context("chained_assignment", "warn"):
with tm.assert_produces_warning(pd.core.common.SettingWithCopyWarning):
col[0] = 10
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights a difference between a column backed by ExtensionBlock or consolidated block -> with EA, we don't raise a SettingWithCopyWarning for this case.

(I am personally actually fine with how it behaves for EA)

@jorisvandenbossche
Copy link
Member Author

@jbrockmendel
Copy link
Member

some semantics I am testing are certainly debatable, but I was experimenting a bit with what is happening on current master.

I'm generally positive on this PR, lean towards only testing explicitly-desired behavior

@jorisvandenbossche
Copy link
Member Author

jorisvandenbossche commented Aug 26, 2020

lean towards only testing explicitly-desired behavior

And do you find the currently tested behaviours desired?

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stylistic nit but I think the tests are in the right direction

"EA": pd.array([1, 2, None], dtype="Int64"),
}
)
original_arr = df.EA.array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use bracket notation instead of dot?

@WillAyd WillAyd added ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves labels Aug 26, 2020
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glancing through the tests I think I agree with the expected behavior asserted by them.

expected = pd.array([10, 2, None], dtype="Int64")
tm.assert_extension_array_equal(subset["EA"].array, expected)

# TODO this way it doesn't modify subset - is this expected?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure here. The (or an) issue is that we allow setting to change the dtype, which for EAs means changing the type. So there's no way for

s = pd.Series(pd.array([1, 2]))
s.iloc[0] = 0.5

to mutate the array inplace, and so I wouldn't expect your subset to be mutated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example currently raises an error, because mutating inplace for EAs doesn't allow changing the dtype (or at least for the nullable EAs, and which is a difference with normal dtypes that we should actually decide if this is desired (I think so) and if so, test and document as a difference)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -706,6 +706,15 @@ def test_iloc_setitem_categorical_updates_inplace(self):
expected = pd.Categorical(["C", "B", "A"])
tm.assert_categorical_equal(cat, expected)

# __setitem__ under the other hand does not work in-place
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this another test?

# TODO this way it doesn't modify subset - is this expected?
# df.iloc[0, 3] = 10
# expected = pd.array([10, 2, None], dtype="Int64")
# tm.assert_extension_array_equal(subset['EA'].array, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u make this a separate test and xfail (agree we should prob allow this)

col = df["EA"]
assert col.array is original_arr
col[0] = 10
expected = pd.array([10, 2, None], dtype="Int64")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this mutates the original? can u assert that as well

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2020

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 8, 2020

df[1] = cat[::-1]

expected = pd.Categorical(["A", "B", "C"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check whether the array that ends up in df[1] is the same object on the RHS of L713 (Block.setitem is weird)

# ensure original array was not modified
assert original_arr is not df.EA.array
expected = pd.array([1, 2, None], dtype="Int64")
tm.assert_extension_array_equal(original_arr, expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also check whether df["EA"] views the same data on the RHS of L1129 vs a copy

}
)
original_arr = df.EA.array
subset = df[["int", "EA"]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe parametrizing over several equivalent forms of doing this selection?

original_arr = df.EA.array
subset = df[["int", "EA"]]
assert subset.EA.array is original_arr
# check that they view the same data by modifying
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make a helper for this

@simonjayhawkins
Copy link
Member

@jorisvandenbossche can you merge master and address comments

@simonjayhawkins
Copy link
Member

@jorisvandenbossche closing as stale. reopen when ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Copy / view semantics ExtensionArray Extending pandas with custom dtypes or arrays. Indexing Related to indexing on series/frames, not to indexes themselves Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants