Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintain inplace operation on series #36498

Conversation

samilAyoub
Copy link
Contributor

@samilAyoub samilAyoub commented Sep 20, 2020

@pep8speaks
Copy link

pep8speaks commented Sep 20, 2020

Hello @samilAyoub! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-25 19:30:33 UTC

@dsaxton dsaxton added Copy / view semantics Regression Functionality that used to work in a prior pandas version labels Sep 20, 2020
@dsaxton dsaxton added this to the 1.1.3 milestone Sep 20, 2020
@@ -93,8 +93,6 @@ def _wrap_inplace_method(method):

def f(self, other):
result = method(self, other)
# Delete cacher
self._reset_cacher()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added to fix an existing bug (see #30501) so removing it would likely cause a regression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this have been the incorrect location for the original fix? it appears there were a couple of iterations in #30501.

@@ -1297,3 +1297,10 @@ def test_dataframe_div_silenced():
)
with tm.assert_produces_warning(None):
pdf1.div(pdf2, fill_value=0)


def test_inplace_arithmetic_operation_on_series_updating_parent_dataframe():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test passes for me on master. is there another case that fails?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think need to test df["A"] is s which is also True on 1.1.2

>>> pd.__version__
'1.1.2'
>>>
>>> import pandas.testing as tm
>>>
>>> df = pd.DataFrame({"A": [1, 2, 3]})
>>> s = df["A"]
>>> s += 1
>>> tm.assert_series_equal(df["A"], s)
>>>
>>> s
0    2
1    3
2    4
Name: A, dtype: int64
>>>
>>> df["A"]
0    2
1    3
2    4
Name: A, dtype: int64
>>>
>>> df
   A
0  1
1  2
2  3
>>> df["A"] is s
True
>>>

but the output of __repr__ and also other ops is incorrect

so should also compare using assert_frame_equal

>>> tm.assert_frame_equal(df, pd.DataFrame({"A": [2, 3, 4]}))
Traceback (most recent call last):
...
AssertionError: DataFrame.iloc[:, 0] (column name="A") are different

DataFrame.iloc[:, 0] (column name="A") values are different (100.0 %)
[index]: [0, 1, 2]
[left]:  [1, 2, 3]
[right]: [2, 3, 4]
>>>

but this should pass with the fix here

@jbrockmendel
Copy link
Member

It looks like the underlying problem is

ser = pd.Series([1, 2, 3])
vals = ser._values
ser += 1
assert ser._values is vals   # <-- Fails!

@simonjayhawkins
Copy link
Member

It looks like the underlying problem is

but that also occurs in 1.0.5 (and on 0.25.3)

>>> pd.__version__
'1.0.5'
>>>
>>> ser = pd.Series([1, 2, 3])
>>> vals = ser._values
>>>
>>> ser += 1
>>> assert ser._values is vals  # <-- Fails!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

so maybe not directly related to regression?

@@ -888,22 +888,6 @@ def test_identity_slice_returns_new_object(self):
original_series[:3] = [7, 8, 9]
assert all(sliced_series[:3] == [7, 8, 9])

def test_loc_copy_vs_view(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm pls revert this test.

@simonjayhawkins
Copy link
Member

@jbrockmendel I think ready if you can have a look. Thanks.

@jbrockmendel
Copy link
Member

It looks like the underlying problem is

but that also occurs in 1.0.5 (and on 0.25.3)

So fixing this would fix two bugs instead of one. Have you taken a look at what it would take to fix this? I suspect it may be simpler than the implementation here.

df = pd.DataFrame({"A": [1, 2, 3]})
s = df["A"]
s += 1
assert s is df["A"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also assert df['A'].to_numpy().base is s.to_numpy().base

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we do this we no longer need the cacher machinery. At first glance his appears a big change so will open a separate PR as an alternative to this one.

@@ -888,22 +888,6 @@ def test_identity_slice_returns_new_object(self):
original_series[:3] = [7, 8, 9]
assert all(sliced_series[:3] == [7, 8, 9])

def test_loc_copy_vs_view(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm pls revert this test.

@jreback
Copy link
Contributor

jreback commented Sep 26, 2020

@simonjayhawkins this may need some more work, don't block on for 1.1.3

@jreback jreback modified the milestones: 1.1.3, 1.1.4 Sep 26, 2020
@simonjayhawkins
Copy link
Member

It looks like the underlying problem is

but that also occurs in 1.0.5 (and on 0.25.3)

So fixing this would fix two bugs instead of one. Have you taken a look at what it would take to fix this?

It appears to me that the issue is that an inplace operation on the Series deletes the weakref to the dataframe, i.e. the fix in #30501 was incorrect

I suspect it may be simpler than the implementation here.

This is simple. revert the incorrect fix. remove the test that has since been added for a issue that was not actually fixed. and finally ensure that dropped columns are not added back to original dataframe (i.e. what imo should have been done in #30501 originally).

Including fixes for the deleted test, i.e. issue #15631 would only make the change more invasive?

I'm not sure I understand how a more invasive change would be a backport candidate.

@jbrockmendel
Copy link
Member

@simonjayhawkins your appraisal seems reasonable for backporting purposes. i'll take a look a the ser.values issue and see how that affects options for longer-term solutions.

@jreback
Copy link
Contributor

jreback commented Oct 14, 2020

@samilAyoub

can we just revert the original. (and pls merge master as well)

@jreback
Copy link
Contributor

jreback commented Oct 26, 2020

@simonjayhawkins this PR is stale, but I think if we revert the original commit is best here.

@simonjayhawkins
Copy link
Member

@simonjayhawkins this PR is stale, but I think if we revert the original commit is best here.

if we do that, I think we would still need to disable or remove test_loc_copy_vs_view see #15631 (comment)

@jreback
Copy link
Contributor

jreback commented Oct 26, 2020

@simonjayhawkins this PR is stale, but I think if we revert the original commit is best here.

if we do that, I think we would still need to disable or remove test_loc_copy_vs_view see #15631 (comment)

right that's what i mean revert that patch

@jorisvandenbossche
Copy link
Member

I opened a PR with a revert of the original patch: #37497

@simonjayhawkins
Copy link
Member

I opened a PR with a revert of the original patch: #37497

test_loc_copy_vs_view will fail.

@jorisvandenbossche
Copy link
Member

Ah, yes, thanks for the note, added an xfail
(I am personally not really sure the test is actually correct, though, but that's another discussion ;-))

@jorisvandenbossche
Copy link
Member

This PR can probably be claused now with #37508 merged?

@samilAyoub sorry about sidetracking this PR, but wanted to go fast yesterday to be able to get a fix into 1.1.4 release today

@simonjayhawkins
Copy link
Member

Thanks @samilAyoub for working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Copy / view semantics Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: inplace arithmetic operation on Series no longer updating parent DataFrame
7 participants