
BUG: groupby then resample on column gives incorrect results if the index is out of order #59408

Open: wants to merge 12 commits into base branch main

Author: aram-cinnamon (Contributor)

WillAyd (Member) commented Aug 9, 2024:

This PR looks good to me. I don't think the CI failures are related, so @aram-cinnamon, merging in the main branch will likely fix them.

@rhshadrach @mroeschke any comments?

rhshadrach (Member) commented Aug 10, 2024:

I think the example below "works" on main (i.e. it doesn't raise, but it still gives incorrect results as in the issue), but would raise if we merge this PR.

```python
import pandas as pd

df = pd.DataFrame(dict(
    datetime=[pd.to_datetime('2024-07-30T00:00Z'), pd.to_datetime('2024-07-30T00:01Z')],
    group=['A', 'A'],
    value=[100, 200],
), index=pd.Index([1, 0], name="datetime"))

result = df.groupby('group').resample('1min', on='datetime').aggregate(dict(value='sum'))
print(result)
```

@aram-cinnamon - can you confirm?

Ignore me - I missed the drop=True in the reset_index.
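Since the fix discussed here resets the index with drop=True, one possible user-side workaround is to do the same before the groupby/resample chain. This assumes, as this thread suggests, that only non-default indexes are affected. The sketch below checks that workaround against a reference computed without resample: each timestamp is floored to its 1-minute bin and grouped together with the group key. The data and the `bin` column name are made up for illustration. Each group's timestamps fall in consecutive minutes, so resample produces no empty bins and the two computations should agree.

```python
import pandas as pd

# Illustrative data with a deliberately out-of-order integer index.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-07-30T00:00Z", "2024-07-30T00:01Z", "2024-07-30T00:00Z"]
        ),
        "group": ["A", "A", "B"],
        "value": [100, 200, 5],
    },
    index=[2, 0, 1],
)

# Workaround: replace the index with a default RangeIndex, discarding the old labels.
result = (
    df.reset_index(drop=True)
    .groupby("group")
    .resample("1min", on="datetime")["value"]
    .sum()
)

# Reference without resample: floor each timestamp to its 1-minute bin,
# then group by both keys.
expected = (
    df.assign(bin=df["datetime"].dt.floor("1min"))
    .groupby(["group", "bin"])["value"]
    .sum()
)

print(result.to_dict() == expected.to_dict())
```

Dropping the index discards the original row labels. If you still need them, reset_index() without drop=True keeps them as a column instead.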

Review thread on pandas/core/resample.py (outdated, resolved)

aram-cinnamon force-pushed the groupby-then-resample-if-index-out-of-order branch from 5baf8eb to 3ff8145 on August 10, 2024 14:44
rhshadrach (Member) left a review:

Requesting changes based on the comment above.

rhshadrach (Member) left a review:

This is looking much better. I have a question below.

Review thread on pandas/core/resample.py (outdated, resolved)

aram-cinnamon force-pushed the groupby-then-resample-if-index-out-of-order branch from 70e71f7 to 55fe96b on August 17, 2024 22:39
rhshadrach (Member) commented, quoting "aram-cinnamon force-pushed the groupby-then-resample-if-index-out-of-order branch from 70e71f7 to 55fe96b":

@aram-cinnamon - not sure if you're aware, but once a review has been done, force-pushing means that reviewers need to review the entire PR again rather than only your subsequent changes.

Comment on lines +352 to +356

```python
elif isinstance(obj.index, RangeIndex):
    ax = self._grouper.take(obj.index)
else:
    # GH 59350
    ax = self._grouper
```
rhshadrach (Member) commented Sep 3, 2024:

What is the reason we need to do a .take in the RangeIndex case, but not otherwise?

aram-cinnamon (Contributor, author):

Hi, returning to this. I added that condition to make test_groupby_resample_on_api_with_getitem pass; that test was apparently added as part of #17813. I have not had a chance to look into it more deeply.

Member:

Adding index=list("xyzwt") to the DataFrame in that test makes the op fail.
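A toy model in plain Python (no pandas) shows why the .take(obj.index) branch is tied to RangeIndex. The helper names are hypothetical. Here `axis` loosely stands in for the full datetime axis (self._grouper), and the labels stand in for obj.index. Taking positionally with index labels selects the right rows only when every label equals its row position, which is the default-RangeIndex case. With an out-of-order integer index it silently picks the wrong rows, consistent with the incorrect results in the issue. With string labels such as list("xyzwt") it cannot work at all. Looking up each label's position first avoids both problems, assuming the labels are unique.

```python
def take_by_labels(axis, group_labels):
    """Mimic take(obj.index): treat each index label as if it were a row position."""
    return [axis[label] for label in group_labels]


def take_by_positions(axis, labels, group_labels):
    """Find each label's actual row position first, then take (labels assumed unique)."""
    position = {label: i for i, label in enumerate(labels)}
    return [axis[position[label]] for label in group_labels]


axis = ["00:00", "00:01", "00:02"]

# Default RangeIndex: labels coincide with positions, so both approaches agree.
print(take_by_labels(axis, [0, 2]), take_by_positions(axis, [0, 1, 2], [0, 2]))

# Out-of-order integer index: the rows at positions 0 and 1 carry labels 2 and 0.
labels = [2, 0, 1]
print(take_by_labels(axis, [2, 0]))             # ['00:02', '00:00']: wrong rows, no error
print(take_by_positions(axis, labels, [2, 0]))  # ['00:00', '00:01']: correct rows

# String labels cannot be used as positions at all.
try:
    take_by_labels(axis, ["x", "z"])
except TypeError as exc:
    print("labels are not positions:", exc)
```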

Member:

It looks like the issue happens at pandas.core.resample:1559: we pass only the group x but the entire axis self.ax. I wonder whether splitting the axis there, the same way we split the data on L967, and passing it to f would resolve it.
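The suggestion above amounts to passing the axis through the same positional split as the data, so each group's function receives only its own timestamps. Below is a toy model in plain Python. The names are hypothetical, integer seconds stand in for timestamps, and unlike resample it does not emit empty bins. The split works on row positions, so index labels never enter the computation and their order or type cannot matter.

```python
from collections import defaultdict


def split_positions(keys):
    """Map each group key to the row positions (not index labels) that belong to it."""
    positions = defaultdict(list)
    for pos, key in enumerate(keys):
        positions[key].append(pos)
    return dict(positions)


def grouped_bin_sum(keys, axis, values, width):
    """Per group, sum values into fixed-width bins computed from that group's own axis slice."""
    out = {}
    for key, rows in sorted(split_positions(keys).items()):
        # Split the axis with the same positions used to split the data.
        group_axis = [axis[p] for p in rows]
        group_values = [values[p] for p in rows]
        sums = defaultdict(int)
        for t, v in zip(group_axis, group_values):
            sums[t - t % width] += v  # left edge of the bin containing t
        for edge in sorted(sums):
            out[(key, edge)] = sums[edge]
    return out


# Rows in their stored order; whatever index labels they carry are irrelevant here.
keys = ["A", "A", "B"]
axis = [0, 60, 0]  # seconds since an arbitrary origin
values = [100, 200, 5]
print(grouped_bin_sum(keys, axis, values, width=60))
# {('A', 0): 100, ('A', 60): 200, ('B', 0): 5}
```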

```python
tm.assert_frame_equal(result, expected)


def test_groupby_resample_then_groupby_is_reused_when_index_is_out_of_order():
```

Member:

I think this test can be removed. In general, I do not think we can sustainably test pairs of operations.

```python
tm.assert_frame_equal(result_1, result_3)


def test_groupby_resample_then_groupby_is_reused_when_index_is_set_from_column():
```

Member:

Same.

```python
tm.assert_frame_equal(result_1, result_3)


def test_groupby_resample_then_groupby_is_reused_when_groupby_selection_is_not_none():
```

Member:

Same.


Successfully merging this pull request may close these issues.

BUG: groupby then resample on column gives incorrect results if the index is out of order
3 participants