BUG: groupby then resample on column gives incorrect results if the index is out of order #59408
Conversation
This PR looks good to me. I don't think the CI failures are related, so @aram-cinnamon, merging in the main branch will likely fix those. @rhshadrach @mroeschke any comments?
I think the example below "works" on main (i.e. it doesn't raise, but it still gives incorrect results as in the issue), but it would raise if we merge this PR.
df = pd.DataFrame(dict(
    datetime=[pd.to_datetime('2024-07-30T00:00Z'), pd.to_datetime('2024-07-30T00:01Z')],
    group=['A', 'A'],
    value=[100, 200],
), index=pd.Index([1, 0], name="datetime"))
result = df.groupby('group').resample('1min', on='datetime').aggregate(dict(value='sum'))
print(result)
@aram-cinnamon - can you confirm?
Ignore me - I missed the
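For cross-checking the snippet above, here is a minimal sketch of a reference computation that bins on the column through pd.Grouper instead of groupby(...).resample(...). It is not part of the PR. One assumption: the index is left unnamed here so that "datetime" refers only to the column. For this data, the per-group, per-minute sums from groupby(...).resample(..., on='datetime') would be expected to hold the same values.

```python
import pandas as pd

# Variant of the snippet above. Assumption: the index is left unnamed so that
# "datetime" refers unambiguously to the column.
df = pd.DataFrame(
    {
        "datetime": [
            pd.to_datetime("2024-07-30T00:00Z"),
            pd.to_datetime("2024-07-30T00:01Z"),
        ],
        "group": ["A", "A"],
        "value": [100, 200],
    },
    index=pd.Index([1, 0]),  # out-of-order integer labels
)

# Bin rows by group and by one-minute window on the "datetime" column.
# The row labels (1, 0) play no role in this computation.
expected = df.groupby(["group", pd.Grouper(key="datetime", freq="1min")])[
    "value"
].sum()
print(expected)
```

The 00:00 bin holds 100 and the 00:01 bin holds 200, regardless of how the rows are labelled.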
5baf8eb to 3ff8145
Requesting changes based on comment above.
This is looking much better - a question below.
70e71f7 to 55fe96b
@aram-cinnamon - not sure if you're aware, but once a review has been done, force-pushing means that reviewers need to review the entire PR again rather than being able to look at only your subsequent changes.
elif isinstance(obj.index, RangeIndex):
    ax = self._grouper.take(obj.index)
else:
    # GH 59350
    ax = self._grouper
What is the reason we need to do a .take in the RangeIndex case, but not otherwise?
Hi, returning to this. I put in that condition to make this test pass:
def test_groupby_resample_on_api_with_getitem(self):
which was apparently added as part of #17813. I have not had a chance to look too deeply.
Adding index=list("xyzwt") to the DataFrame in that test makes the op fail.
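To illustrate why .take(obj.index) is sensitive to the index, here is a minimal sketch, not the pandas internals. Index.take is positional, so passing row labels to it only gives the intended result when the labels happen to equal the row positions. The names axis, labels_in_order and labels_shuffled are hypothetical.

```python
import pandas as pd

# Stand-in for the resampling axis: one entry per row, in row order.
axis = pd.Index(["t0", "t1", "t2"])

# A default RangeIndex: labels coincide with positions.
labels_in_order = pd.RangeIndex(3).to_numpy()
# An out-of-order integer index: labels no longer match positions.
labels_shuffled = pd.Index([2, 0, 1]).to_numpy()

# take() interprets its argument as positions, not labels.
identity = axis.take(labels_in_order)
permuted = axis.take(labels_shuffled)

print(list(identity))  # ['t0', 't1', 't2']
print(list(permuted))  # ['t2', 't0', 't1']
```

With labels such as list("xyzwt") there are no integers to interpret as positions at all, which is consistent with the failure reported above.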
It looks like the issue happens on pandas.core.resample:1559. We pass only the group x but the entire axis self.ax. I wonder if splitting the axis, as we do with data on L967 there, and passing it to f would resolve it.
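The idea of splitting the axis per group can be sketched outside of pandas internals as follows. This is an illustration under stated assumptions, not the proposed patch: the frame, full_axis and per_group_axis are hypothetical, and the positional row numbers come from the public GroupBy.indices attribute.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2024-07-30 00:00", "2024-07-30 00:01", "2024-07-30 00:02"]
        ),
        "group": ["A", "B", "A"],
        "value": [1, 2, 3],
    },
    index=list("xyz"),  # non-integer labels, as in the failing test variant
)

# The entire axis, one timestamp per row of the frame.
full_axis = pd.DatetimeIndex(df["datetime"])

# GroupBy.indices maps each group to the positions of its rows, so the axis
# can be sliced positionally without consulting the row labels.
per_group_axis = {
    name: full_axis.take(positions)
    for name, positions in df.groupby("group").indices.items()
}

for name, ax in per_group_axis.items():
    print(name, list(ax))
```

Each group then sees only the timestamps of its own rows, independent of whether the frame's index is a RangeIndex, out of order, or made of strings.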
tm.assert_frame_equal(result, expected)


def test_groupby_resample_then_groupby_is_reused_when_index_is_out_of_order():
I think this test can be removed. In general I do not think we can sustainably test pairs of operations.
tm.assert_frame_equal(result_1, result_3)


def test_groupby_resample_then_groupby_is_reused_when_index_is_set_from_column():
Same.
tm.assert_frame_equal(result_1, result_3)


def test_groupby_resample_then_groupby_is_reused_when_groupby_selection_is_not_none():
Same
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.