BUG: aggregations were getting overwritten if they had the same name #30858

Merged
merged 34 commits into from
Jul 14, 2020
Changes from 7 commits
Commits
20049c1
:bug: aggregations were getting overwritten if they had the same name
Jan 9, 2020
ab685fd
:art: shorten test for the sake of legibility
Jan 21, 2020
e38e450
:art: handle empty in , make whatsnewentry public-facing
Jan 21, 2020
cb849a2
:pencil: move whatsnew entry to v1.1.0
Jan 23, 2020
521bc1d
remove accidentally added whatsnewentry
MarcoGorelli Feb 2, 2020
ec93c4f
Merge branch 'master' into multiple-aggregations
MarcoGorelli Mar 3, 2020
6f9aac8
Update v1.1.0.rst
MarcoGorelli Mar 3, 2020
a8e9121
remove dataframe constructor
Mar 4, 2020
b857c6d
Dict instead of Mapping
Mar 4, 2020
44d00df
Merge branch 'master' into multiple-aggregations
MarcoGorelli Mar 5, 2020
523effb
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Mar 15, 2020
552063a
remove no longer necessary setting of random seed
MarcoGorelli Mar 15, 2020
5e2e7d2
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Apr 19, 2020
40f7e31
don't return slice in concat
MarcoGorelli Apr 19, 2020
f8f2d7f
Add test containing ohlc
MarcoGorelli Apr 19, 2020
dba7dde
Add named aggregation resample test, add to whatsnew
MarcoGorelli Apr 19, 2020
1b43ed1
revert empty line change
MarcoGorelli Apr 19, 2020
868a680
remove 30092 from whatsnew as the issue is already fixed in 1.0.3 and…
MarcoGorelli Apr 19, 2020
5d7f3db
Merge branch 'master' into multiple-aggregations
MarcoGorelli May 2, 2020
14b2402
catch performancewarning in test
MarcoGorelli May 2, 2020
829dce8
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 3, 2020
3469f5d
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 9, 2020
862b39e
make test same as in OP
MarcoGorelli May 10, 2020
5e3f333
make test match OP exactly
MarcoGorelli May 10, 2020
e7629f3
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 13, 2020
51158ef
split into two tests
MarcoGorelli May 18, 2020
447dfea
split into two tests
MarcoGorelli May 18, 2020
2693956
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli May 18, 2020
aa988a4
add test with namedtuple
MarcoGorelli May 27, 2020
7a62f5f
better layout
MarcoGorelli May 27, 2020
d80ddc5
better layout
MarcoGorelli May 27, 2020
4f954d4
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Jun 27, 2020
62d91d1
dont special case empty output
MarcoGorelli Jun 27, 2020
fb3ba5c
Merge remote-tracking branch 'upstream/master' into multiple-aggregat…
MarcoGorelli Jul 14, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -327,6 +327,7 @@ Reshaping
 - Bug in :func:`crosstab` when inputs are two Series and have tuple names, the output will keep dummy MultiIndex as columns. (:issue:`18321`)
 - :meth:`DataFrame.pivot` can now take lists for ``index`` and ``columns`` arguments (:issue:`21425`)
 - Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)
+- Bug in :meth:`SeriesGroupBy.aggregate` was resulting in aggregations being overwritten when they shared the same name (:issue:`30092`)
 - :meth:`DataFrame.replace` and :meth:`Series.replace` will raise a ``TypeError`` if ``to_replace`` is not an expected type. Previously the ``replace`` would fail silently (:issue:`18634`)
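
A minimal sketch of why the two aggregations collide, in plain Python rather than pandas. It assumes that the column label comes from the callable's name and that a functools.partial resolves to the name of the function it wraps, which is what the ("A", "std") labels expected by the test in this PR suggest. The helper callable_name and the local std function are hypothetical stand-ins, not pandas APIs.

```python
import functools


def callable_name(func):
    """Hypothetical helper: unwrap functools.partial and return the wrapped function's name."""
    while isinstance(func, functools.partial):
        func = func.func
    return getattr(func, "__name__", None)


def std(values, ddof=0):
    """Standard deviation with configurable delta degrees of freedom (stand-in for np.std)."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - ddof)) ** 0.5


funcs = [functools.partial(std, ddof=0), functools.partial(std, ddof=1)]
data = [1.0, 2.0, 3.0, 4.0]

# Both partials resolve to the same label.
names = [callable_name(f) for f in funcs]

# Keying results by label alone: the second "std" silently replaces the first.
by_name = {}
for func in funcs:
    by_name[callable_name(func)] = func(data)
```

Both entries resolve to "std", so the dict ends up holding only the ddof=1 result.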


15 changes: 8 additions & 7 deletions pandas/core/groupby/generic.py
@@ -311,8 +311,8 @@ def _aggregate_multiple_funcs(self, arg):

             arg = zip(columns, arg)

-        results = {}
-        for name, func in arg:
+        results: Mapping[base.OutputKey, Union[Series, DataFrame]] = {}
+        for idx, (name, func) in enumerate(arg):
             obj = self

             # reset the cache so that we
@@ -321,13 +321,12 @@ def _aggregate_multiple_funcs(self, arg):
                 obj = copy.copy(obj)
                 obj._reset_cache()
                 obj._selection = name
-            results[name] = obj.aggregate(func)
+            results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func)

         if any(isinstance(x, DataFrame) for x in results.values()):
             # let higher level handle
-            return results
-
-        return DataFrame(results, columns=columns)
+            return {key.label: value for key, value in results.items()}
+        return DataFrame(self._wrap_aggregated_output(results), columns=columns)
Member

Is the DataFrame constructor still required here?

Member Author

In the test

pytest pandas/tests/groupby/aggregate/test_aggregate.py::test_aggregate_item_by_item

when we get here we have

(Pdb) results
{OutputKey(label='<lambda>', position=0): A
bar    3
foo    5
Name: B, dtype: int64}
(Pdb) self._wrap_aggregated_output(results)
A
bar    3
foo    5
Name: <lambda>, dtype: int64
(Pdb) type(self._wrap_aggregated_output(results))
<class 'pandas.core.series.Series'>

Member Author

@WillAyd I've updated this with a call to .to_frame (if necessary)


     def _wrap_series_output(
         self, output: Mapping[base.OutputKey, Union[Series, np.ndarray]], index: Index
@@ -358,8 +357,10 @@ def _wrap_series_output(
         if len(output) > 1:
             result = DataFrame(indexed_output, index=index)
             result.columns = columns
-        else:
+        elif not columns.empty:
             result = Series(indexed_output[0], index=index, name=columns[0])
+        else:
+            result = DataFrame()

         return result
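
The change to _aggregate_multiple_funcs keys each result on both its label and its position. The sketch below reproduces that idea in plain Python. OutputKey here is a local namedtuple standing in for base.OutputKey (field names taken from the pdb output in the review thread), and aggregate_all is a hypothetical helper, not pandas code.

```python
from collections import namedtuple

# Local stand-in for base.OutputKey; assumed to be a (label, position) pair.
OutputKey = namedtuple("OutputKey", ["label", "position"])


def aggregate_all(named_funcs, values):
    """Apply each (name, func) pair and key the result by label and position."""
    results = {}
    for idx, (name, func) in enumerate(named_funcs):
        results[OutputKey(label=name, position=idx)] = func(values)
    return results


# Two different aggregations that deliberately share one label.
named_funcs = [("agg", min), ("agg", max)]
results = aggregate_all(named_funcs, [3, 1, 2])

labels = [key.label for key in results]  # duplicate labels survive, in order
values = list(results.values())

# Flattening back to a dict keyed by label alone collapses the duplicates again.
collapsed = {key.label: value for key, value in results.items()}
```

Because the position is part of the key, both results are kept even though their labels match. In this sketch, any later step that re-keys by label alone loses that guarantee.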

18 changes: 18 additions & 0 deletions pandas/tests/groupby/aggregate/test_aggregate.py
@@ -250,6 +250,24 @@ def test_agg_multiple_functions_maintain_order(df):
     tm.assert_index_equal(result.columns, exp_cols)


+def test_agg_multiple_functions_same_name(df):
+    # GH 30880
+    np.random.seed(1)
+    df = tm.makeTimeDataFrame()
+    result = df.resample("3D").agg(
+        {"A": [functools.partial(np.std, ddof=0), functools.partial(np.std, ddof=1)]}
+    )
+    expected_index = pd.date_range("2000-01-03", "2000-02-11", freq="3D")
+    expected_columns = pd.MultiIndex.from_tuples([("A", "std"), ("A", "std")])
+    expected_values = np.array(
+        [df.resample("3D").A.std(ddof=i).values for i in range(2)]
+    ).T
+    expected = pd.DataFrame(
+        expected_values, columns=expected_columns, index=expected_index
+    )
+    tm.assert_frame_equal(result, expected)
+
+
 def test_multiple_functions_tuples_and_non_tuples(df):
     # #1359
     funcs = [("foo", "mean"), "std"]
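
The test builds expected_values by stacking one row per ddof and transposing, so that each aggregation ends up as a column. A plain-Python sketch of that layout step follows, with zip standing in for NumPy's .T and made-up numbers in place of real standard deviations.

```python
# One row per aggregation (ddof=0 first, then ddof=1); values are illustrative only.
rows = [
    [1.0, 2.0, 3.0],  # hypothetical ddof=0 results, one per resample bin
    [1.5, 2.5, 3.5],  # hypothetical ddof=1 results, one per resample bin
]

# Transpose: each inner list becomes one resample bin holding both aggregations.
table = [list(column) for column in zip(*rows)]
```

After the transpose, table has one entry per resample bin and two values per entry, matching the two ("A", "std") columns in expected_columns.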