DEPR: deprecate relabeling dicts in groupby.agg (pandas-dev#15931)
* DEPR: deprecate relabeling dictionaries in groupby.agg
jreback authored Apr 13, 2017
1 parent 7b8a6b1 commit 1c4dacb
Showing 12 changed files with 418 additions and 121 deletions.
8 changes: 0 additions & 8 deletions doc/source/computation.rst
@@ -610,14 +610,6 @@ aggregation with, outputting a DataFrame:
r['A'].agg([np.sum, np.mean, np.std])
If a dict is passed, the keys will be used to name the columns. Otherwise the
function's name (stored in the function object) will be used.

.. ipython:: python
r['A'].agg({'result1' : np.sum,
'result2' : np.mean})
On a windowed DataFrame, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:

32 changes: 22 additions & 10 deletions doc/source/groupby.rst
@@ -502,31 +502,43 @@ index are the group names and whose values are the sizes of each group.
Applying multiple functions at once
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With grouped Series you can also pass a list or dict of functions to do
With grouped ``Series`` you can also pass a list or dict of functions to do
aggregation with, outputting a DataFrame:

.. ipython:: python
grouped = df.groupby('A')
grouped['C'].agg([np.sum, np.mean, np.std])
If a dict is passed, the keys will be used to name the columns. Otherwise the
function's name (stored in the function object) will be used.
On a grouped ``DataFrame``, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:

.. ipython:: python
grouped['D'].agg({'result1' : np.sum,
'result2' : np.mean})
grouped.agg([np.sum, np.mean, np.std])
On a grouped DataFrame, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:
The resulting aggregations are named for the functions themselves. If you
need to rename, then you can add in a chained operation for a ``Series`` like this:

.. ipython:: python
grouped.agg([np.sum, np.mean, np.std])
(grouped['C'].agg([np.sum, np.mean, np.std])
.rename(columns={'sum': 'foo',
'mean': 'bar',
'std': 'baz'})
)
For a grouped ``DataFrame``, you can rename in a similar manner:

.. ipython:: python
(grouped.agg([np.sum, np.mean, np.std])
.rename(columns={'sum': 'foo',
'mean': 'bar',
'std': 'baz'})
)
Passing a dict of functions behaves differently by default; see the next
section.
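
The two renaming patterns above can be sketched as a standalone script. The frame below is a made-up stand-in for the ``df`` used in these docs (its columns ``A``, ``C`` and ``D`` are assumptions), and the functions are given as strings rather than NumPy callables purely for brevity:

```python
import pandas as pd

# Hypothetical data; only the column names mirror the surrounding docs.
df = pd.DataFrame({'A': ['x', 'x', 'y', 'y'],
                   'C': [1.0, 2.0, 3.0, 4.0],
                   'D': [10.0, 20.0, 30.0, 40.0]})
grouped = df.groupby('A')

new_names = {'sum': 'foo', 'mean': 'bar', 'std': 'baz'}

# Grouped Series: the result has one flat column per function.
series_out = grouped['C'].agg(['sum', 'mean', 'std']).rename(columns=new_names)

# Grouped DataFrame: the result has MultiIndex columns, and rename
# maps the function names on the inner level.
frame_out = grouped.agg(['sum', 'mean', 'std']).rename(columns=new_names)

print(series_out)
print(frame_out)
```

In both cases the aggregation runs first and the relabeling is a separate, chained step, so no renaming dict is passed to ``agg`` itself.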
Applying different functions to DataFrame columns
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8 changes: 0 additions & 8 deletions doc/source/timeseries.rst
@@ -1549,14 +1549,6 @@ You can pass a list or dict of functions to do aggregation with, outputting a Da
r['A'].agg([np.sum, np.mean, np.std])
If a dict is passed, the keys will be used to name the columns. Otherwise the
function's name (stored in the function object) will be used.

.. ipython:: python
r['A'].agg({'result1' : np.sum,
'result2' : np.mean})
On a resampled DataFrame, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:

82 changes: 82 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -456,6 +456,88 @@ Convert to an xarray DataArray

p.to_xarray()

.. _whatsnew_0200.api_breaking.deprecate_group_agg_dict:

Deprecate groupby.agg() with a dictionary when renaming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``.groupby(..).agg(..)``, ``.rolling(..).agg(..)``, and ``.resample(..).agg(..)`` syntax can accept a variety of inputs, including scalars,
lists, and a dict of column names to scalars or lists. This provides a useful syntax for constructing multiple
(potentially different) aggregations.

However, ``.agg(..)`` can *also* accept a dict that allows 'renaming' of the result columns. This is a complicated and confusing syntax, and it is not consistent
between ``Series`` and ``DataFrame``. We are deprecating this 'renaming' functionality.

1) We are deprecating passing a dict to a grouped/rolled/resampled ``Series``. This allowed
one to ``rename`` the resulting aggregation, but this had a completely different
meaning than passing a dictionary to a grouped ``DataFrame``, which accepts column-to-aggregations.
2) We are deprecating passing a dict-of-dicts to a grouped/rolled/resampled ``DataFrame`` in a similar manner.

This is an illustrative example:

.. ipython:: python

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
'B': range(5),
'C': range(5)})
df

Here is a typical syntax for computing different aggregations for different columns. This
is a natural (and useful) syntax that is *not* deprecated. Each key selects a column, and its value is the
function, or list of functions, to apply to that column. With a single function per column, as here, the result
has flat columns; passing a list of functions per column returns a ``MultiIndex`` for the columns.

.. ipython:: python

df.groupby('A').agg({'B': 'sum', 'C': 'min'})
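
As a rough illustration of how the shape of the dict values affects the result columns, the sketch below rebuilds the same ``df`` and compares one function per column with a list of functions per column. The variable names ``flat`` and ``nested`` are only for this example:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})

# One function per column: the result keeps flat column labels.
flat = df.groupby('A').agg({'B': 'sum', 'C': 'min'})

# A list of functions per column: the result has MultiIndex columns
# of the form (column, function).
nested = df.groupby('A').agg({'B': ['sum', 'max'], 'C': ['min']})

print(flat.columns.tolist())
print(nested.columns.tolist())
```

Neither form renames anything, so neither is affected by the deprecation.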

Here's an example of the first deprecation (1), passing a dict to a grouped ``Series``. This
is a combination aggregation & renaming:

.. code-block:: ipython

In [6]: df.groupby('A').B.agg({'foo': 'count'})
FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version

Out[6]:
foo
A
1 3
2 2

You can accomplish the same operation more idiomatically with:

.. ipython:: python

df.groupby('A').B.agg(['count']).rename(columns={'count': 'foo'})
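
A minimal standalone sketch of this replacement follows. It passes the mapping through ``columns=``, because a mapping given to ``DataFrame.rename`` positionally is applied to the index labels rather than to the columns:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})

# Aggregate with a list (so the result is a DataFrame with a 'count'
# column), then relabel that column in a separate step.
counts = (df.groupby('A')['B']
            .agg(['count'])
            .rename(columns={'count': 'foo'}))

print(counts)
```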


Here's an example of the second deprecation (2), passing a dict-of-dict to a grouped ``DataFrame``:

.. code-block:: ipython

In [23]: (df.groupby('A')
.agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}})
)
FutureWarning: using a dict with renaming is deprecated and will be removed in a future version

Out[23]:
B C
foo bar
A
1 3 0
2 7 3


You can accomplish nearly the same (the result has flat columns rather than two-level columns) by:

.. ipython:: python

(df.groupby('A')
.agg({'B': 'sum', 'C': 'min'})
.rename(columns={'B': 'foo', 'C': 'bar'})
)
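
The following sketch runs the replacement end to end and, as an optional extra step that is not part of the original notes, rebuilds the two-level column layout of the deprecated output by assigning a ``MultiIndex`` by hand. The names ``result`` and ``two_level`` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})

# Aggregate with a plain column-to-function dict, then rename.
result = (df.groupby('A')
            .agg({'B': 'sum', 'C': 'min'})
            .rename(columns={'B': 'foo', 'C': 'bar'}))

# Optional: restore the (column, label) layout that the deprecated
# dict-of-dicts form produced.
two_level = result.copy()
two_level.columns = pd.MultiIndex.from_tuples([('B', 'foo'), ('C', 'bar')])

print(result)
print(two_level)
```

The values match the deprecated output; only the column labels differ unless the ``MultiIndex`` is restored explicitly.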

.. _whatsnew.api_breaking.io_compat:

Possible incompat for HDF5 formats for pandas < 0.13.0