DEPR: deprecate relabeling dicts in groupby.agg (pandas-dev#15931)

* DEPR: deprecate relabeling dictionaries in groupby.agg
jreback · Apr 13, 2017 · 1c4dacb
1 parent 7b8a6b1
commit 1c4dacb
Showing 12 changed files with 418 additions and 121 deletions.
diff --git a/doc/source/computation.rst b/doc/source/computation.rst
@@ -610,14 +610,6 @@ aggregation with, outputting a DataFrame:
 
    r['A'].agg([np.sum, np.mean, np.std])
 
-If a dict is passed, the keys will be used to name the columns. Otherwise the
-function's name (stored in the function object) will be used.
-
-.. ipython:: python
-
-   r['A'].agg({'result1' : np.sum,
-               'result2' : np.mean})
-
 On a windowed DataFrame, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:
 

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
@@ -502,31 +502,43 @@ index are the group names and whose values are the sizes of each group.
 Applying multiple functions at once
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-With grouped Series you can also pass a list or dict of functions to do
+With grouped ``Series`` you can also pass a list or dict of functions to do
 aggregation with, outputting a DataFrame:
 
 .. ipython:: python
 
    grouped = df.groupby('A')
    grouped['C'].agg([np.sum, np.mean, np.std])
 
-If a dict is passed, the keys will be used to name the columns. Otherwise the
-function's name (stored in the function object) will be used.
+On a grouped ``DataFrame``, you can pass a list of functions to apply to each
+column, which produces an aggregated result with a hierarchical index:
 
 .. ipython:: python
 
-   grouped['D'].agg({'result1' : np.sum,
-                     'result2' : np.mean})
+   grouped.agg([np.sum, np.mean, np.std])
 
-On a grouped DataFrame, you can pass a list of functions to apply to each
-column, which produces an aggregated result with a hierarchical index:
+
+The resulting aggregations are named for the functions themselves. If you
+need to rename, then you can add in a chained operation for a ``Series`` like this:
 
 .. ipython:: python
 
-   grouped.agg([np.sum, np.mean, np.std])
+   (grouped['C'].agg([np.sum, np.mean, np.std])
+                .rename(columns={'sum': 'foo',
+                                 'mean': 'bar',
+                                 'std': 'baz'})
+   )
+
+For a grouped ``DataFrame``, you can rename in a similar manner:
+
+.. ipython:: python
+
+   (grouped.agg([np.sum, np.mean, np.std])
+           .rename(columns={'sum': 'foo',
+                            'mean': 'bar',
+                            'std': 'baz'})
+   )
 
-Passing a dict of functions has different behavior by default, see the next
-section.
 
 Applying different functions to DataFrame columns
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

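Note on the groupby.rst change above: a minimal, self-contained sketch of the aggregate-then-rename pattern the new text recommends. The sample frame is hypothetical (the docs' ``df`` is defined elsewhere), and string function names (``'sum'``, ``'mean'``, ``'std'``) are swapped in for the ``np.sum``-style callables used in the patch; both spellings produce columns named after the functions.

```python
import pandas as pd

# Hypothetical sample data; only the column names 'A' and 'C' mirror the docs.
df = pd.DataFrame({'A': ['x', 'x', 'y', 'y'],
                   'C': [1.0, 2.0, 3.0, 5.0]})
grouped = df.groupby('A')

# Aggregate with a list of functions; the result columns are named after
# the functions. Then rename those columns in a chained step.
result = (grouped['C'].agg(['sum', 'mean', 'std'])
                      .rename(columns={'sum': 'foo',
                                       'mean': 'bar',
                                       'std': 'baz'}))
print(result)
```

Passing ``columns=`` matters here: without it, ``rename`` would try to relabel the index (the group keys) and the column names would stay unchanged.
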
diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
@@ -1549,14 +1549,6 @@ You can pass a list or dict of functions to do aggregation with, outputting a Da
 
    r['A'].agg([np.sum, np.mean, np.std])
 
-If a dict is passed, the keys will be used to name the columns. Otherwise the
-function's name (stored in the function object) will be used.
-
-.. ipython:: python
-
-   r['A'].agg({'result1' : np.sum,
-               'result2' : np.mean})
-
 On a resampled DataFrame, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:
 

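Note on the timeseries.rst change above: the removed example used dict keys (``'result1'``, ``'result2'``) to name the output columns of a resampled ``Series``. A sketch of one way to get the same labels without the deprecated dict, assuming a hypothetical daily series in a column named ``'A'``, could look like this:

```python
import pandas as pd

# Hypothetical data: six daily observations in a column named 'A'.
idx = pd.date_range('2017-01-01', periods=6, freq='D')
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=idx)

# Resample into 3-day bins, aggregate with a list of functions,
# then rename the function-named columns to the desired labels.
r = df.resample('3D')
result = (r['A'].agg(['sum', 'mean'])
                .rename(columns={'sum': 'result1',
                                 'mean': 'result2'}))
print(result)
```
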
diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt
@@ -456,6 +456,88 @@ Convert to an xarray DataArray
 
    p.to_xarray()
 
+.. _whatsnew_0200.api_breaking.deprecate_group_agg_dict:
+
+Deprecate groupby.agg() with a dictionary when renaming
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``.groupby(..).agg(..)``, ``.rolling(..).agg(..)``, and ``.resample(..).agg(..)`` syntax can accept a variety of inputs, including scalars,
+lists, and a dict of column names to scalars or lists. This provides a useful syntax for constructing multiple
+(potentially different) aggregations.
+
+However, ``.agg(..)`` can *also* accept a dict that allows 'renaming' of the result columns. This syntax is complicated and confusing, and it is
+inconsistent between ``Series`` and ``DataFrame``. We are deprecating this 'renaming' functionality.
+
+1) We are deprecating passing a dict to a grouped/rolled/resampled ``Series``. This allowed
+one to ``rename`` the resulting aggregation, but this had a completely different
+meaning than passing a dictionary to a grouped ``DataFrame``, which accepts column-to-aggregations.
+2) We are deprecating passing a dict-of-dicts to a grouped/rolled/resampled ``DataFrame`` in a similar manner.
+
+This is an illustrative example:
+
+.. ipython:: python
+
+    df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
+                       'B': range(5),
+                       'C': range(5)})
+    df
+
+Here is a typical, natural syntax for computing different aggregations for different columns. We
+take the specified columns and apply the function named for each. Because each column maps to a single
+function here, the result has one column per key; passing a list of functions per column would instead
+return a ``MultiIndex`` for the columns.
+
+.. ipython:: python
+
+   df.groupby('A').agg({'B': 'sum', 'C': 'min'})
+
+Here's an example of the first deprecation (1), passing a dict to a grouped ``Series``. This
+combines aggregation and renaming:
+
+.. code-block:: ipython
+
+   In [6]: df.groupby('A').B.agg({'foo': 'count'})
+   FutureWarning: using a dict on a Series for aggregation
+   is deprecated and will be removed in a future version
+
+   Out[6]:
+      foo
+   A
+   1    3
+   2    2
+
+You can accomplish the same operation more idiomatically with:
+
+.. ipython:: python
+
+   df.groupby('A').B.agg(['count']).rename(columns={'count': 'foo'})
+
+
+Here's an example of the second deprecation (2), passing a dict-of-dicts to a grouped ``DataFrame``:
+
+.. code-block:: ipython
+
+   In [23]: (df.groupby('A')
+               .agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}})
+            )
+   FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
+
+   Out[23]:
+        B   C
+      foo bar
+   A
+   1   3   0
+   2   7   3
+
+
+You can accomplish nearly the same with the following (the result has flat ``foo`` and ``bar`` columns rather than two-level columns):
+
+.. ipython:: python
+
+   (df.groupby('A')
+      .agg({'B': 'sum', 'C': 'min'})
+      .rename(columns={'B': 'foo', 'C': 'bar'})
+   )
+
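Note on the replacement for deprecation (2): the chained ``rename`` yields flat columns, whereas the deprecated dict-of-dicts call produced two-level columns. If the two-level layout is needed, one possible approach (a sketch, not part of this patch) is to assign a ``MultiIndex`` to the columns explicitly. The frame below repeats the illustrative ``df`` from the whatsnew text; the names ``flat`` and ``nested`` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})

# Flat result: one renamed column per aggregated input column.
flat = (df.groupby('A')
          .agg({'B': 'sum', 'C': 'min'})
          .rename(columns={'B': 'foo', 'C': 'bar'}))

# Two-level result: fix the column order explicitly, then relabel
# the columns with (original column, new name) pairs.
nested = df.groupby('A').agg({'B': 'sum', 'C': 'min'})[['B', 'C']]
nested.columns = pd.MultiIndex.from_tuples([('B', 'foo'), ('C', 'bar')])

print(flat)
print(nested)
```
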
 .. _whatsnew.api_breaking.io_compat:
 
 Possible incompat for HDF5 formats for pandas < 0.13.0