API/BUG: .apply will correctly infer output shape when axis=1 #18577

Merged (12 commits), Feb 7, 2018
10 changes: 8 additions & 2 deletions doc/source/basics.rst
@@ -793,8 +793,14 @@ The :meth:`~DataFrame.apply` method will also dispatch on a string method name.
df.apply('mean')
df.apply('mean', axis=1)

Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.
The return type of the function passed to :meth:`~DataFrame.apply` affects the
type of the ultimate output from :meth:`~DataFrame.apply`:

Review comment by jorisvandenbossche (Member), Feb 4, 2018:

"return type" -> "default return type" (and maybe after the two bullet points mention that it can be controlled with the result_type keyword (without going into details)?)

* If the applied function returns a ``Series``, the ultimate output is a ``DataFrame``.
  The columns match the index of the ``Series`` returned by the applied function.
* If the applied function returns any other type, the ultimate output is a ``Series``.
* A ``result_type`` keyword argument accepts the options ``reduce``, ``broadcast``, and ``expand``.
  These determine whether, and how, list-like return values are expanded into a ``DataFrame``.
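
As a minimal sketch of the default rules above (the two-row frame and the ``total``/``diff`` labels are hypothetical, chosen only for illustration), a row-wise function that returns a ``Series`` produces a ``DataFrame``, while one that returns a scalar produces a ``Series``:

.. code-block:: python

   import pandas as pd

   # Hypothetical two-row frame used only for this illustration.
   df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

   # The function returns a Series: the result is a DataFrame whose
   # columns are the index of that Series ('total', 'diff').
   as_frame = df.apply(
       lambda row: pd.Series({'total': row['A'] + row['B'],
                              'diff': row['B'] - row['A']}),
       axis=1,
   )

   # The function returns a scalar: the result is a Series.
   as_series = df.apply(lambda row: row['A'] * row['B'], axis=1)

   print(type(as_frame).__name__, list(as_frame.columns))  # DataFrame ['total', 'diff']
   print(type(as_series).__name__, as_series.tolist())     # Series [3, 8]
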

:meth:`~DataFrame.apply` combined with some cleverness can be used to answer many questions
about a data set. For example, suppose we wanted to extract the date where the
73 changes: 71 additions & 2 deletions doc/source/whatsnew/v0.23.0.txt
@@ -142,7 +142,7 @@ Previous Behavior:
4 NaN
dtype: float64

Current Behavior
Current Behavior:

.. ipython:: python

@@ -167,7 +167,7 @@ Previous Behavior:
3 2.5
dtype: float64

Current Behavior
Current Behavior:

.. ipython:: python

@@ -332,6 +332,73 @@ Convert to an xarray DataArray

p.to_xarray()

.. _whatsnew_0230.api_breaking.apply:

Apply Changes
~~~~~~~~~~~~~

:func:`DataFrame.apply` was inconsistent when applying an arbitrary user-defined function that returned a list-like with ``axis=1``. Several bugs and inconsistencies
are resolved. If the applied function returns a ``Series``, then pandas will return a ``DataFrame``; otherwise a ``Series`` will be returned, including the case
where a list-like (e.g. a ``tuple`` or ``list``) is returned (:issue:`16353`, :issue:`17437`, :issue:`17970`, :issue:`17348`, :issue:`17892`, :issue:`18573`,
:issue:`17602`, :issue:`18775`, :issue:`18901`, :issue:`18919`).

.. ipython:: python

   df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1, columns=['A', 'B', 'C'])
   df

Previous Behavior: if the returned shape happened to match the original columns, a ``DataFrame`` was returned.
If it did not match, a ``Series`` of lists was returned.

.. code-block:: python

   In [3]: df.apply(lambda x: [1, 2, 3], axis=1)
   Out[3]:
      A  B  C
   0  1  2  3
   1  1  2  3
   2  1  2  3
   3  1  2  3
   4  1  2  3
   5  1  2  3

   In [4]: df.apply(lambda x: [1, 2], axis=1)
   Out[4]:
   0    [1, 2]
   1    [1, 2]
   2    [1, 2]
   3    [1, 2]
   4    [1, 2]
   5    [1, 2]
   dtype: object


New Behavior: the behavior is now consistent, and both calls *always* return a ``Series``.

.. ipython:: python

   df.apply(lambda x: [1, 2, 3], axis=1)
   df.apply(lambda x: [1, 2], axis=1)
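
To confirm this programmatically, the following sketch rebuilds the same ``df`` and inspects both results: each is an ``object``-dtype ``Series`` holding one list per row, whether or not the list length matches the number of columns.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Same frame as above: 6 rows, columns A, B, C.
   df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1,
                     columns=['A', 'B', 'C'])

   # List length matches the number of columns ...
   same_len = df.apply(lambda x: [1, 2, 3], axis=1)
   # ... and list length that does not.
   other_len = df.apply(lambda x: [1, 2], axis=1)

   # Neither result is expanded into columns: each is a Series of lists.
   for result in (same_len, other_len):
       print(type(result).__name__, result.dtype, len(result))  # Series object 6
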
Review comment (Contributor):

Perhaps add an example of how to get the previous behavior? It would just be df.apply(lambda x: pd.Series([1, 2, 3]), axis=1) right?

Review comment (Member):

Can you check this one?
The example below is not exactly the same as the previous behaviour (in case the length matched).

Review comment (Member):

I want to repeat: I think we also need to give a way how to keep the existing behaviour (can be in addition to this example).
But now that you added result_type='broadcast', I suppose it is this?


To expand list-like results into columns, use ``result_type='expand'``:

.. ipython:: python

   df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')
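
``expand`` does not require the list length to match the original columns, and the new columns are labelled by position rather than reusing ``A``, ``B``, ``C`` as the previous matching-length behavior did. A short sketch, rebuilding the same ``df``:

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1,
                     columns=['A', 'B', 'C'])

   # Two values per row although df has three columns: 'expand' does not
   # require the lengths to match.
   expanded = df.apply(lambda x: [1, 2], axis=1, result_type='expand')

   print(expanded.shape)          # (6, 2)
   # The new columns are labelled by position, not by df's column names.
   print(list(expanded.columns))  # [0, 1]
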

To broadcast the result across the original columns, use ``result_type='broadcast'``. The
returned list-like must have the same length as the number of columns:

.. ipython:: python

   df.apply(lambda x: [1, 2, 3], axis=1, result_type='broadcast')
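
The sketch below, again rebuilding the same ``df``, shows both sides of that requirement: a matching-length result keeps the original index and column labels, while a length mismatch raises a ``ValueError``.

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1,
                     columns=['A', 'B', 'C'])

   # Length 3 matches the 3 columns: the original labels are kept.
   broadcast = df.apply(lambda x: [1, 2, 3], axis=1, result_type='broadcast')
   print(list(broadcast.columns))  # ['A', 'B', 'C']

   # Length 2 does not match, so broadcasting fails with a ValueError.
   try:
       df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
       mismatch_raised = False
   except ValueError:
       mismatch_raised = True
   print(mismatch_raised)  # True
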

Returning a ``Series`` allows one to control the exact return structure and column names:

.. ipython:: python

   df.apply(lambda x: pd.Series([1, 2, 3], index=x.index), axis=1)
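
The index of the returned ``Series`` does not have to be the original columns. In this sketch the labels ``first`` and ``a_plus_c`` are arbitrary, hypothetical names; whatever index the returned ``Series`` carries becomes the column labels of the result, and its length may differ from the original number of columns.

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1,
                     columns=['A', 'B', 'C'])

   # 'first' and 'a_plus_c' are hypothetical labels chosen for illustration.
   named = df.apply(
       lambda x: pd.Series([x['A'], x['A'] + x['C']], index=['first', 'a_plus_c']),
       axis=1,
   )

   print(list(named.columns))         # ['first', 'a_plus_c']
   print(named['a_plus_c'].tolist())  # [4, 4, 4, 4, 4, 4]
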


.. _whatsnew_0230.api_breaking.build_changes:

@@ -456,6 +523,8 @@ Deprecations
- The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`).
- ``IntervalIndex.from_intervals`` is deprecated in favor of the :class:`IntervalIndex` constructor (:issue:`19263`)
- :func:`DataFrame.from_items` is deprecated. Use :func:`DataFrame.from_dict` instead, or ``DataFrame.from_dict(OrderedDict())`` if you wish to preserve the key order (:issue:`17320`)
- The ``broadcast`` parameter of ``.apply()`` is deprecated in favor of ``result_type='broadcast'`` (:issue:`18577`)
- The ``reduce`` parameter of ``.apply()`` is deprecated in favor of ``result_type='reduce'`` (:issue:`18577`)

.. _whatsnew_0230.prior_deprecations:
