API: str.cat will align on Series #20347

Merged
merged 10 commits into from
May 2, 2018
doc/source/text.rst: 80 changes (70 additions, 10 deletions)

@@ -247,27 +247,87 @@ Missing values on either side will result in missing values in the result as wel
    s.str.cat(t)
    s.str.cat(t, na_rep='-')

Concatenating a Series and something array-like into a Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 0.23.0

The parameter ``others`` can also be two-dimensional. In this case, the number of rows must match the length of the calling ``Series`` (or ``Index``).

.. ipython:: python

    d = pd.concat([t, s], axis=1)
    s
    d
    s.str.cat(d, na_rep='-')
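
Without alignment, row ``i`` of a two-dimensional ``others`` is simply appended to element ``i`` of the caller. The following is a minimal pure-Python sketch of that positional rule, not pandas's implementation; it assumes lists as columns and ``None`` as the missing value, and ``cat_columns`` is a hypothetical helper:

```python
from typing import List, Optional, Sequence


def cat_columns(caller: Sequence[Optional[str]],
                columns: Sequence[Sequence[Optional[str]]],
                sep: str = "",
                na_rep: Optional[str] = None) -> List[Optional[str]]:
    """Hypothetical model of positional (unaligned) row-wise concatenation.

    Each inner sequence of ``columns`` is one column of a two-dimensional
    ``others`` and must have exactly one entry per element of ``caller``.
    """
    for col in columns:
        if len(col) != len(caller):
            raise ValueError("all columns must match the length of the caller")
    result: List[Optional[str]] = []
    for i, first in enumerate(caller):
        # gather row i: the caller's element followed by each column's entry
        row = [first] + [col[i] for col in columns]
        if any(v is None for v in row):
            if na_rep is None:
                # a missing value anywhere in the row makes the result missing
                result.append(None)
                continue
            row = [na_rep if v is None else v for v in row]
        result.append(sep.join(row))
    return result


s = ["a", "b", "c", "d"]
t = ["a", "b", None, "d"]  # hypothetical column with one missing value
print(cat_columns(s, [t, s], na_rep="-"))  # ['aaa', 'bbb', 'c-c', 'ddd']
```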

Concatenating a Series and an indexed object into a Series, with alignment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 0.23.0

For concatenation with a ``Series`` or ``DataFrame``, the indexes can be aligned before concatenation by setting
the ``join`` keyword.

.. ipython:: python

    u = pd.Series(['b', 'd', 'a', 'c'], index=[1, 3, 0, 2])
    s
    u
    s.str.cat(u)
    s.str.cat(u, join='left')
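
The effect of alignment can be illustrated with dictionaries that map index labels to values, standing in for ``Series``. In this hedged sketch (``cat_positional`` and ``cat_left`` are hypothetical helpers, not pandas functions), the positional version pairs values by their order, while the ``join='left'`` version looks up each of the caller's labels in ``other``:

```python
from typing import Dict, Hashable, Optional

Labelled = Dict[Hashable, str]


def cat_positional(caller: Labelled, other: Labelled) -> Labelled:
    # No alignment: the i-th value of `other` is glued to the i-th value of
    # `caller`, whatever label it carries; the caller's labels are kept.
    if len(caller) != len(other):
        raise ValueError("without alignment, lengths must match")
    return {label: a + b for (label, a), b in zip(caller.items(), other.values())}


def cat_left(caller: Labelled, other: Labelled,
             na_rep: Optional[str] = None) -> Dict[Hashable, Optional[str]]:
    # join='left': keep the caller's labels and look each one up in `other`;
    # a label missing from `other` gives a missing result unless na_rep is set.
    # (For brevity this sketch assumes the caller itself has no missing values.)
    result: Dict[Hashable, Optional[str]] = {}
    for label, a in caller.items():
        b = other.get(label)
        if b is None:
            result[label] = None if na_rep is None else a + na_rep
        else:
            result[label] = a + b
    return result


s = {0: "a", 1: "b", 2: "c", 3: "d"}
u = {1: "b", 3: "d", 0: "a", 2: "c"}
print(cat_positional(s, u))  # {0: 'ab', 1: 'bd', 2: 'ca', 3: 'dc'}
print(cat_left(s, u))        # {0: 'aa', 1: 'bb', 2: 'cc', 3: 'dd'}
```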

.. warning::

    If the ``join`` keyword is not passed, the method :meth:`~Series.str.cat` currently falls back to the behavior before version 0.23.0 (i.e. no alignment),
    but a ``FutureWarning`` is raised if any of the involved indexes differ, since this default will change to ``join='left'`` in a future version.

The usual options are available for ``join`` (one of ``'left', 'outer', 'inner', 'right'``).
In particular, with alignment the lengths of the objects no longer need to match.

.. ipython:: python

    v = pd.Series(['z', 'a', 'b', 'd', 'e'], index=[-1, 0, 1, 3, 4])
    s
    v
    s.str.cat(v, join='left', na_rep='-')
    s.str.cat(v, join='outer', na_rep='-')
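
How the four ``join`` options decide which labels survive can be sketched with dictionaries standing in for ``Series``. This is a conceptual model under stated assumptions, not pandas's implementation: ``join_labels`` and ``cat_aligned`` are hypothetical helpers, ``None`` marks a missing value, and the ordering of the result labels is a simplification of this sketch.

```python
from typing import Dict, Hashable, List, Optional

Labelled = Dict[Hashable, Optional[str]]


def join_labels(left: Labelled, right: Labelled, how: str) -> List[Hashable]:
    # Labels of the result for each join type. The ordering is an assumption
    # of this sketch: 'left'/'right' keep that side's order, 'inner' keeps the
    # caller's order, and 'outer' sorts the union (labels must be comparable).
    if how == "left":
        return list(left)
    if how == "right":
        return list(right)
    if how == "inner":
        return [label for label in left if label in right]
    if how == "outer":
        return sorted(set(left) | set(right))
    raise ValueError(f"unknown join: {how!r}")


def cat_aligned(left: Labelled, right: Labelled, how: str = "left",
                sep: str = "", na_rep: Optional[str] = None) -> Labelled:
    result: Labelled = {}
    for label in join_labels(left, right, how):
        # a label absent from one side counts as a missing value on that side
        pair = [left.get(label), right.get(label)]
        if any(x is None for x in pair):
            if na_rep is None:
                result[label] = None  # missing anywhere -> missing result
                continue
            pair = [na_rep if x is None else x for x in pair]
        result[label] = sep.join(pair)
    return result


s = {0: "a", 1: "b", 2: "c", 3: "d"}
v = {-1: "z", 0: "a", 1: "b", 3: "d", 4: "e"}
print(cat_aligned(s, v, "left", na_rep="-"))   # {0: 'aa', 1: 'bb', 2: 'c-', 3: 'dd'}
print(cat_aligned(s, v, "outer", na_rep="-"))  # {-1: '-z', 0: 'aa', 1: 'bb', 2: 'c-', 3: 'dd', 4: '-e'}
```

In the ``'outer'`` case the labels ``-1`` and ``4`` have no counterpart in ``s``, which is why the two inputs no longer need the same length.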

The same alignment can be used when ``others`` is a ``DataFrame``:

.. ipython:: python

    f = d.loc[[3, 2, 1, 0], :]
    s
    f
    s.str.cat(f, join='left', na_rep='-')

Review comment (Contributor), on ``f = d.loc[[3, 2, 1, 0], :]``: show f here

Concatenating a Series and many objects into a Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All one-dimensional list-likes can be arbitrarily combined in a list-like container (including iterators, ``dict``-views, etc.):

Review comment (Contributor): question: what happens with a dict? Do we use the keys, or does it transform to a series and get aligned?

Reply (Contributor Author): I was talking about ``d.keys()`` or ``d.values()``. For passing a dict directly, currently the keys would get read (as that's what ``x in d`` would return).

Reply (Contributor Author, @h-vetinari, May 1, 2018): I could of course add another safety that maps ``d`` to ``d.values()``.

However, I tend to think that this could be left to the python-skills of the user as well: a dictionary is not "list-like" from the POV of normal python usage (whereas its keys and values are, see ``d = dict(zip(keys, values))``).


.. ipython:: python

    s
    u
    s.str.cat([u, pd.Index(u.values), ['A', 'B', 'C', 'D'], map(str, u.index)], na_rep='-')
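
The list form can be modelled positionally in plain Python. This is an illustration under assumptions rather than pandas's code: ``cat_many`` is a hypothetical name, missing values and index alignment are left out, and every element is turned into a list first, which is why iterators and ``dict`` views are accepted. The sketch also requires entries to already be strings, so integer labels have to be converted with ``str`` first.

```python
from typing import Iterable, List


def cat_many(caller: List[str], others: Iterable[Iterable[str]],
             sep: str = "") -> List[str]:
    # Materialise every element first, so one-shot iterators such as map
    # objects or dict views are accepted just like lists.
    columns = [list(col) for col in others]
    for col in columns:
        if len(col) != len(caller):
            raise ValueError("every element must match the caller's length")
        if not all(isinstance(x, str) for x in col):
            # this sketch rejects non-string entries up front
            raise TypeError("only strings can be concatenated")
    # zip walks the caller and all columns in lockstep, one row at a time
    return [sep.join(parts) for parts in zip(caller, *columns)]


u_values = ["b", "d", "a", "c"]
u_labels = [1, 3, 0, 2]
print(cat_many(["a", "b", "c", "d"],
               [u_values, iter(u_values), ["A", "B", "C", "D"], map(str, u_labels)]))
# ['abbA1', 'bddB3', 'caaC0', 'dccD2']
```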

All elements must match the length of the calling ``Series`` (or ``Index``), except those that have an index when ``join`` is not None:

.. ipython:: python

    v
    s.str.cat([u, v, ['A', 'B', 'C', 'D']], join='outer', na_rep='-')

If using ``join='right'`` on a list of ``others`` that contains different indexes,
the union of these indexes will be used as the basis for the final concatenation:

.. ipython:: python

    u.loc[[3]]
    v.loc[[-1, 0]]
    s.str.cat([u.loc[[3]], v.loc[[-1, 0]]], join='right', na_rep='-')
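
Both rules for lists (unlabelled elements must match the caller's length, labelled ones are aligned) can be modelled with dictionaries standing in for indexed objects and plain lists for unlabelled ones. This is a sketch under assumptions, not pandas's implementation: ``cat_list`` is hypothetical, only ``'outer'`` and ``'right'`` are covered, the result labels are sorted, and the handling of plain lists inside a ``'right'`` join is an untested assumption.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Union

Labelled = Dict[Hashable, Optional[str]]
Other = Union[Labelled, Sequence[Optional[str]]]


def cat_list(caller: Labelled, others: List[Other], how: str,
             na_rep: Optional[str] = None) -> Labelled:
    """Hypothetical model of ``s.str.cat([...], join=how)`` for 'outer'/'right'."""
    columns: List[Labelled] = []   # every element, keyed by label
    labelled: List[Labelled] = []  # only the elements that carry their own labels
    for other in others:
        if isinstance(other, dict):
            # labelled elements (stand-ins for Series) keep their own labels
            columns.append(other)
            labelled.append(other)
        else:
            # unlabelled list-likes must match the caller's length and take
            # over the caller's labels position by position
            if len(other) != len(caller):
                raise ValueError("unlabelled elements must match the caller's length")
            columns.append(dict(zip(caller, other)))
    if how == "outer":
        label_set = set(caller).union(*columns)
    elif how == "right":
        # the union of the labelled others' indexes is the basis of the result
        label_set = set().union(*labelled)
    else:
        raise ValueError("this sketch only models 'outer' and 'right'")
    result: Labelled = {}
    for label in sorted(label_set):  # sorting is an assumption of this sketch
        row = [caller.get(label)] + [col.get(label) for col in columns]
        if any(x is None for x in row):
            if na_rep is None:
                result[label] = None
                continue
            row = [na_rep if x is None else x for x in row]
        result[label] = "".join(row)
    return result


s = {0: "a", 1: "b", 2: "c", 3: "d"}
u = {1: "b", 3: "d", 0: "a", 2: "c"}
v = {-1: "z", 0: "a", 1: "b", 3: "d", 4: "e"}
print(cat_list(s, [u, v, ["A", "B", "C", "D"]], "outer", na_rep="-"))
# {-1: '--z-', 0: 'aaaA', 1: 'bbbB', 2: 'cc-C', 3: 'dddD', 4: '--e-'}
print(cat_list(s, [{3: "d"}, {-1: "z", 0: "a"}], "right", na_rep="-"))
# {-1: '--z', 0: 'a-a', 3: 'dd-'}
```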

Indexing with ``.str``
----------------------

doc/source/whatsnew/v0.23.0.txt: 18 changes (18 additions, 0 deletions)

@@ -308,6 +308,24 @@ The :func:`DataFrame.assign` now accepts dependent keyword arguments for python

    df.assign(A=df.A + 1, C=lambda df: df.A * -1)

.. _whatsnew_0230.enhancements.str_cat_align:

``Series.str.cat`` has gained the ``join`` kwarg
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, unlike most of ``pandas``, :meth:`Series.str.cat` did not align :class:`Series` on their index before concatenation (see :issue:`18657`).

Review comment (Contributor): put a reference to the text.rst doc section (top of the .cat section is ok)

Reply (Contributor Author): Reference is already there after the examples - or do you wanna open with it?

Reply (Contributor): it should be near the top

The method has now gained a ``join`` keyword to control the manner of alignment; see the examples below and :ref:`here <text.concatenate>`.

In v0.23, ``join`` defaults to ``None`` (meaning no alignment), but this default will change to ``'left'`` in a future version of pandas.

.. ipython:: python

    s = pd.Series(['a', 'b', 'c', 'd'])
    t = pd.Series(['b', 'd', 'e', 'c'], index=[1, 3, 4, 2])
    s.str.cat(t)
    s.str.cat(t, join='left', na_rep='-')

Review comment (Contributor), on ``s.str.cat(t)``: too long examples here

Furthermore, :meth:`Series.str.cat` now works for ``CategoricalIndex`` as well (previously it raised a ``ValueError``; see :issue:`20842`).

.. _whatsnew_0230.enhancements.astype_category:
