ENH/DEPR: add .sorted() method for API consistency, pandas-dev#9816, pandas-dev#8239

DEPR: remove of na_last from Series.order/Series.sort, xref pandas-dev#5231
jreback committed Aug 18, 2015
1 parent 13cb1a7 commit 13d2d71
Showing 23 changed files with 794 additions and 488 deletions.
9 changes: 3 additions & 6 deletions doc/source/api.rst
@@ -434,9 +434,8 @@ Reshaping, sorting
:toctree: generated/

Series.argsort
Series.order
Series.reorder_levels
Series.sort
Series.sort_values
Series.sort_index
Series.sortlevel
Series.swaplevel
@@ -908,7 +907,7 @@ Reshaping, sorting, transposing

DataFrame.pivot
DataFrame.reorder_levels
DataFrame.sort
DataFrame.sort_values
DataFrame.sort_index
DataFrame.sortlevel
DataFrame.nlargest
@@ -1293,7 +1292,6 @@ Modifying and Computations
Index.insert
Index.min
Index.max
Index.order
Index.reindex
Index.repeat
Index.take
@@ -1319,8 +1317,7 @@ Sorting
:toctree: generated/

Index.argsort
Index.order
Index.sort
Index.sort_values

Time-specific operations
~~~~~~~~~~~~~~~~~~~~~~~~
45 changes: 31 additions & 14 deletions doc/source/basics.rst
@@ -1418,39 +1418,56 @@ description.

.. _basics.sorting:

Sorting by index and value
--------------------------
Sorting
-------

.. warning::

The sorting API is substantially changed in 0.17.0, see :ref:`here <whatsnew_0170.api_breaking.sorting>` for these changes.
In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).

There are two obvious kinds of sorting that you may be interested in: sorting
by label and sorting by actual values. The primary method for sorting axis
labels (indexes) across data structures is the :meth:`~DataFrame.sort_index` method.
by label and sorting by actual values.

By Index
~~~~~~~~

The primary methods for sorting axis
labels (indexes) are the ``Series.sort_index()`` and ``DataFrame.sort_index()`` methods.

.. ipython:: python
unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
columns=['three', 'two', 'one'])
# DataFrame
unsorted_df.sort_index()
unsorted_df.sort_index(ascending=False)
unsorted_df.sort_index(axis=1)
:meth:`DataFrame.sort_index` can accept an optional ``by`` argument for ``axis=0``
# Series
unsorted_df['three'].sort_index()
By Values
~~~~~~~~~

:meth:`Series.sort_values` and :meth:`DataFrame.sort_values` are the entry points for **value** sorting (that is, sorting by the values in a column or row).
:meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
which will use an arbitrary vector or a column name of the DataFrame to
determine the sort order:

.. ipython:: python
df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
df1.sort_index(by='two')
df1.sort_values(by='two')
The ``by`` argument can take a list of column names, e.g.:

.. ipython:: python
df1[['one', 'two', 'three']].sort_index(by=['one','two'])
Series has the method :meth:`~Series.order` (analogous to `R's order function
<http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html>`__) which
sorts by value, with special treatment of NA values via the ``na_position``
These methods have special treatment of NA values via the ``na_position``
argument:

.. ipython:: python
@@ -1459,11 +1476,11 @@ argument:
s.order()
s.order(na_position='first')
.. note::
:meth:`Series.sort` sorts a Series by value in-place. This is to provide
compatibility with NumPy methods which expect the ``ndarray.sort``
behavior. :meth:`Series.order` returns a copy of the sorted data.
.. _basics.searchsorted:

searchsorted
~~~~~~~~~~~~

Series has the :meth:`~Series.searchsorted` method, which works similarly to
:meth:`numpy.ndarray.searchsorted`.
@@ -1493,7 +1510,7 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.
s = pd.Series(np.random.permutation(10))
s
s.order()
s.sort_values()
s.nsmallest(3)
s.nlargest(3)
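
The relationship between ``nsmallest``/``nlargest`` and a full sort followed by ``head(n)`` can be sketched as below. This is an illustrative example only: it assumes a pandas version that provides ``Series.sort_values`` (0.17.0 or later), and the fixed, tie-free values are made up so that both approaches select the same rows.

```python
import pandas as pd

# Illustrative, tie-free data (not taken from the commit).
s = pd.Series([5, 1, 4, 2, 3])

# Sort the whole Series by value, then keep the first three rows.
via_sort = s.sort_values().head(3)

# Select the three smallest values directly, without a full sort.
via_select = s.nsmallest(3)

# The three largest values, in descending order.
largest = s.nlargest(3)

print(via_sort)
print(via_select)
print(largest)
```

With ties present the two approaches may order equal values differently, which is why the sketch avoids them.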
62 changes: 61 additions & 1 deletion doc/source/whatsnew/v0.17.0.txt
@@ -14,6 +14,7 @@ users upgrade to this version.
Highlights include:

- Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
- The sorting API has been revamped to remove some long-time inconsistencies, see :ref:`here <whatsnew_0170.api_breaking.sorting>`
- The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
- The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
@@ -187,6 +188,65 @@ Other enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0170.api_breaking.sorting:

Changes to sorting API
^^^^^^^^^^^^^^^^^^^^^^

The sorting API has had some longtime inconsistencies (:issue:`9816`, :issue:`8239`).

Here is a summary of the API **prior** to 0.17.0:

- ``Series.sort`` was **INPLACE** while ``DataFrame.sort`` returned a new object.
- ``Series.order`` returned a new object.
- It was possible to use ``Series/DataFrame.sort_index`` to sort by **values** by passing the ``by`` keyword.
- ``Series/DataFrame.sortlevel`` worked only on a ``MultiIndex`` for sorting by index.

To address these issues, we have revamped the API:

- We have introduced a new method, :meth:`DataFrame.sort_values`, which is the merger of ``DataFrame.sort()``, ``Series.sort()``,
and ``Series.order``, to handle sorting of **values**.
- The existing method ``Series.sort()`` has been deprecated and will be removed in a
future version of pandas.
- The ``by`` argument of ``DataFrame.sort_index()`` has been deprecated and will be removed in a future version of pandas.
- The methods ``DataFrame.sort()`` and ``Series.order()`` are no longer recommended and will carry a deprecation warning
  in the docstring.
- The existing method ``.sort_index()`` will gain the ``level`` keyword to enable level sorting.

We now have two distinct and non-overlapping methods of sorting. A ``*`` marks items that
will show a ``FutureWarning``.

To sort by the **values**:

================================= ====================================
Previous                          Replacement
================================= ====================================
\*``Series.order()``              ``Series.sort_values()``
\*``Series.sort()``               ``Series.sort_values(inplace=True)``
\*``DataFrame.sort(columns=...)`` ``DataFrame.sort_values(by=...)``
================================= ====================================

To sort by the **index**:

================================== ====================================
Previous                           Equivalent
================================== ====================================
``Series.sort_index()``            ``Series.sort_index()``
``Series.sortlevel(level=...)``    ``Series.sort_index(level=...)``
``DataFrame.sort_index()``         ``DataFrame.sort_index()``
``DataFrame.sortlevel(level=...)`` ``DataFrame.sort_index(level=...)``
\*``DataFrame.sort()``             ``DataFrame.sort_index()``
================================== ====================================

We have also deprecated and changed similar methods in two Series-like classes, ``Index`` and ``Categorical``.

================================== ====================================
Previous                           Replacement
================================== ====================================
\*``Index.order()``                ``Index.sort_values()``
\*``Categorical.order()``          ``Categorical.sort_values()``
================================== ====================================
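
As a rough sketch of how the replacement methods in the tables above divide the work, the example below sorts by values with ``sort_values`` and by labels with ``sort_index``. It assumes a pandas version that includes these methods (0.17.0 or later); the frame mirrors ``df1`` from ``basics.rst``, while the Series ``s`` and the ``MultiIndex`` Series ``ms`` are made-up data for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'one': [2, 1, 1, 1],
                    'two': [1, 3, 2, 4],
                    'three': [5, 4, 3, 2]})

# Value sorting: order rows by column 'one', breaking ties with 'two'.
by_values = df1.sort_values(by=['one', 'two'])

# Index sorting: order rows by their labels, descending.
by_index = df1.sort_index(ascending=False)

# Both calls above return new objects; df1 itself is unchanged.

s = pd.Series([3.0, np.nan, 1.0])

# na_position controls where missing values end up.
na_first = s.sort_values(na_position='first')

# Sorting in place must be requested explicitly and returns None.
s.sort_values(inplace=True)

# Level sorting on a MultiIndex goes through sort_index(level=...).
mi = pd.MultiIndex.from_tuples([('b', 1), ('a', 2), ('a', 1)])
ms = pd.Series([10, 20, 30], index=mi)
ms_sorted = ms.sort_index(level=0)

print(by_values)
print(by_index)
print(na_first)
print(ms_sorted)
```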

.. _whatsnew_0170.api_breaking.to_datetime:

Changes to to_datetime and to_timedelta
@@ -570,7 +630,7 @@ Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)

- Removal of ``na_last`` parameters from ``Series.order()`` and ``Series.sort()``, in favor of ``na_position``, xref (:issue:`5231`)

.. _whatsnew_0170.performance:

6 changes: 2 additions & 4 deletions pandas/core/algorithms.py
@@ -262,9 +262,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
result.index = bins[:-1]

if sort:
result.sort()
if not ascending:
result = result[::-1]
result = result.sort_values(ascending=ascending)

if normalize:
result = result / float(values.size)
@@ -497,7 +495,7 @@ def select_n_slow(dropped, n, take_last, method):
reverse_it = take_last or method == 'nlargest'
ascending = method == 'nsmallest'
slc = np.s_[::-1] if reverse_it else np.s_[:]
return dropped[slc].order(ascending=ascending).head(n)
return dropped[slc].sort_values(ascending=ascending).head(n)


_select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}
43 changes: 37 additions & 6 deletions pandas/core/categorical.py
@@ -1083,7 +1083,7 @@ def argsort(self, ascending=True, **kwargs):
result = result[::-1]
return result

def order(self, inplace=False, ascending=True, na_position='last'):
def sort_values(self, inplace=False, ascending=True, na_position='last'):
""" Sorts the Category by category value returning a new Categorical by default.
Only ordered Categoricals can be sorted!
@@ -1092,10 +1092,10 @@ def order(self, inplace=False, ascending=True, na_position='last'):
Parameters
----------
ascending : boolean, default True
Sort ascending. Passing False sorts descending
inplace : boolean, default False
Do operation in place.
ascending : boolean, default True
Sort ascending. Passing False sorts descending
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end
@@ -1139,6 +1139,37 @@ def order(self, inplace=False, ascending=True, na_position='last'):
return Categorical(values=codes,categories=self.categories, ordered=self.ordered,
fastpath=True)

def order(self, inplace=False, ascending=True, na_position='last'):
"""
DEPRECATED: use :meth:`Categorical.sort_values`
Sorts the Category by category value returning a new Categorical by default.
Only ordered Categoricals can be sorted!
Categorical.sort is the equivalent but sorts the Categorical inplace.
Parameters
----------
inplace : boolean, default False
Do operation in place.
ascending : boolean, default True
Sort ascending. Passing False sorts descending
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end
Returns
-------
y : Category or None
See Also
--------
Category.sort
"""
warn("order is deprecated, use sort_values(...)",
FutureWarning, stacklevel=2)
return self.sort_values(inplace=inplace, ascending=ascending, na_position=na_position)

def sort(self, inplace=True, ascending=True, na_position='last'):
""" Sorts the Category inplace by category value.
@@ -1163,10 +1194,10 @@ def sort(self, inplace=True, ascending=True, na_position='last'):
See Also
--------
Category.order
Category.sort_values
"""
return self.order(inplace=inplace, ascending=ascending,
na_position=na_position)
return self.sort_values(inplace=inplace, ascending=ascending,
na_position=na_position)

def ravel(self, order='C'):
""" Return a flattened (numpy) array.
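
The deprecation pattern used for ``Categorical.order`` above (emit a ``FutureWarning`` with ``stacklevel=2``, then delegate to the new method) can be sketched with the standard library alone. ``SortableBag`` is a hypothetical stand-in class invented for this illustration, not a pandas type; only the ``warn(...)`` call shape is taken from the diff.

```python
import warnings


class SortableBag:
    """Hypothetical container used only to illustrate the pattern."""

    def __init__(self, values):
        self.values = list(values)

    def sort_values(self, ascending=True):
        # New, preferred method: returns a new sorted object.
        return SortableBag(sorted(self.values, reverse=not ascending))

    def order(self, ascending=True):
        """DEPRECATED: use sort_values."""
        # stacklevel=2 attributes the warning to the caller of order(),
        # not to this wrapper.
        warnings.warn("order is deprecated, use sort_values(...)",
                      FutureWarning, stacklevel=2)
        return self.sort_values(ascending=ascending)


# Record warnings so the deprecation can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = SortableBag([3, 1, 2]).order(ascending=False)

print(result.values)
print(caught[0].category.__name__)
```

Keeping the old name as a thin wrapper means existing callers keep working while being pointed at the replacement.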
3 changes: 3 additions & 0 deletions pandas/core/common.py
@@ -2155,6 +2155,9 @@ def _mut_exclusive(**kwargs):
return val2


def _not_none(*args):
return (arg for arg in args if arg is not None)

def _any_none(*args):
for arg in args:
if arg is None:
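
The ``_not_none`` helper added in this hunk returns a generator expression rather than a list. The sketch below copies its two-line body from the diff and shows what that implies for callers; the sample arguments are made up for illustration.

```python
def _not_none(*args):
    # Same body as the helper added in this commit: a lazy generator
    # over the arguments that are not None.
    return (arg for arg in args if arg is not None)


# Only None is dropped; other falsy values such as 0 and '' are kept.
kept = list(_not_none(1, None, 0, '', None, 'x'))

# A generator can be consumed only once.
lazy = _not_none(None, 2)
first_pass = list(lazy)
second_pass = list(lazy)

print(kept)
print(first_pass, second_pass)
```

Callers that need to iterate more than once, or to take ``len()``, would have to materialize the result first, for example with ``list(...)``.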