ENH/DEPR: add .sorted() method for API consistency, pandas-dev#9816, pandas-dev#8239 DEPR: remove of na_last from Series.order/Series.sort, xref pandas-dev#5231
jreback · Aug 18, 2015 · 13d2d71
1 parent 13cb1a7
commit 13d2d71

Showing 23 changed files with 794 additions and 488 deletions.
diff --git a/doc/source/api.rst b/doc/source/api.rst
@@ -434,9 +434,8 @@ Reshaping, sorting
    :toctree: generated/
 
    Series.argsort
-   Series.order
    Series.reorder_levels
-   Series.sort
+   Series.sort_values
    Series.sort_index
    Series.sortlevel
    Series.swaplevel
@@ -908,7 +907,7 @@ Reshaping, sorting, transposing
 
    DataFrame.pivot
    DataFrame.reorder_levels
-   DataFrame.sort
+   DataFrame.sort_values
    DataFrame.sort_index
    DataFrame.sortlevel
    DataFrame.nlargest
@@ -1293,7 +1292,6 @@ Modifying and Computations
    Index.insert
    Index.min
    Index.max
-   Index.order
    Index.reindex
    Index.repeat
    Index.take
@@ -1319,8 +1317,7 @@ Sorting
    :toctree: generated/
 
    Index.argsort
-   Index.order
-   Index.sort
+   Index.sort_values
 
 Time-specific operations
 ~~~~~~~~~~~~~~~~~~~~~~~~

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -1418,39 +1418,56 @@ description.
 
 .. _basics.sorting:
 
-Sorting by index and value
---------------------------
+Sorting
+-------
+
+.. warning::
+
+   The sorting API is substantially changed in 0.17.0, see :ref:`here <whatsnew_0170.api_breaking.sorting>` for these changes.
+   In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).
 
 There are two obvious kinds of sorting that you may be interested in: sorting
-by label and sorting by actual values. The primary method for sorting axis
-labels (indexes) across data structures is the :meth:`~DataFrame.sort_index` method.
+by label and sorting by actual values.
+
+By Index
+~~~~~~~~
+
+Axis labels (indexes) are sorted primarily with the
+``Series.sort_index()`` and ``DataFrame.sort_index()`` methods.
 
 .. ipython:: python
 
    unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
                             columns=['three', 'two', 'one'])
+
+   # DataFrame
    unsorted_df.sort_index()
    unsorted_df.sort_index(ascending=False)
    unsorted_df.sort_index(axis=1)
 
-:meth:`DataFrame.sort_index` can accept an optional ``by`` argument for ``axis=0``
+   # Series
+   unsorted_df['three'].sort_index()
+
+By Values
+~~~~~~~~~
+
+The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` methods are the entry points for **value** sorting (that is, sorting by the values in a column or row).
+:meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
 which will use an arbitrary vector or a column name of the DataFrame to
 determine the sort order:
 
 .. ipython:: python
 
    df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
-   df1.sort_index(by='two')
+   df1.sort_values(by='two')
 
 The ``by`` argument can take a list of column names, e.g.:
 
 .. ipython:: python
 
    df1[['one', 'two', 'three']].sort_index(by=['one','two'])
 
-Series has the method :meth:`~Series.order` (analogous to `R's order function
-<http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html>`__) which
-sorts by value, with special treatment of NA values via the ``na_position``
+These methods have special treatment of NA values via the ``na_position``
 argument:
 
 .. ipython:: python
@@ -1459,11 +1476,11 @@ argument:
    s.order()
    s.order(na_position='first')
 
-.. note::
 
-   :meth:`Series.sort` sorts a Series by value in-place. This is to provide
-   compatibility with NumPy methods which expect the ``ndarray.sort``
-   behavior. :meth:`Series.order` returns a copy of the sorted data.
+.. _basics.searchsorted:
+
+searchsorted
+~~~~~~~~~~~~
 
 Series has the :meth:`~Series.searchsorted` method, which works similar to
 :meth:`numpy.ndarray.searchsorted`.
@@ -1493,7 +1510,7 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.
 
    s = pd.Series(np.random.permutation(10))
    s
-   s.order()
+   s.sort_values()
    s.nsmallest(3)
    s.nlargest(3)
 

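Reviewer note: the split between label sorting and value sorting described in the ``basics.rst`` hunk above can be sketched as follows. This is a minimal illustration, not part of the commit; it reuses the ``df1`` data from the documentation, while the Series ``s`` and the result variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Same data as the df1 example in the documentation hunk.
df1 = pd.DataFrame({'one': [2, 1, 1, 1],
                    'two': [1, 3, 2, 4],
                    'three': [5, 4, 3, 2]})

# Value sorting: by a single column, then by a list of columns.
by_two = df1.sort_values(by='two')
by_one_two = df1.sort_values(by=['one', 'two'])

# Label sorting is a separate method.
by_label_desc = df1.sort_index(ascending=False)

# NA placement is controlled by na_position ('last' is the default).
s = pd.Series([3.0, np.nan, 1.0])   # illustrative data
na_first = s.sort_values(na_position='first')

# df1 and s themselves are unchanged: inplace defaults to False.
```

Each call returns a new object, which matches the warning added at the top of the Sorting section.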
diff --git a/doc/source/whatsnew/v0.17.0.txt b/doc/source/whatsnew/v0.17.0.txt
@@ -14,6 +14,7 @@ users upgrade to this version.
 Highlights include:
 
 - Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
+- The sorting API has been revamped to remove some long-time inconsistencies, see :ref:`here <whatsnew_0170.api_breaking.sorting>`
 - The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
   previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
 - The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
@@ -187,6 +188,65 @@ Other enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+.. _whatsnew_0170.api_breaking.sorting:
+
+Changes to sorting API
+^^^^^^^^^^^^^^^^^^^^^^
+
+The sorting API has had some longtime inconsistencies (:issue:`9816`, :issue:`8239`).
+
+Here is a summary of the API **prior** to 0.17.0:
+
+- ``Series.sort`` was **INPLACE** while ``DataFrame.sort`` returned a new object.
+- ``Series.order`` returned a new object.
+- It was possible to use ``Series/DataFrame.sort_index`` to sort by **values** by passing the ``by`` keyword.
+- ``Series/DataFrame.sortlevel`` worked only on a ``MultiIndex`` for sorting by index.
+
+To address these issues, we have revamped the API:
+
+- We have introduced a new method, :meth:`DataFrame.sort_values`, which is the merger of ``DataFrame.sort()``, ``Series.sort()``,
+  and ``Series.order``, to handle sorting of **values**.
+- The existing method ``Series.sort()`` has been deprecated and will be removed in a
+  future version of pandas.
+- The ``by`` argument of ``DataFrame.sort_index()`` has been deprecated and will be removed in a future version of pandas.
+- The methods ``DataFrame.sort()`` and ``Series.order()`` are no longer recommended and carry a deprecation warning
+  in the docstring.
+- The existing method ``.sort_index()`` will gain the ``level`` keyword to enable level sorting.
+
+We now have two distinct and non-overlapping methods of sorting. A ``*`` marks items that
+will show a ``FutureWarning``.
+
+To sort by the **values**:
+
+=================================     ====================================
+Previous                              Replacement
+=================================     ====================================
+\*``Series.order()``                   ``Series.sort_values()``
+\*``Series.sort()``                    ``Series.sort_values(inplace=True)``
+\*``DataFrame.sort(columns=...)``      ``DataFrame.sort_values(by=...)``
+=================================     ====================================
+
+To sort by the **index**:
+
+==================================    ====================================
+Previous                              Equivalent
+==================================    ====================================
+``Series.sort_index()``               ``Series.sort_index()``
+``Series.sortlevel(level=...)``       ``Series.sort_index(level=...)``
+``DataFrame.sort_index()``            ``DataFrame.sort_index()``
+``DataFrame.sortlevel(level=...)``    ``DataFrame.sort_index(level=...)``
+\*``DataFrame.sort()``                 ``DataFrame.sort_index()``
+==================================    ====================================
+
+We have also deprecated and changed similar methods in two Series-like classes, ``Index`` and ``Categorical``.
+
+==================================    ====================================
+Previous                              Replacement
+==================================    ====================================
+\*``Index.order()``                     ``Index.sort_values()``
+\*``Categorical.order()``               ``Categorical.sort_values()``
+==================================    ====================================
+
 .. _whatsnew_0170.api_breaking.to_datetime:
 
 Changes to to_datetime and to_timedelta
@@ -570,7 +630,7 @@ Removal of prior version deprecations/changes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 - Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)
-
+- Removal of ``na_last`` parameters from ``Series.order()`` and ``Series.sort()``, in favor of ``na_position``, xref (:issue:`5231`)
 
 .. _whatsnew_0170.performance:
 

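Reviewer note: the replacement tables in the whatsnew entry above might translate into code roughly like this. It is a sketch only; the data, the level names ``outer``/``inner``, and the variable names are assumptions made for illustration, and only the replacement calls from the tables are used.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=['a', 'b', 'c'])   # illustrative data

# Replacement for the old Series.order(): returns a new, sorted object.
ordered = s.sort_values()

# Replacement for the old in-place Series.sort(): opt in with inplace=True.
s_inplace = s.copy()
returned = s_inplace.sort_values(inplace=True)   # in-place calls return None

# Replacement for sortlevel(level=...): sort_index with the level keyword.
mi = pd.MultiIndex.from_tuples([('b', 2), ('a', 1), ('b', 1), ('a', 2)],
                               names=['outer', 'inner'])
frame = pd.DataFrame({'x': [10, 20, 30, 40]}, index=mi)
by_inner = frame.sort_index(level='inner')
```

Keeping in-place mutation behind an explicit ``inplace=True`` is what removes the old asymmetry between ``Series.sort`` and ``DataFrame.sort``.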
diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
@@ -262,9 +262,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
             result.index = bins[:-1]
 
     if sort:
-        result.sort()
-        if not ascending:
-            result = result[::-1]
+        result = result.sort_values(ascending=ascending)
 
     if normalize:
         result = result / float(values.size)
@@ -497,7 +495,7 @@ def select_n_slow(dropped, n, take_last, method):
     reverse_it = take_last or method == 'nlargest'
     ascending = method == 'nsmallest'
     slc = np.s_[::-1] if reverse_it else np.s_[:]
-    return dropped[slc].order(ascending=ascending).head(n)
+    return dropped[slc].sort_values(ascending=ascending).head(n)
 
 
 _select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}

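Reviewer note: the ``value_counts`` hunk above swaps "sort in place, then reverse" for a single ``sort_values(ascending=...)`` call. A small sketch of the two idioms, using a hypothetical ``counts`` Series in place of the intermediate ``result`` and the non-mutating method for both so that it runs on current pandas:

```python
import pandas as pd

# Hypothetical counts standing in for the intermediate `result` Series.
counts = pd.Series([2, 5, 1], index=['x', 'y', 'z'])

# Old idiom, sketched without mutation: sort ascending, then reverse.
old_style = counts.sort_values()[::-1]

# New idiom from the patch: pass the direction straight through.
ascending = False
new_style = counts.sort_values(ascending=ascending)
```

For distinct counts the two give the same order; with tied counts the relative order of the ties may differ between them.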
diff --git a/pandas/core/categorical.py b/pandas/core/categorical.py
@@ -1083,7 +1083,7 @@ def argsort(self, ascending=True, **kwargs):
             result = result[::-1]
         return result
 
-    def order(self, inplace=False, ascending=True, na_position='last'):
+    def sort_values(self, inplace=False, ascending=True, na_position='last'):
         """ Sorts the Category by category value returning a new Categorical by default.
 
         Only ordered Categoricals can be sorted!
@@ -1092,10 +1092,10 @@ def order(self, inplace=False, ascending=True, na_position='last'):
 
         Parameters
         ----------
-        ascending : boolean, default True
-            Sort ascending. Passing False sorts descending
         inplace : boolean, default False
             Do operation in place.
+        ascending : boolean, default True
+            Sort ascending. Passing False sorts descending
         na_position : {'first', 'last'} (optional, default='last')
             'first' puts NaNs at the beginning
             'last' puts NaNs at the end
@@ -1139,6 +1139,37 @@ def order(self, inplace=False, ascending=True, na_position='last'):
             return Categorical(values=codes,categories=self.categories, ordered=self.ordered,
                                fastpath=True)
 
+    def order(self, inplace=False, ascending=True, na_position='last'):
+        """
+        DEPRECATED: use :meth:`Categorical.sort_values`
+
+        Sorts the Category by category value returning a new Categorical by default.
+
+        Only ordered Categoricals can be sorted!
+
+        Categorical.sort is the equivalent but sorts the Categorical inplace.
+
+        Parameters
+        ----------
+        inplace : boolean, default False
+            Do operation in place.
+        ascending : boolean, default True
+            Sort ascending. Passing False sorts descending
+        na_position : {'first', 'last'} (optional, default='last')
+            'first' puts NaNs at the beginning
+            'last' puts NaNs at the end
+
+        Returns
+        -------
+        y : Category or None
+
+        See Also
+        --------
+        Category.sort
+        """
+        warn("order is deprecated, use sort_values(...)",
+             FutureWarning, stacklevel=2)
+        return self.sort_values(inplace=inplace, ascending=ascending, na_position=na_position)
 
     def sort(self, inplace=True, ascending=True, na_position='last'):
         """ Sorts the Category inplace by category value.
@@ -1163,10 +1194,10 @@ def sort(self, inplace=True, ascending=True, na_position='last'):
 
         See Also
         --------
-        Category.order
+        Category.sort_values
         """
-        return self.order(inplace=inplace, ascending=ascending,
-                na_position=na_position)
+        return self.sort_values(inplace=inplace, ascending=ascending,
+                                na_position=na_position)
 
     def ravel(self, order='C'):
         """ Return a flattened (numpy) array.

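Reviewer note: the ``Categorical.order`` change above follows a common deprecation-shim pattern, where the old name is kept as a thin wrapper that warns and delegates to the new method. A standalone sketch of that pattern, using a hypothetical ``Ranked`` class rather than the real ``Categorical``:

```python
import warnings


class Ranked(object):
    """Hypothetical container used only to illustrate the shim pattern."""

    def __init__(self, values):
        self.values = list(values)

    def sort_values(self, ascending=True):
        # New canonical method: returns a new object.
        return Ranked(sorted(self.values, reverse=not ascending))

    def order(self, ascending=True):
        # Old name kept as a wrapper that warns, then delegates.
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn("order is deprecated, use sort_values(...)",
                      FutureWarning, stacklevel=2)
        return self.sort_values(ascending=ascending)


# Capture the warning to show that the old name still works but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Ranked([3, 1, 2]).order(ascending=False)
```

Delegating all arguments explicitly, as the patch does, keeps the old and new methods from drifting apart while the deprecation period lasts.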
diff --git a/pandas/core/common.py b/pandas/core/common.py
@@ -2155,6 +2155,9 @@ def _mut_exclusive(**kwargs):
         return val2
 
 
+def _not_none(*args):
+    return (arg for arg in args if arg is not None)
+
 def _any_none(*args):
     for arg in args:
         if arg is None: