DOC: some 0.24.0 whatsnew clean-up (pandas-dev#24911)
jorisvandenbossche authored and Pingviinituutti committed Feb 28, 2019
1 parent afe508e commit 1a19e46
Showing 1 changed file with 62 additions and 54 deletions.
116 changes: 62 additions & 54 deletions doc/source/whatsnew/v0.24.0.rst
@@ -10,27 +10,34 @@ What's New in 0.24.0 (January XX, 2019)

{{ header }}

These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog
including other versions of pandas.
This is a major release from 0.23.4 and includes a number of API changes, new
features, enhancements, and performance improvements along with a large number
of bug fixes.

Enhancements
~~~~~~~~~~~~
Highlights include:

Highlights include

* :ref:`Optional Nullable Integer Support <whatsnew_0240.enhancements.intna>`
* :ref:`Optional Integer NA Support <whatsnew_0240.enhancements.intna>`
* :ref:`New APIs for accessing the array backing a Series or Index <whatsnew_0240.values_api>`
* :ref:`A new top-level method for creating arrays <whatsnew_0240.enhancements.array>`
* :ref:`Store Interval and Period data in a Series or DataFrame <whatsnew_0240.enhancements.interval>`
* :ref:`Support for joining on two MultiIndexes <whatsnew_0240.enhancements.join_with_two_multiindexes>`


Check the :ref:`API Changes <whatsnew_0240.api_breaking>` and :ref:`deprecations <whatsnew_0240.deprecations>` before updating.

These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog
including other versions of pandas.


Enhancements
~~~~~~~~~~~~

.. _whatsnew_0240.enhancements.intna:

Optional Integer NA Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of :ref:`extension types <extending.extension-types>`.
Here is an example of the usage.

.. note::

@@ -65,7 +72,7 @@ Operations on these dtypes will propagate ``NaN`` as other pandas operations.
# coerce when needed
s + 0.01
These dtypes can operate as part of of ``DataFrame``.
These dtypes can operate as part of a ``DataFrame``.

.. ipython:: python
@@ -74,7 +81,7 @@ These dtypes can operate as part of of ``DataFrame``.
df.dtypes
These dtypes can be merged & reshaped & casted.
These dtypes can be merged, reshaped, and cast.

.. ipython:: python
@@ -117,6 +124,7 @@ a new ndarray of period objects each time.

.. ipython:: python
idx.values
id(idx.values)
id(idx.values)
@@ -129,7 +137,7 @@ If you need an actual NumPy array, use :meth:`Series.to_numpy` or :meth:`Index.t
For Series and Indexes backed by normal NumPy arrays, :attr:`Series.array` will return a
new :class:`arrays.PandasArray`, which is a thin (no-copy) wrapper around a
:class:`numpy.ndarray`. :class:`arrays.PandasArray` isn't especially useful on its own,
:class:`numpy.ndarray`. :class:`~arrays.PandasArray` isn't especially useful on its own,
but it does provide the same interface as any extension array defined in pandas or by
a third-party library.
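
As a minimal sketch of the distinction described above (the example data is made up for illustration), ``.array`` returns the extension array backing the ``Series``, while ``.to_numpy()`` returns an actual NumPy array. The class name of the thin wrapper around NumPy-backed data may differ in later pandas versions, so the sketch only relies on it being an ``ExtensionArray``.

```python
import numpy as np
import pandas as pd

# Period data is stored in an extension array rather than an ndarray.
ser = pd.Series(pd.period_range("2000", freq="D", periods=3))
backing = ser.array        # the extension array that backs the Series
as_numpy = ser.to_numpy()  # an object-dtype ndarray of Period objects

# NumPy-backed data: .array is a thin wrapper, .to_numpy() is the ndarray.
plain = pd.Series([1, 2, 3])
wrapped = plain.array
plain_numpy = plain.to_numpy()

print(type(backing).__name__, type(as_numpy).__name__)
print(isinstance(wrapped, pd.api.extensions.ExtensionArray))
```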

@@ -147,14 +155,13 @@ See :ref:`Dtypes <basics.dtypes>` and :ref:`Attributes and Underlying Data <basi

.. _whatsnew_0240.enhancements.array:

Array
^^^^^
``pandas.array``: a new top-level method for creating arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A new top-level method :func:`array` has been added for creating 1-dimensional arrays (:issue:`22860`).
This can be used to create any :ref:`extension array <extending.extension-types>`, including
extension arrays registered by :ref:`3rd party libraries <ecosystem.extensions>`. See

See :ref:`Dtypes <basics.dtypes>` for more on extension arrays.
extension arrays registered by :ref:`3rd party libraries <ecosystem.extensions>`.
See the :ref:`dtypes docs <basics.dtypes>` for more on extension arrays.

.. ipython:: python
@@ -163,15 +170,15 @@ See :ref:`Dtypes <basics.dtypes>` for more on extension arrays.
Passing data for which there isn't a dedicated extension type (e.g. float, integer, etc.)
will return a new :class:`arrays.PandasArray`, which is just a thin (no-copy)
wrapper around a :class:`numpy.ndarray` that satisfies the extension array interface.
wrapper around a :class:`numpy.ndarray` that satisfies the pandas extension array interface.

.. ipython:: python
pd.array([1, 2, 3])
On their own, a :class:`arrays.PandasArray` isn't a very useful object.
On their own, a :class:`~arrays.PandasArray` isn't a very useful object.
But if you need to write low-level code that works generically for any
:class:`~pandas.api.extensions.ExtensionArray`, :class:`arrays.PandasArray`
:class:`~pandas.api.extensions.ExtensionArray`, :class:`~arrays.PandasArray`
satisfies that need.
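
As a rough sketch of the constructor described above, the following passes an explicit ``dtype`` to ``pd.array`` so that the same entry point yields different extension arrays. The input values are made up for illustration.

```python
import pandas as pd

# Nullable integer extension array; None marks the missing entry.
ints = pd.array([1, 2, None], dtype="Int64")

# Categorical extension array built through the same entry point.
cats = pd.array(["a", "b", "a"], dtype="category")

print(type(ints).__name__, type(cats).__name__)
```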

Notice that by default, if no ``dtype`` is specified, the dtype of the returned
@@ -202,7 +209,7 @@ For periods:

.. ipython:: python
pser = pd.Series(pd.date_range("2000", freq="D", periods=5))
pser = pd.Series(pd.period_range("2000", freq="D", periods=5))
pser
pser.dtype
@@ -267,23 +274,6 @@ For earlier versions this can be done using the following.
pd.merge(left.reset_index(), right.reset_index(),
on=['key'], how='inner').set_index(['key', 'X', 'Y'])
.. _whatsnew_0240.enhancements.extension_array_operators:

``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison
operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``:

1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ``ExtensionArray``.

See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

.. _whatsnew_0240.enhancements.read_html:

``read_html`` Enhancements
@@ -343,15 +333,15 @@ convenient way to apply users' predefined styling functions, and can help reduce
df.style.pipe(format_and_align).set_caption('Summary of results.')
Similar methods already exist for other classes in pandas, including :meth:`DataFrame.pipe`,
:meth:`pandas.core.groupby.GroupBy.pipe`, and :meth:`pandas.core.resample.Resampler.pipe`.
:meth:`GroupBy.pipe() <pandas.core.groupby.GroupBy.pipe>`, and :meth:`Resampler.pipe() <pandas.core.resample.Resampler.pipe>`.

.. _whatsnew_0240.enhancements.rename_axis:

Renaming names in a MultiIndex
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.rename_axis` now supports ``index`` and ``columns`` arguments
and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`)
and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`).

This change allows a dictionary to be passed so that some of the names
of a ``MultiIndex`` can be changed.
@@ -379,13 +369,13 @@ Other Enhancements
- :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing
the user to override the engine's default behavior to include or omit the
dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`)
- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing the user to specify which decimal separator should be used in the output. (:issue:`23614`)
- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`)
- :func:`DataFrame.to_html` now accepts ``render_links`` as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame.
See the :ref:`section on writing HTML <io.html>` in the IO docs for example usage. (:issue:`2679`)
- :func:`pandas.read_csv` now supports pandas extension types as an argument to ``dtype``, allowing the user to use pandas extension types when reading CSVs. (:issue:`23228`)
- :meth:`DataFrame.shift` :meth:`Series.shift`, :meth:`ExtensionArray.shift`, :meth:`SparseArray.shift`, :meth:`Period.shift`, :meth:`GroupBy.shift`, :meth:`Categorical.shift`, :meth:`NDFrame.shift` and :meth:`Block.shift` now accept `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`)
- The :meth:`~DataFrame.shift` method now accepts ``fill_value`` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`)
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directives when passed into ``format`` (:issue:`13486`)
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether ``NaN``/``NaT`` values should be considered (:issue:`17534`)
- :func:`DataFrame.to_csv` and :func:`Series.to_csv` now support the ``compression`` keyword when a file handle is passed. (:issue:`21227`)
@@ -407,18 +397,19 @@ Other Enhancements
The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`).
- :meth:`DataFrame.to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone unaware local timestamps. See the :ref:`io.sql_datetime_data` for implications (:issue:`9086`).
- :func:`to_timedelta` now supports ISO-formatted timedelta strings (:issue:`21877`)
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` objects in the constructor (:issue:`2193`)
- :class:`DatetimeIndex` has gained the :attr:`DatetimeIndex.timetz` attribute. This returns the local time with timezone information. (:issue:`21358`)
- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- :class:`pandas.core.resample.Resampler` now is iterable like :class:`pandas.core.groupby.GroupBy` (:issue:`15314`).
- :meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, and :meth:`~Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp`
now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
and a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- The result of :meth:`~DataFrame.resample` is now iterable similar to ``groupby()`` (:issue:`15314`).
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`pandas.core.resample.Resampler.quantile` method (:issue:`15023`).
- :meth:`DataFrame.resample` and :meth:`Series.resample` with a :class:`PeriodIndex` will now respect the ``base`` argument in the same fashion as with a :class:`DatetimeIndex`. (:issue:`23882`)
- :meth:`pandas.api.types.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`)
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- :meth:`Categorical.from_codes` now can take a ``dtype`` parameter as an alternative to passing ``categories`` and ``ordered`` (:issue:`24398`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- New attribute ``__git_version__`` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :func:`read_fwf` now accepts keyword ``infer_nrows`` (:issue:`15138`).
@@ -433,7 +424,7 @@ Other Enhancements
- :class:`IntervalIndex` has gained the :attr:`~IntervalIndex.is_overlapping` attribute to indicate if the ``IntervalIndex`` contains any overlapping intervals (:issue:`23309`)
- :func:`pandas.DataFrame.to_sql` has gained the ``method`` argument to control SQL insertion clause. See the :ref:`insertion method <io.sql.method>` section in the documentation. (:issue:`8953`)
- :meth:`DataFrame.corrwith` now supports Spearman's rank correlation, Kendall's tau as well as callable correlation methods. (:issue:`21925`)
- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_XXX` etc. now support tilde(~) in path argument. (:issue:`23473`)
- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and other export methods now support tilde(~) in path argument. (:issue:`23473`)
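
A few of the enhancements listed above can be sketched briefly. This is an illustrative example with made-up data, covering the ``fill_value`` argument of ``shift``, the ``allow_sets`` keyword of ``is_list_like``, and the ``dtype`` parameter of ``Categorical.from_codes``.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype, is_list_like

# shift: fill the vacated leading slot with 0 instead of NaN.
s = pd.Series([1, 2, 3])
shifted = s.shift(1, fill_value=0)

# is_list_like: sets count as list-like unless allow_sets=False.
set_default = is_list_like({1, 2})
set_strict = is_list_like({1, 2}, allow_sets=False)

# Categorical.from_codes: pass a dtype instead of categories and ordered.
level_dtype = CategoricalDtype(["low", "high"], ordered=True)
cat = pd.Categorical.from_codes([0, 1, 0], dtype=level_dtype)

print(shifted.tolist(), set_default, set_strict, list(cat))
```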

.. _whatsnew_0240.api_breaking:

@@ -445,8 +436,8 @@ Pandas 0.24.0 includes a number of API breaking changes.

.. _whatsnew_0240.api_breaking.deps:

Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Increased minimum versions for dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have updated our minimum supported versions of dependencies (:issue:`21242`, :issue:`18742`, :issue:`23774`, :issue:`24767`).
If installed, we now require:
@@ -1174,17 +1165,19 @@ Other API Changes

.. _whatsnew_0240.api.extension:

ExtensionType Changes
~~~~~~~~~~~~~~~~~~~~~
Extension Type Changes
~~~~~~~~~~~~~~~~~~~~~~

**Equality and Hashability**

Pandas now requires that extension dtypes be hashable. The base class implements
Pandas now requires that extension dtypes be hashable (i.e. the respective
``ExtensionDtype`` objects; hashability is not a requirement for the values
of the corresponding ``ExtensionArray``). The base class implements
a default ``__eq__`` and ``__hash__``. If you have a parametrized dtype, you should
update the ``ExtensionDtype._metadata`` tuple to match the signature of your
``__init__`` method. See :class:`pandas.api.extensions.ExtensionDtype` for more (:issue:`22476`).
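
The ``_metadata`` requirement can be sketched with a hypothetical parametrized dtype. ``UnitDtype`` and its ``unit`` parameter are invented for illustration and are not part of pandas; the sketch only relies on the default ``__eq__`` and ``__hash__`` provided by the base class.

```python
from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    """Hypothetical parametrized dtype, for illustration only."""

    # The names listed here mirror the __init__ parameters; the base class
    # reads these attributes in its default __eq__ and __hash__.
    _metadata = ("unit",)
    type = float  # scalar type, chosen arbitrarily for this sketch

    def __init__(self, unit="m"):
        self.unit = unit

    @property
    def name(self):
        return "unit[{}]".format(self.unit)


metres = UnitDtype("m")
also_metres = UnitDtype("m")
seconds = UnitDtype("s")

# Equal parameters give equal, identically hashed dtypes.
print(metres == also_metres, metres == seconds)
print(len({metres, also_metres, seconds}))
```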

**Reshaping changes**
**New and changed methods**

- :meth:`~pandas.api.extensions.ExtensionArray.dropna` has been added (:issue:`21185`)
- :meth:`~pandas.api.extensions.ExtensionArray.repeat` has been added (:issue:`24349`)
@@ -1202,9 +1195,25 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- Added :func:`pandas.api.extensions.register_extension_dtype` to register an extension type with pandas (:issue:`22664`)
- Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`)

.. _whatsnew_0240.enhancements.extension_array_operators:

**Operator support**

A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison
operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``:

1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ``ExtensionArray``.

See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

**Other changes**

- A default repr for :class:`pandas.api.extensions.ExtensionArray` is now provided (:issue:`23601`).
- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)
- An ``ExtensionArray`` with a boolean dtype now works correctly as a boolean indexer. :meth:`pandas.api.types.is_bool_dtype` now properly considers them boolean (:issue:`22326`)

**Bug Fixes**
@@ -1253,7 +1262,6 @@ Deprecations
- The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
- The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`)
- Deprecated the ``nthreads`` keyword of :func:`pandas.read_feather` in favor of ``use_threads`` to reflect the changes in ``pyarrow>=0.11.0``. (:issue:`23053`)
- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
- Constructing a :class:`TimedeltaIndex` from data with ``datetime64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23539`)
- Constructing a :class:`DatetimeIndex` from data with ``timedelta64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23675`)