DOC: some 0.24.0 whatsnew clean-up #24911

Merged
Changes from 2 commits
110 changes: 58 additions & 52 deletions doc/source/whatsnew/v0.24.0.rst
@@ -10,27 +10,34 @@ What's New in 0.24.0 (January XX, 2019)

{{ header }}

These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog
including other versions of pandas.

Enhancements
~~~~~~~~~~~~
This is a major release from 0.23.4 and includes a number of API changes, new
features, enhancements, and performance improvements along with a large number
of bug fixes.

Highlights include
Highlights include:

* :ref:`Optional Nullable Integer Support <whatsnew_0240.enhancements.intna>`
* :ref:`Optional Integer NA Support <whatsnew_0240.enhancements.intna>`
* :ref:`New APIs for accessing the array backing a Series or Index <whatsnew_0240.values_api>`
* :ref:`A new top-level method for creating arrays <whatsnew_0240.enhancements.array>`
* :ref:`Store Interval and Period data in a Series or DataFrame <whatsnew_0240.enhancements.interval>`
* :ref:`Support for joining on two MultiIndexes <whatsnew_0240.enhancements.join_with_two_multiindexes>`


Check the :ref:`API Changes <whatsnew_0240.api_breaking>` and :ref:`deprecations <whatsnew_0240.deprecations>` before updating.

These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog
including other versions of pandas.


Enhancements
~~~~~~~~~~~~

.. _whatsnew_0240.enhancements.intna:

Optional Integer NA Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of :ref:`extension types <extending.extension-types>`.
Here is an example of the usage.

We can construct a ``Series`` with the specified dtype. The dtype string ``Int64`` is a pandas ``ExtensionDtype``. Specifying a list or array using the traditional missing value
marker of ``np.nan`` will infer to integer dtype. The display of the ``Series`` will also use the ``NaN`` to indicate missing values in string outputs. (:issue:`20700`, :issue:`20747`, :issue:`22441`, :issue:`21789`, :issue:`22346`)
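
The paragraph above can be illustrated with a minimal sketch. It assumes a pandas version that provides the ``Int64`` extension dtype (0.24.0 or later); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Request the nullable integer extension dtype by its string alias (capital "I").
s = pd.Series([1, 2, np.nan], dtype="Int64")

print(s.dtype)         # the extension dtype rather than float64
print(s.isna().sum())  # number of missing entries

# Integer arithmetic propagates the missing value and keeps the integer dtype.
t = s + 1
print(t.dtype)
```

Note that the lowercase alias ``int64`` refers to the NumPy dtype, which cannot hold missing values.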
@@ -60,7 +67,7 @@ Operations on these dtypes will propagate ``NaN`` as other pandas operations.
# coerce when needed
s + 0.01

These dtypes can operate as part of of ``DataFrame``.
These dtypes can operate as part of a ``DataFrame``.

.. ipython:: python

@@ -69,7 +76,7 @@ These dtypes can operate as part of of ``DataFrame``.
df.dtypes


These dtypes can be merged & reshaped & casted.
These dtypes can be merged, reshaped and casted.
Member

I prefer the Oxford comma (instead of none). Anyone else?

Member Author

The whole whatsnew file is probably full of grammatically questionable sentences, but OK, since I am updating it anyway, I will edit :-)


.. ipython:: python

@@ -112,6 +119,7 @@ a new ndarray of period objects each time.

.. ipython:: python

idx.values
id(idx.values)
id(idx.values)

@@ -124,7 +132,7 @@ If you need an actual NumPy array, use :meth:`Series.to_numpy` or :meth:`Index.t

For Series and Indexes backed by normal NumPy arrays, :attr:`Series.array` will return a
new :class:`arrays.PandasArray`, which is a thin (no-copy) wrapper around a
:class:`numpy.ndarray`. :class:`arrays.PandasArray` isn't especially useful on its own,
:class:`numpy.ndarray`. :class:`~arrays.PandasArray` isn't especially useful on its own,
but it does provide the same interface as any extension array defined in pandas or by
a third-party library.
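
As a rough sketch of the difference between the two accessors, the snippet below compares ``Series.array`` with ``Series.to_numpy`` for a NumPy-backed Series. The concrete class name of the wrapper may differ between pandas versions, so the check relies only on the public ``ExtensionArray`` base class.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

ser = pd.Series([1, 2, 3])

wrapped = ser.array    # extension-array wrapper around the NumPy data
raw = ser.to_numpy()   # plain numpy.ndarray

print(isinstance(wrapped, ExtensionArray))
print(isinstance(raw, np.ndarray))
```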

@@ -142,14 +150,13 @@ See :ref:`Dtypes <basics.dtypes>` and :ref:`Attributes and Underlying Data <basi

.. _whatsnew_0240.enhancements.array:

Array
^^^^^
``pandas.array``: a new top-level method for creating arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A new top-level method :func:`array` has been added for creating 1-dimensional arrays (:issue:`22860`).
This can be used to create any :ref:`extension array <extending.extension-types>`, including
extension arrays registered by :ref:`3rd party libraries <ecosystem.extensions>`. See

See :ref:`Dtypes <basics.dtypes>` for more on extension arrays.
extension arrays registered by :ref:`3rd party libraries <ecosystem.extensions>`.
See the :ref:`dtypes docs <basics.dtypes>` for more on extension arrays.

.. ipython:: python

@@ -158,15 +165,15 @@ See :ref:`Dtypes <basics.dtypes>` for more on extension arrays.

Passing data for which there isn't a dedicated extension type (e.g. float, integer, etc.)
will return a new :class:`arrays.PandasArray`, which is just a thin (no-copy)
wrapper around a :class:`numpy.ndarray` that satisfies the extension array interface.
wrapper around a :class:`numpy.ndarray` that satisfies the pandas extension array interface.

.. ipython:: python

pd.array([1, 2, 3])

On their own, a :class:`arrays.PandasArray` isn't a very useful object.
On their own, a :class:`~arrays.PandasArray` isn't a very useful object.
But if you need to write low-level code that works generically for any
:class:`~pandas.api.extensions.ExtensionArray`, :class:`arrays.PandasArray`
:class:`~pandas.api.extensions.ExtensionArray`, :class:`~arrays.PandasArray`
satisfies that need.
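
A minimal sketch of such generic code follows. The helper ``count_missing`` is hypothetical (it is not part of pandas) and uses only ``isna``, which every extension array is expected to provide.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def count_missing(arr: ExtensionArray) -> int:
    """Count missing entries using only the ExtensionArray interface."""
    return int(arr.isna().sum())


# Three differently backed arrays handled by the same function.
nullable_ints = pd.array([1, np.nan, 3], dtype="Int64")
categories = pd.array(["a", np.nan, "a"], dtype="category")
numpy_backed = pd.Series([1.0, np.nan, np.nan]).array

for arr in (nullable_ints, categories, numpy_backed):
    print(type(arr).__name__, count_missing(arr))
```

The function never inspects the concrete array class, which is the point of coding against the shared interface.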

Notice that by default, if no ``dtype`` is specified, the dtype of the returned
@@ -197,7 +204,7 @@ For periods:

.. ipython:: python

pser = pd.Series(pd.date_range("2000", freq="D", periods=5))
pser = pd.Series(pd.period_range("2000", freq="D", periods=5))
pser
pser.dtype

@@ -259,23 +266,6 @@ For earlier versions this can be done using the following.
pd.merge(left.reset_index(), right.reset_index(),
on=['key'], how='inner').set_index(['key', 'X', 'Y'])


.. _whatsnew_0240.enhancements.extension_array_operators:

``ExtensionArray`` operator support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison
operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``:

1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ``ExtensionArray``.

See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

.. _whatsnew_0240.enhancements.read_html:

``read_html`` Enhancements
@@ -335,15 +325,15 @@ convenient way to apply users' predefined styling functions, and can help reduce
df.style.pipe(format_and_align).set_caption('Summary of results.')

Similar methods already exist for other classes in pandas, including :meth:`DataFrame.pipe`,
:meth:`pandas.core.groupby.GroupBy.pipe`, and :meth:`pandas.core.resample.Resampler.pipe`.
:meth:`GroupBy.pipe() <pandas.core.groupby.GroupBy.pipe>`, and :meth:`Resampler.pipe() <pandas.core.resample.Resampler.pipe>`.

.. _whatsnew_0240.enhancements.rename_axis:

Renaming names in a MultiIndex
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.rename_axis` now supports ``index`` and ``columns`` arguments
and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`)
and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`).

This change allows a dictionary to be passed so that some of the names
of a ``MultiIndex`` can be changed.
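
A short sketch of the dictionary form, using made-up level names, might look like this:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2]], names=["letter", "number"]
)
df = pd.DataFrame({"value": range(4)}, index=index)

# Rename only one level; the other level keeps its name.
renamed = df.rename_axis(index={"number": "digit"})
print(list(renamed.index.names))
```

The original ``df`` is left unchanged because ``rename_axis`` returns a new object by default.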
@@ -371,13 +361,13 @@ Other Enhancements
- :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing
the user to override the engine's default behavior to include or omit the
dataframe's indexes from the resulting Parquet file. (:issue:`20768`)
- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`)
- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`)
- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing the user to specify which decimal separator should be used in the output. (:issue:`23614`)
- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`)
- :func:`DataFrame.to_html` now accepts ``render_links`` as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame.
See the :ref:`section on writing HTML <io.html>` in the IO docs for example usage. (:issue:`2679`)
- :func:`pandas.read_csv` now supports pandas extension types as an argument to ``dtype``, allowing the user to use pandas extension types when reading CSVs. (:issue:`23228`)
- :meth:`DataFrame.shift` :meth:`Series.shift`, :meth:`ExtensionArray.shift`, :meth:`SparseArray.shift`, :meth:`Period.shift`, :meth:`GroupBy.shift`, :meth:`Categorical.shift`, :meth:`NDFrame.shift` and :meth:`Block.shift` now accept `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`)
- The :meth:`~DataFrame.shift` method now accepts `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`)
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether ``NaN``/``NaT`` values should be considered (:issue:`17534`)
- :func:`DataFrame.to_csv` and :func:`Series.to_csv` now support the ``compression`` keyword when a file handle is passed. (:issue:`21227`)
@@ -399,18 +389,19 @@ Other Enhancements
The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`).
- :meth:`DataFrame.to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone unaware local timestamps. See the :ref:`io.sql_datetime_data` for implications (:issue:`9086`).
- :func:`to_timedelta` now supports ISO-formatted timedelta strings (:issue:`21877`)
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` objects in the constructor (:issue:`2193`)
- :class:`DatetimeIndex` has gained the :attr:`DatetimeIndex.timetz` attribute. This returns the local time with timezone information. (:issue:`21358`)
- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- :class:`pandas.core.resample.Resampler` now is iterable like :class:`pandas.core.groupby.GroupBy` (:issue:`15314`).
- :meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, and :meth:`~Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp`
now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
and a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`)
- The result of :meth:`~DataFrame.resample` is now iterable similar to ``groupby()`` (:issue:`15314`).
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`pandas.core.resample.Resampler.quantile` method (:issue:`15023`).
- :meth:`DataFrame.resample` and :meth:`Series.resample` with a :class:`PeriodIndex` will now respect the ``base`` argument in the same fashion as with a :class:`DatetimeIndex`. (:issue:`23882`)
- :meth:`pandas.api.types.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`)
- :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`).
- :meth:`Categorical.from_codes` now can take a ``dtype`` parameter as an alternative to passing ``categories`` and ``ordered`` (:issue:`24398`).
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- New attribute ``__git_version__`` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :func:`read_fwf` now accepts keyword ``infer_nrows`` (:issue:`15138`).
@@ -426,7 +417,7 @@ Other Enhancements
- :class:`IntervalIndex` has gained the :attr:`~IntervalIndex.is_overlapping` attribute to indicate if the ``IntervalIndex`` contains any overlapping intervals (:issue:`23309`)
- :func:`pandas.DataFrame.to_sql` has gained the ``method`` argument to control SQL insertion clause. See the :ref:`insertion method <io.sql.method>` section in the documentation. (:issue:`8953`)
- :meth:`DataFrame.corrwith` now supports Spearman's rank correlation, Kendall's tau as well as callable correlation methods. (:issue:`21925`)
- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_XXX` etc. now support tilde(~) in path argument. (:issue:`23473`)
- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and other export methods now support tilde(~) in path argument. (:issue:`23473`)
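
A few of the enhancements listed above can be tried with a short sketch like the following. The input values are arbitrary and chosen only for illustration.

```python
import pandas as pd
from pandas.api.types import is_list_like

# shift() with fill_value: the vacated position gets 0 instead of NA.
shifted = pd.Series([1, 2, 3]).shift(1, fill_value=0)
print(shifted.tolist())

# is_list_like() treats sets as list-like unless allow_sets=False.
sets_ok = is_list_like({1, 2})
sets_rejected = is_list_like({1, 2}, allow_sets=False)
print(sets_ok, sets_rejected)

# Interval.overlaps() reports whether two intervals share any point.
overlapping = pd.Interval(0, 2).overlaps(pd.Interval(1, 3))
print(overlapping)
```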

.. _whatsnew_0240.api_breaking:

@@ -438,8 +429,8 @@ Pandas 0.24.0 includes a number of API breaking changes.

.. _whatsnew_0240.api_breaking.deps:

Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Increased minimum versions for dependencies
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have updated our minimum supported versions of dependencies (:issue:`21242`, :issue:`18742`, :issue:`23774`, :issue:`24767`).
If installed, we now require:
@@ -1167,8 +1158,8 @@ Other API Changes

.. _whatsnew_0240.api.extension:

ExtensionType Changes
~~~~~~~~~~~~~~~~~~~~~
Extension Type Changes
~~~~~~~~~~~~~~~~~~~~~~

**Equality and Hashability**

@@ -1177,7 +1168,7 @@ a default ``__eq__`` and ``__hash__``. If you have a parametrized dtype, you sho
update the ``ExtensionDtype._metadata`` tuple to match the signature of your
``__init__`` method. See :class:`pandas.api.extensions.ExtensionDtype` for more (:issue:`22476`).
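
To make the ``_metadata`` advice concrete, here is a minimal sketch of a parametrized dtype. ``UnitDtype`` and its ``unit`` parameter are hypothetical and exist only for this illustration; a complete dtype would also need an associated array type.

```python
from pandas.api.extensions import ExtensionDtype


class UnitDtype(ExtensionDtype):
    """Hypothetical parametrized dtype, used only to illustrate ``_metadata``."""

    # The names listed here mirror the parameters of __init__; the default
    # __eq__ and __hash__ compare exactly these attributes.
    _metadata = ("unit",)
    type = float  # scalar type of the elements

    def __init__(self, unit="m"):
        self.unit = unit

    @property
    def name(self):
        return "unit[{}]".format(self.unit)


metres = UnitDtype("m")
also_metres = UnitDtype("m")
seconds = UnitDtype("s")

print(metres == also_metres)              # same parameter value
print(metres == seconds)                  # different parameter value
print(hash(metres) == hash(also_metres))  # hashing is consistent with equality
```

If ``unit`` were left out of ``_metadata``, the default comparison would ignore it and the two differently parametrized dtypes would compare equal.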

**Reshaping changes**
**New and changed methods**

- :meth:`~pandas.api.types.ExtensionArray.dropna` has been added (:issue:`21185`)
- :meth:`~pandas.api.types.ExtensionArray.repeat` has been added (:issue:`24349`)
@@ -1195,6 +1186,21 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`)
- Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`)

.. _whatsnew_0240.enhancements.extension_array_operators:

**Operator support**

A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison
operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``:

1. Define each of the operators on your ``ExtensionArray`` subclass.
2. Use an operator implementation from pandas that depends on operators that are already defined
on the underlying elements (scalars) of the ``ExtensionArray``.

See the :ref:`ExtensionArray Operator Support
<extending.extension.operator>` documentation section for details on both
ways of adding operator support.

**Other changes**

- A default repr for :class:`pandas.api.extensions.ExtensionArray` is now provided (:issue:`23601`).