diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst index 9198c610f0f44..b0d3fead40ef0 100644 --- a/doc/source/whatsnew/v0.24.0.rst +++ b/doc/source/whatsnew/v0.24.0.rst @@ -10,27 +10,34 @@ What's New in 0.24.0 (January XX, 2019) {{ header }} -These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog -including other versions of pandas. +This is a major release from 0.23.4 and includes a number of API changes, new +features, enhancements, and performance improvements along with a large number +of bug fixes. -Enhancements -~~~~~~~~~~~~ +Highlights include: -Highlights include - -* :ref:`Optional Nullable Integer Support ` +* :ref:`Optional Integer NA Support ` * :ref:`New APIs for accessing the array backing a Series or Index ` * :ref:`A new top-level method for creating arrays ` * :ref:`Store Interval and Period data in a Series or DataFrame ` * :ref:`Support for joining on two MultiIndexes ` + +Check the :ref:`API Changes ` and :ref:`deprecations ` before updating. + +These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog +including other versions of pandas. + + +Enhancements +~~~~~~~~~~~~ + .. _whatsnew_0240.enhancements.intna: Optional Integer NA Support ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Pandas has gained the ability to hold integer dtypes with missing values. This long requested feature is enabled through the use of :ref:`extension types `. -Here is an example of the usage. We can construct a ``Series`` with the specified dtype. The dtype string ``Int64`` is a pandas ``ExtensionDtype``. Specifying a list or array using the traditional missing value marker of ``np.nan`` will infer to integer dtype. The display of the ``Series`` will also use the ``NaN`` to indicate missing values in string outputs. (:issue:`20700`, :issue:`20747`, :issue:`22441`, :issue:`21789`, :issue:`22346`) @@ -60,7 +67,7 @@ Operations on these dtypes will propagate ``NaN`` as other pandas operations. # coerce when needed s + 0.01 -These dtypes can operate as part of of ``DataFrame``. +These dtypes can operate as part of a ``DataFrame``. .. ipython:: python @@ -69,7 +76,7 @@ These dtypes can operate as part of of ``DataFrame``. df.dtypes -These dtypes can be merged & reshaped & casted. +These dtypes can be merged, reshaped, and casted. .. ipython:: python @@ -112,6 +119,7 @@ a new ndarray of period objects each time. .. ipython:: python + idx.values id(idx.values) id(idx.values) @@ -124,7 +132,7 @@ If you need an actual NumPy array, use :meth:`Series.to_numpy` or :meth:`Index.t For Series and Indexes backed by normal NumPy arrays, :attr:`Series.array` will return a new :class:`arrays.PandasArray`, which is a thin (no-copy) wrapper around a -:class:`numpy.ndarray`. :class:`arrays.PandasArray` isn't especially useful on its own, +:class:`numpy.ndarray`. :class:`~arrays.PandasArray` isn't especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library. @@ -142,14 +150,13 @@ See :ref:`Dtypes ` and :ref:`Attributes and Underlying Data `, including -extension arrays registered by :ref:`3rd party libraries `. See - -See :ref:`Dtypes ` for more on extension arrays. +extension arrays registered by :ref:`3rd party libraries `. +See the :ref:`dtypes docs ` for more on extension arrays. .. ipython:: python @@ -158,15 +165,15 @@ See :ref:`Dtypes ` for more on extension arrays. Passing data for which there isn't dedicated extension type (e.g. float, integer, etc.) will return a new :class:`arrays.PandasArray`, which is just a thin (no-copy) -wrapper around a :class:`numpy.ndarray` that satisfies the extension array interface. +wrapper around a :class:`numpy.ndarray` that satisfies the pandas extension array interface. .. ipython:: python pd.array([1, 2, 3]) -On their own, a :class:`arrays.PandasArray` isn't a very useful object. +On their own, a :class:`~arrays.PandasArray` isn't a very useful object. But if you need write low-level code that works generically for any -:class:`~pandas.api.extensions.ExtensionArray`, :class:`arrays.PandasArray` +:class:`~pandas.api.extensions.ExtensionArray`, :class:`~arrays.PandasArray` satisfies that need. Notice that by default, if no ``dtype`` is specified, the dtype of the returned @@ -197,7 +204,7 @@ For periods: .. ipython:: python - pser = pd.Series(pd.date_range("2000", freq="D", periods=5)) + pser = pd.Series(pd.period_range("2000", freq="D", periods=5)) pser pser.dtype @@ -259,23 +266,6 @@ For earlier versions this can be done using the following. pd.merge(left.reset_index(), right.reset_index(), on=['key'], how='inner').set_index(['key', 'X', 'Y']) - -.. _whatsnew_0240.enhancements.extension_array_operators: - -``ExtensionArray`` operator support -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison -operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``: - -1. Define each of the operators on your ``ExtensionArray`` subclass. -2. Use an operator implementation from pandas that depends on operators that are already defined - on the underlying elements (scalars) of the ``ExtensionArray``. - -See the :ref:`ExtensionArray Operator Support -` documentation section for details on both -ways of adding operator support. - .. _whatsnew_0240.enhancements.read_html: ``read_html`` Enhancements @@ -335,7 +325,7 @@ convenient way to apply users' predefined styling functions, and can help reduce df.style.pipe(format_and_align).set_caption('Summary of results.') Similar methods already exist for other classes in pandas, including :meth:`DataFrame.pipe`, -:meth:`pandas.core.groupby.GroupBy.pipe`, and :meth:`pandas.core.resample.Resampler.pipe`. +:meth:`GroupBy.pipe() `, and :meth:`Resampler.pipe() `. .. _whatsnew_0240.enhancements.rename_axis: @@ -343,7 +333,7 @@ Renaming names in a MultiIndex ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :func:`DataFrame.rename_axis` now supports ``index`` and ``columns`` arguments -and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`) +and :func:`Series.rename_axis` supports ``index`` argument (:issue:`19978`). This change allows a dictionary to be passed so that some of the names of a ``MultiIndex`` can be changed. @@ -371,13 +361,13 @@ Other Enhancements - :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing the user to override the engine's default behavior to include or omit the dataframe's indexes from the resulting Parquet file. (:issue:`20768`) +- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`) - :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`) - :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing the user to specify which decimal separator should be used in the output. (:issue:`23614`) -- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`) - :func:`DataFrame.to_html` now accepts ``render_links`` as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. See the :ref:`section on writing HTML ` in the IO docs for example usage. (:issue:`2679`) - :func:`pandas.read_csv` now supports pandas extension types as an argument to ``dtype``, allowing the user to use pandas extension types when reading CSVs. (:issue:`23228`) -- :meth:`DataFrame.shift` :meth:`Series.shift`, :meth:`ExtensionArray.shift`, :meth:`SparseArray.shift`, :meth:`Period.shift`, :meth:`GroupBy.shift`, :meth:`Categorical.shift`, :meth:`NDFrame.shift` and :meth:`Block.shift` now accept `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`) +- The :meth:`~DataFrame.shift` method now accepts `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`) - :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`) - :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether ``NaN``/``NaT`` values should be considered (:issue:`17534`) - :func:`DataFrame.to_csv` and :func:`Series.to_csv` now support the ``compression`` keyword when a file handle is passed. (:issue:`21227`) @@ -399,18 +389,19 @@ Other Enhancements The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`). - :meth:`DataFrame.to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` types for supported databases. For databases that don't support timezones, datetime data will be stored as timezone unaware local timestamps. See the :ref:`io.sql_datetime_data` for implications (:issue:`9086`). - :func:`to_timedelta` now supports iso-formated timedelta strings (:issue:`21877`) -- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`) +- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` objects in the constructor (:issue:`2193`) - :class:`DatetimeIndex` has gained the :attr:`DatetimeIndex.timetz` attribute. This returns the local time with timezone information. (:issue:`21358`) -- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`) -- :meth:`Timestamp.round`, :meth:`Timestamp.ceil`, and :meth:`Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`) -- :class:`pandas.core.resample.Resampler` now is iterable like :class:`pandas.core.groupby.GroupBy` (:issue:`15314`). +- :meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, and :meth:`~Timestamp.floor` for :class:`DatetimeIndex` and :class:`Timestamp` + now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`) + and a ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`22647`) +- The result of :meth:`~DataFrame.resample` is now iterable similar to ``groupby()`` (:issue:`15314`). - :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`pandas.core.resample.Resampler.quantile` (:issue:`15023`). - :meth:`DataFrame.resample` and :meth:`Series.resample` with a :class:`PeriodIndex` will now respect the ``base`` argument in the same fashion as with a :class:`DatetimeIndex`. (:issue:`23882`) - :meth:`pandas.api.types.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``, all instances of ``set`` will not be considered "list-like" anymore (:issue:`23061`) - :meth:`Index.to_frame` now supports overriding column name(s) (:issue:`22580`). - :meth:`Categorical.from_codes` now can take a ``dtype`` parameter as an alternative to passing ``categories`` and ``ordered`` (:issue:`24398`). -- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`). +- New attribute ``__git_version__`` will return git commit sha of current build (:issue:`21295`). - Compatibility with Matplotlib 3.0 (:issue:`22790`). - Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`) - :func:`read_fwf` now accepts keyword ``infer_nrows`` (:issue:`15138`). @@ -426,7 +417,7 @@ Other Enhancements - :class:`IntervalIndex` has gained the :attr:`~IntervalIndex.is_overlapping` attribute to indicate if the ``IntervalIndex`` contains any overlapping intervals (:issue:`23309`) - :func:`pandas.DataFrame.to_sql` has gained the ``method`` argument to control SQL insertion clause. See the :ref:`insertion method ` section in the documentation. (:issue:`8953`) - :meth:`DataFrame.corrwith` now supports Spearman's rank correlation, Kendall's tau as well as callable correlation methods. (:issue:`21925`) -- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_XXX` etc. now support tilde(~) in path argument. (:issue:`23473`) +- :meth:`DataFrame.to_json`, :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_pickle`, and other export methods now support tilde(~) in path argument. (:issue:`23473`) .. _whatsnew_0240.api_breaking: @@ -438,8 +429,8 @@ Pandas 0.24.0 includes a number of API breaking changes. .. _whatsnew_0240.api_breaking.deps: -Dependencies have increased minimum versions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Increased minimum versions for dependencies +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We have updated our minimum supported versions of dependencies (:issue:`21242`, :issue:`18742`, :issue:`23774`, :issue:`24767`). If installed, we now require: @@ -1167,17 +1158,19 @@ Other API Changes .. _whatsnew_0240.api.extension: -ExtensionType Changes -~~~~~~~~~~~~~~~~~~~~~ +Extension Type Changes +~~~~~~~~~~~~~~~~~~~~~~ **Equality and Hashability** -Pandas now requires that extension dtypes be hashable. The base class implements +Pandas now requires that extension dtypes be hashable (i.e. the respective +``ExtensionDtype`` objects; hashability is not a requirement for the values +of the corresponding ``ExtensionArray``). The base class implements a default ``__eq__`` and ``__hash__``. If you have a parametrized dtype, you should update the ``ExtensionDtype._metadata`` tuple to match the signature of your ``__init__`` method. See :class:`pandas.api.extensions.ExtensionDtype` for more (:issue:`22476`). -**Reshaping changes** +**New and changed methods** - :meth:`~pandas.api.types.ExtensionArray.dropna` has been added (:issue:`21185`) - :meth:`~pandas.api.types.ExtensionArray.repeat` has been added (:issue:`24349`) @@ -1195,9 +1188,25 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your - Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`) - Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`) +.. _whatsnew_0240.enhancements.extension_array_operators: + +**Operator support** + +A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison +operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``: + +1. Define each of the operators on your ``ExtensionArray`` subclass. +2. Use an operator implementation from pandas that depends on operators that are already defined + on the underlying elements (scalars) of the ``ExtensionArray``. + +See the :ref:`ExtensionArray Operator Support +` documentation section for details on both +ways of adding operator support. + **Other changes** - A default repr for :class:`pandas.api.extensions.ExtensionArray` is now provided (:issue:`23601`). +- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`) - An ``ExtensionArray`` with a boolean dtype now works correctly as a boolean indexer. :meth:`pandas.api.types.is_bool_dtype` now properly considers them boolean (:issue:`22326`) **Bug Fixes** @@ -1246,7 +1255,6 @@ Deprecations - The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`) - The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`) - Deprecated the ``nthreads`` keyword of :func:`pandas.read_feather` in favor of ``use_threads`` to reflect the changes in ``pyarrow>=0.11.0``. (:issue:`23053`) -- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`) - :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`) - Constructing a :class:`TimedeltaIndex` from data with ``datetime64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23539`) - Constructing a :class:`DatetimeIndex` from data with ``timedelta64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23675`)