
API: Uses pd.NA in IntegerArray #29964

Merged
merged 57 commits into from
Dec 30, 2019
Commits (all by TomAugspurger):

1eec965 API: Uses pd.NA in IntegerArray (Dec 2, 2019)
f5f61ea Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 2, 2019)
c569562 wip (Dec 2, 2019)
a8261a4 wip (Dec 3, 2019)
c8ff04f Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 3, 2019)
cddc9df fixup value counts (Dec 3, 2019)
9488d34 fixed to_numpy (Dec 3, 2019)
0d5aab8 doc (Dec 3, 2019)
fa61a6d wip (Dec 3, 2019)
de2c6c6 wip (Dec 3, 2019)
60d7663 wip (Dec 3, 2019)
a4c4618 fixup extension (Dec 3, 2019)
0a500be Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 4, 2019)
1c716f3 update tests (Dec 4, 2019)
67c8d51 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 4, 2019)
22a2bc7 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 4, 2019)
34de18e updates (Dec 4, 2019)
78944d1 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 5, 2019)
ffbe299 wip (Dec 5, 2019)
7abf40e API: Handle pow & rpow special cases (Dec 5, 2019)
36d403d move (Dec 6, 2019)
f6b4062 Merge remote-tracking branch 'upstream/master' into na-pow (Dec 6, 2019)
945e8cd revert (Dec 6, 2019)
04546f3 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 6, 2019)
a493965 Merge remote-tracking branch 'upstream/master' into na-pow (Dec 6, 2019)
8fc8b3a fixup (Dec 6, 2019)
a49aa65 handle negative (Dec 6, 2019)
8ad166d Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 6, 2019)
dd745c3 Merge branch 'na-pow' into NA-scalar+IntegerArray (Dec 6, 2019)
88fa412 expand test (Dec 6, 2019)
0902eef wip (Dec 6, 2019)
721a1ea Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 9, 2019)
c658307 fixup (Dec 9, 2019)
4f9d775 exceptions (Dec 9, 2019)
1244ef4 wip (Dec 9, 2019)
4a34b45 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 9, 2019)
5293d87 fixup (Dec 9, 2019)
39f225a arrow (Dec 9, 2019)
ea19b2d update (Dec 9, 2019)
fe2d98e fixup (Dec 10, 2019)
68fe155 update (Dec 10, 2019)
f27a5c2 fixup (Dec 10, 2019)
b97450b Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 16, 2019)
5d62af8 updates (Dec 16, 2019)
2bf57d6 test, repr (Dec 16, 2019)
2f4e1cd Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 17, 2019)
021dc7b fixup (Dec 17, 2019)
197f18b enable (Dec 17, 2019)
259b779 fixup (Dec 17, 2019)
c0cfef9 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 18, 2019)
3183d53 ints (Dec 18, 2019)
4986d84 restore comment (Dec 18, 2019)
76806e9 Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 30, 2019)
64b4ccc Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer… (Dec 30, 2019)
b39dc60 docs (Dec 30, 2019)
800158d docs (Dec 30, 2019)
e5d6832 fixup (Dec 30, 2019)
28 changes: 28 additions & 0 deletions doc/source/user_guide/integer_na.rst
@@ -15,6 +15,10 @@ Nullable integer data type
   IntegerArray is currently experimental. Its API or implementation may
   change without warning.

.. versionchanged:: 1.0.0

   Now uses :attr:`pandas.NA` as the missing value rather
   than :attr:`numpy.nan`.

In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent
missing data. Because ``NaN`` is a float, this forces an array of integers with
@@ -23,6 +27,9 @@ much. But if your integer column is, say, an identifier, casting to float can
be problematic. Some integers cannot even be represented as floating point
numbers.
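
Here is a stdlib-only illustration of that last point, using plain Python with no pandas involved. Python's ``float`` is an IEEE 754 double, the same format as NumPy's ``float64``. Because its significand has 53 bits, it cannot hold every 64-bit integer exactly.

```python
# Python floats are IEEE 754 doubles (NumPy's float64 uses the same format).
# Above 2**53, not every integer can be represented exactly.
big = 2**53 + 1

as_float = float(big)

# float() rounds to the nearest representable double, which is 2**53,
# so converting back to int silently drops the "+ 1".
print(int(as_float))         # 9007199254740992
print(int(as_float) == big)  # False
```

An integer identifier such as 9007199254740993 would therefore change value if a missing entry forced its column to ``float64``.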

Construction
------------

Pandas can represent integer data with possibly missing values using
:class:`arrays.IntegerArray`. This is an :ref:`extension type <extending.extension-types>`
implemented within pandas.
@@ -39,6 +46,12 @@ NumPy's ``'int64'`` dtype:

   pd.array([1, 2, np.nan], dtype="Int64")

All NA-like values are replaced with :attr:`pandas.NA`.

Contributor review comment: may want to add a versionchanged tag here (and below)


.. ipython:: python

   pd.array([1, 2, np.nan, None, pd.NA], dtype="Int64")

This array can be stored in a :class:`DataFrame` or :class:`Series` like any
NumPy array.
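
Conceptually, :class:`arrays.IntegerArray` keeps two parallel arrays: the integer values and a boolean mask marking the missing positions. That is why every NA-like input collapses to the same missing marker. The sketch below is a toy, pure-Python model of that normalization step that assumes list-based storage. ``NA``, ``is_na_like`` and ``to_masked`` are hypothetical names, not pandas API, and pandas itself works on NumPy arrays.

```python
import math


class _NAType:
    """Hypothetical stand-in for a missing-value singleton (not pandas.NA)."""

    def __repr__(self):
        return "<NA>"


NA = _NAType()


def is_na_like(x):
    # Treat None, float NaN and the toy NA sentinel as "missing".
    return x is None or x is NA or (isinstance(x, float) and math.isnan(x))


def to_masked(items):
    """Split items into (values, mask); missing slots hold a placeholder 0."""
    values, mask = [], []
    for x in items:
        missing = is_na_like(x)
        mask.append(missing)
        values.append(0 if missing else int(x))
    return values, mask


values, mask = to_masked([1, 2, float("nan"), None, NA])
print(values)  # [1, 2, 0, 0, 0]
print(mask)    # [False, False, True, True, True]
```

The placeholder under a masked slot is never meant to be read. Only the mask decides whether a position is missing.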

@@ -78,6 +91,9 @@ with the dtype.
In the future, we may provide an option for :class:`Series` to infer a
nullable-integer dtype.

Operations
----------

Operations involving an integer array will behave similarly to NumPy arrays.
Missing values will be propagated, and the data will be coerced to another
dtype if needed.
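
The propagation rule can be sketched with the same (values, mask) idea in plain Python. A result slot is missing whenever either operand slot is missing. True division yields floats, which is a small analogue of "coerced to another dtype". ``masked_binop`` is a hypothetical helper, not pandas API, and this is a toy model rather than the IntegerArray implementation.

```python
import operator


def masked_binop(left, right, op):
    """Apply op elementwise to two (values, mask) pairs (toy model).

    A result slot is missing if either input slot is missing (mask OR).
    op is only called on non-missing slots, so a placeholder 0 under a
    mask can never trigger, e.g., a ZeroDivisionError.
    """
    lvals, lmask = left
    rvals, rmask = right
    out_mask = [lm or rm for lm, rm in zip(lmask, rmask)]
    out_vals = [
        0 if m else op(lv, rv) for lv, rv, m in zip(lvals, rvals, out_mask)
    ]
    return out_vals, out_mask


a = ([1, 2, 0], [False, False, True])   # like [1, 2, <NA>]
b = ([10, 0, 5], [False, True, False])  # like [10, <NA>, 5]

print(masked_binop(a, b, operator.add))      # ([11, 0, 0], [False, True, True])
print(masked_binop(a, b, operator.truediv))  # ([0.1, 0, 0], [False, True, True])
```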
@@ -123,3 +139,15 @@ Reduction and groupby operations such as 'sum' work as well.

   df.sum()
   df.groupby('B').A.sum()

Scalar NA Value
---------------

:class:`arrays.IntegerArray` uses :attr:`pandas.NA` as its scalar
missing value. Selecting a single element that's missing returns
:attr:`pandas.NA`.

.. ipython:: python

   a = pd.array([1, None], dtype="Int64")
   a[1]
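
This behavior can be modeled in a few lines of plain Python, assuming the values-plus-mask layout. Scalar access checks the mask first and returns a missing-value singleton instead of the stored placeholder. ``ToyIntegerArray`` and ``NA`` here are hypothetical stand-ins, not the pandas classes.

```python
class _NAType:
    """Hypothetical stand-in for a missing-value singleton (not pandas.NA)."""

    def __repr__(self):
        return "<NA>"


NA = _NAType()


class ToyIntegerArray:
    """Hypothetical masked integer array: values plus a missing-value mask."""

    def __init__(self, values, mask):
        self._values = list(values)
        self._mask = list(mask)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        # Scalar access: return the NA singleton for masked slots,
        # otherwise the stored integer.
        if self._mask[i]:
            return NA
        return self._values[i]


a = ToyIntegerArray([1, 0], [False, True])  # like pd.array([1, None], dtype="Int64")
print(a[0])  # 1
print(a[1])  # <NA>
```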
58 changes: 58 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -365,6 +365,64 @@ The following methods now also correctly output values for unobserved categories

As a reminder, you can specify the ``dtype`` to disable all inference.

:class:`arrays.IntegerArray` now uses :attr:`pandas.NA`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`arrays.IntegerArray` now uses :attr:`pandas.NA` rather than
:attr:`numpy.nan` as its missing value marker (:issue:`29964`).

*pandas 0.25.x*

.. code-block:: python

   >>> a = pd.array([1, 2, None], dtype="Int64")
   >>> a
   <IntegerArray>
   [1, 2, NaN]
   Length: 3, dtype: Int64

   >>> a[2]
   nan

*pandas 1.0.0*

.. ipython:: python

   a = pd.array([1, 2, None], dtype="Int64")
   a[2]

See :ref:`missing_data.NA` for more on the differences between :attr:`pandas.NA`
and :attr:`numpy.nan`.
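
The core difference shows up in comparisons. Plain ``float("nan")`` (stdlib, and :attr:`numpy.nan` is itself a Python float) makes every ordering or equality comparison ``False``. :attr:`pandas.NA` instead propagates, so comparing it yields a missing value. The second half of the block is a toy scalar written only to mimic that propagation. ``_ToyNA`` is a hypothetical class, not the pandas implementation.

```python
nan = float("nan")

# NaN semantics: every comparison involving NaN is False, except !=,
# and NaN is not even equal to itself.
print(nan == nan)  # False
print(nan > 1)     # False
print(nan != 1)    # True


class _ToyNA:
    """Hypothetical missing-value scalar whose comparisons propagate."""

    def __repr__(self):
        return "<NA>"

    def _propagate(self, other):
        # Any comparison with a missing value is itself missing.
        return self

    __eq__ = __ne__ = __lt__ = __le__ = __gt__ = __ge__ = _propagate
    # Defining __eq__ would otherwise set __hash__ to None.
    __hash__ = object.__hash__


NA = _ToyNA()
print(NA > 1)    # <NA>
print(1 < NA)    # <NA>  (int declines, Python falls back to NA.__gt__)
print(NA == NA)  # <NA>
```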

:class:`arrays.IntegerArray` comparisons return :class:`arrays.BooleanArray`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Comparison operations on a :class:`arrays.IntegerArray` now return a
:class:`arrays.BooleanArray` rather than a NumPy array (:issue:`29964`).

*pandas 0.25.x*

.. code-block:: python

   >>> a = pd.array([1, 2, None], dtype="Int64")
   >>> a
   <IntegerArray>
   [1, 2, NaN]
   Length: 3, dtype: Int64

   >>> a > 1
   array([False, True, False])

*pandas 1.0.0*

.. ipython:: python

   a = pd.array([1, 2, None], dtype="Int64")
   a > 1

Note that missing values now propagate, rather than always comparing unequal
like :attr:`numpy.nan`. See :ref:`missing_data.NA` for more.
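
The new comparison result can be sketched in plain Python, assuming the values-plus-mask layout. The comparison produces boolean values plus the input mask, so the third slot stays missing instead of collapsing to ``False`` as it does with NaN-filled floats. ``masked_compare`` is a hypothetical helper. The real result type is :class:`arrays.BooleanArray`.

```python
import operator


def masked_compare(values, mask, op, other):
    """Compare a (values, mask) integer array with a scalar (toy model).

    Returns (bool_values, mask), a rough analogue of a BooleanArray:
    masked slots get a placeholder False but remain flagged as missing.
    """
    out = [False if m else op(v, other) for v, m in zip(values, mask)]
    return out, list(mask)


# Like pd.array([1, 2, None], dtype="Int64") > 1
print(masked_compare([1, 2, 0], [False, False, True], operator.gt, 1))
# ([False, True, False], [False, False, True])

# The old float/NaN route loses the distinction: NaN > 1 is just False.
print([v > 1 for v in [1.0, 2.0, float("nan")]])
# [False, True, False]
```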

By default :meth:`Categorical.min` now returns the minimum instead of np.nan
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

6 changes: 4 additions & 2 deletions pandas/core/arrays/boolean.py
@@ -730,7 +730,6 @@ def all(self, skipna: bool = True, **kwargs):
     @classmethod
     def _create_logical_method(cls, op):
         def logical_method(self, other):
-
             if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented
@@ -777,8 +776,11 @@ def logical_method(self, other):
     @classmethod
     def _create_comparison_method(cls, op):
         def cmp_method(self, other):
+            from pandas.arrays import IntegerArray
 
-            if isinstance(other, (ABCDataFrame, ABCSeries, ABCIndexClass)):
+            if isinstance(
+                other, (ABCDataFrame, ABCSeries, ABCIndexClass, IntegerArray)
+            ):
                 # Rely on pandas to unbox and dispatch to us.
                 return NotImplemented
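
Returning ``NotImplemented`` here relies on Python's reflected-operator protocol. When ``cmp_method`` declines an :class:`arrays.IntegerArray` operand, Python retries the comparison with the operands swapped, so the IntegerArray side handles it. The toy classes below isolate that mechanism. ``Bools`` and ``Ints`` are hypothetical, not pandas classes, and pandas' actual dispatch covers more cases than this.

```python
class Ints:
    """Hypothetical stand-in for an integer array (not pandas)."""

    def __init__(self, values):
        self.values = values

    def __gt__(self, other):
        # Also serves as the reflected method for `other < self`.
        if isinstance(other, Bools):
            return [i > b for i, b in zip(self.values, other.values)]
        return NotImplemented


class Bools:
    """Hypothetical stand-in for a boolean array (not pandas)."""

    def __init__(self, values):
        self.values = values

    def __lt__(self, other):
        # Decline Ints operands, mirroring how cmp_method returns
        # NotImplemented for an IntegerArray; Python then tries
        # the reflected Ints.__gt__.
        if isinstance(other, Ints):
            return NotImplemented
        return [b < other for b in self.values]


result = Bools([True, False]) < Ints([0, 1])
print(result)  # [False, True]
```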
