ENH: add BooleanArray extension array #29555

Merged: 21 commits, Nov 25, 2019
Changes from 13 commits
Commits
640dac9
ENH: add BooleanArray extension array
jorisvandenbossche Nov 11, 2019
b9597bb
enable arithmetic ops + ufuncs
jorisvandenbossche Nov 12, 2019
fa77b7a
switch back to object dtype for __array__ + astype tests
jorisvandenbossche Nov 12, 2019
29415a9
temp
jorisvandenbossche Nov 12, 2019
c4a53f2
Merge remote-tracking branch 'upstream/master' into boolean-EA
jorisvandenbossche Nov 14, 2019
b1182bc
updates for feedback + add BooleanArray docstring
jorisvandenbossche Nov 15, 2019
94c5a90
Merge remote-tracking branch 'upstream/master' into boolean-EA
jorisvandenbossche Nov 18, 2019
1861602
try fix test for old numpy
jorisvandenbossche Nov 18, 2019
ad6c477
fix in place modification of mask / follow numpy for division
jorisvandenbossche Nov 18, 2019
67bf21a
string -> boolean copy paste errors
jorisvandenbossche Nov 18, 2019
f153fb2
add basic docs
jorisvandenbossche Nov 18, 2019
e24c097
empty test
jorisvandenbossche Nov 18, 2019
f0d0c6e
fix BooleanDtype construction + doc lint
jorisvandenbossche Nov 19, 2019
a3e1e93
Merge remote-tracking branch 'upstream/master' into boolean-EA
jorisvandenbossche Nov 20, 2019
1717583
add extra tests for constructors + check dimensionality
jorisvandenbossche Nov 20, 2019
5ce67e2
validate values when converting to boolean array
jorisvandenbossche Nov 20, 2019
8c0abe6
various updates
jorisvandenbossche Nov 20, 2019
031a113
fix + test return types of reducers
jorisvandenbossche Nov 20, 2019
90558d6
fix base reduction tests
jorisvandenbossche Nov 20, 2019
af82754
Merge remote-tracking branch 'upstream/master' into boolean-EA
jorisvandenbossche Nov 25, 2019
0eb3ca2
small edits
jorisvandenbossche Nov 25, 2019
1 change: 1 addition & 0 deletions doc/source/getting_started/basics.rst
@@ -1950,6 +1950,7 @@ sparse :class:`SparseDtype` (none) :class:`arrays.
intervals           :class:`IntervalDtype`    :class:`Interval`  :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
nullable integer    :class:`Int64Dtype`, ...  (none)             :class:`arrays.IntegerArray`  :ref:`integer_na`
Strings             :class:`StringDtype`      :class:`str`       :class:`arrays.StringArray`   :ref:`text`
Boolean (with NA)   :class:`BooleanDtype`     :class:`bool`      :class:`arrays.BooleanArray`  :ref:`api.arrays.bool`
=================== ========================= ================== ============================= =============================

Pandas has two ways to store strings.
23 changes: 23 additions & 0 deletions doc/source/reference/arrays.rst
@@ -25,6 +25,7 @@ Nullable Integer :class:`Int64Dtype`, ... (none) :ref:`api.array
Categorical         :class:`CategoricalDtype` (none)             :ref:`api.arrays.categorical`
Sparse              :class:`SparseDtype`      (none)             :ref:`api.arrays.sparse`
Strings             :class:`StringDtype`      :class:`str`       :ref:`api.arrays.string`
Boolean (with NA)   :class:`BooleanDtype`     :class:`bool`      :ref:`api.arrays.bool`
=================== ========================= ================== =============================

Pandas and third-party libraries can extend NumPy's type system (see :ref:`extending.extension-types`).
@@ -485,6 +486,28 @@ The ``Series.str`` accessor is available for ``Series`` backed by a :class:`arra
See :ref:`api.series.str` for more.


.. _api.arrays.bool:

Boolean data with missing values
--------------------------------

The boolean dtype (with the alias ``"boolean"``) provides support for storing
boolean data (True, False values) with missing values, which is not possible
with a bool :class:`numpy.ndarray`.

.. autosummary::
   :toctree: api/
   :template: autosummary/class_without_autosummary.rst

   arrays.BooleanArray

.. autosummary::
   :toctree: api/
   :template: autosummary/class_without_autosummary.rst

   BooleanDtype


.. Dtype attributes which are manually listed in their docstrings: including
.. it here to make sure a docstring page is built for them

24 changes: 24 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -94,6 +94,30 @@ of the Series or columns of a DataFrame will also have string dtype.
We recommend explicitly using the ``string`` data type when working with strings.
See :ref:`text.types` for more.

.. _whatsnew_100.boolean:

Boolean data type with missing values support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We've added :class:`BooleanDtype` / :class:`~arrays.BooleanArray`, an extension
type dedicated to boolean data that can hold missing values. With the default
``bool`` data type based on a NumPy bool array, the column can only hold
True or False values and not missing values. This new :class:`BooleanDtype`
can store missing values as well by keeping track of them in a separate mask.
(:issue:`29555`)

.. ipython:: python

   pd.Series([True, False, None], dtype=pd.BooleanDtype())

You can use the alias ``"boolean"`` as well.

.. ipython:: python

   s = pd.Series([True, False, None], dtype="boolean")
   s


.. _whatsnew_1000.enhancements.other:

Other enhancements
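Reviewer note: the whatsnew entry says missing values are tracked "in a separate mask". The following is a toy, pure-Python sketch of that idea only. `MaskedBools` and its methods are hypothetical names invented for illustration; this is not the `BooleanArray` implementation from the PR.

```python
from typing import List, Optional


class MaskedBools:
    """Toy container: one list of values plus a parallel 'is missing' mask."""

    def __init__(self, values: List[Optional[bool]]) -> None:
        # Missing slots get a placeholder value (False here, an arbitrary
        # choice); only the mask records that they are missing.
        self.data = [bool(v) if v is not None else False for v in values]
        self.mask = [v is None for v in values]

    def isna(self) -> List[bool]:
        # The mask is the single source of truth for missingness.
        return list(self.mask)

    def to_list(self) -> List[Optional[bool]]:
        # Rebuild the user-facing view by consulting the mask first.
        return [None if m else d for d, m in zip(self.data, self.mask)]


col = MaskedBools([True, False, None])
print(col.data)       # [True, False, False]
print(col.mask)       # [False, False, True]
print(col.to_list())  # [True, False, None]
```

Keeping the data strictly boolean and storing missingness on the side is what lets the values stay in a compact bool buffer instead of an object array.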
1 change: 1 addition & 0 deletions pandas/__init__.py
@@ -67,6 +67,7 @@
    IntervalDtype,
    DatetimeTZDtype,
    StringDtype,
    BooleanDtype,
    # missing
    isna,
    isnull,
2 changes: 2 additions & 0 deletions pandas/arrays/__init__.py
@@ -4,6 +4,7 @@
See :ref:`extending.extension-types` for more.
"""
from pandas.core.arrays import (
    BooleanArray,
    Categorical,
    DatetimeArray,
    IntegerArray,
@@ -16,6 +17,7 @@
)

__all__ = [
    "BooleanArray",
    "Categorical",
    "DatetimeArray",
    "IntegerArray",
14 changes: 14 additions & 0 deletions pandas/conftest.py
@@ -293,6 +293,20 @@ def compare_operators_no_eq_ne(request):
    return request.param


@pytest.fixture(
    params=["__and__", "__rand__", "__or__", "__ror__", "__xor__", "__rxor__"]
)
def all_logical_operators(request):
    """
    Fixture for dunder names for common logical operations

    * |
    * &
    * ^
    """
    return request.param


@pytest.fixture(params=[None, "gzip", "bz2", "zip", "xz"])
def compression(request):
    """
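Reviewer note: a sketch of how a test might consume the dunder names that `all_logical_operators` yields, written without pytest so it runs on its own. `LOGICAL_DUNDERS`, `expected_result` and `check_logical_op` are hypothetical helpers made up for this note, and plain Python bools stand in for the array type.

```python
import operator

# Same dunder names as the fixture's params.
LOGICAL_DUNDERS = ["__and__", "__rand__", "__or__", "__ror__", "__xor__", "__rxor__"]

# Plain (non-reflected) operators keyed by their short name.
PLAIN_OPS = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}


def expected_result(name, left, right):
    """Compute what getattr(left, name)(right) should return."""
    base = name.strip("_")          # "__rand__" -> "rand"
    reflected = base not in PLAIN_OPS
    op = PLAIN_OPS[base[1:] if reflected else base]
    # A reflected dunder evaluates the operation with operands swapped.
    return op(right, left) if reflected else op(left, right)


def check_logical_op(name, left, right):
    # Look the method up by name, as a parametrized test would.
    result = getattr(left, name)(right)
    return result == expected_result(name, left, right)


results = {name: check_logical_op(name, True, False) for name in LOGICAL_DUNDERS}
print(results)
```

In a real pytest test the loop would disappear: the test function would take `all_logical_operators` as an argument and run once per dunder name.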
1 change: 1 addition & 0 deletions pandas/core/api.py
@@ -13,6 +13,7 @@
# TODO: Remove get_dummies import when statsmodels updates #18264
from pandas.core.algorithms import factorize, unique, value_counts
from pandas.core.arrays import Categorical
from pandas.core.arrays.boolean import BooleanDtype
from pandas.core.arrays.integer import (
    Int8Dtype,
    Int16Dtype,
1 change: 1 addition & 0 deletions pandas/core/arrays/__init__.py
@@ -4,6 +4,7 @@
    ExtensionScalarOpsMixin,
    try_cast_to_ea,
)
from .boolean import BooleanArray # noqa: F401
from .categorical import Categorical # noqa: F401
from .datetimes import DatetimeArray # noqa: F401
from .integer import IntegerArray, integer_array # noqa: F401
9 changes: 9 additions & 0 deletions pandas/core/arrays/base.py
@@ -1088,6 +1088,15 @@ def _add_comparison_ops(cls):
        cls.__le__ = cls._create_comparison_method(operator.le)
        cls.__ge__ = cls._create_comparison_method(operator.ge)

    @classmethod
    def _add_logical_ops(cls):
        cls.__and__ = cls._create_logical_method(operator.and_)
        cls.__rand__ = cls._create_logical_method(ops.rand_)
        cls.__or__ = cls._create_logical_method(operator.or_)
        cls.__ror__ = cls._create_logical_method(ops.ror_)
        cls.__xor__ = cls._create_logical_method(operator.xor)
        cls.__rxor__ = cls._create_logical_method(ops.rxor)


class ExtensionScalarOpsMixin(ExtensionOpsMixin):
    """
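Reviewer note: a self-contained sketch of the registration pattern that `_add_logical_ops` follows, where a classmethod builds each dunder from an operator function and attaches it to the class. `LogicalOpsMixin`, `ToyBools` and the local `rand_`, `ror_`, `rxor` helpers are stand-ins written for this note (assumed to mirror the reflected operators referenced as `ops.rand_`, `ops.ror_`, `ops.rxor` in the diff); they are not the pandas classes.

```python
import operator


def rand_(left, right):
    # Reflected "and": evaluate with operands swapped.
    return operator.and_(right, left)


def ror_(left, right):
    return operator.or_(right, left)


def rxor(left, right):
    return operator.xor(right, left)


class LogicalOpsMixin:
    @classmethod
    def _create_logical_method(cls, op):
        raise NotImplementedError  # subclasses decide how op is applied

    @classmethod
    def _add_logical_ops(cls):
        # Attach one generated method per logical dunder.
        cls.__and__ = cls._create_logical_method(operator.and_)
        cls.__rand__ = cls._create_logical_method(rand_)
        cls.__or__ = cls._create_logical_method(operator.or_)
        cls.__ror__ = cls._create_logical_method(ror_)
        cls.__xor__ = cls._create_logical_method(operator.xor)
        cls.__rxor__ = cls._create_logical_method(rxor)


class ToyBools(LogicalOpsMixin):
    def __init__(self, values):
        self.values = list(values)

    @classmethod
    def _create_logical_method(cls, op):
        def method(self, other):
            # Broadcast a scalar operand to the length of self.
            if isinstance(other, ToyBools):
                other_values = other.values
            else:
                other_values = [other] * len(self.values)
            return cls([op(a, b) for a, b in zip(self.values, other_values)])

        return method


ToyBools._add_logical_ops()

a = ToyBools([True, True, False])
b = ToyBools([True, False, False])
print((a & b).values)     # [True, False, False]
print((a | b).values)     # [True, True, False]
print((a ^ b).values)     # [False, True, False]
print((True & a).values)  # [True, True, False], dispatched via __rand__
```

The design choice illustrated here is that a subclass supplies only `_create_logical_method`, and a single call to `_add_logical_ops()` wires up all six operators.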