Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add BooleanArray extension array #29555

Merged
merged 21 commits into from
Nov 25, 2019

Conversation

jorisvandenbossche
Copy link
Member

Closes #21778
xref closed PRs #22226 and #25415, and xref the discussions on missing values in #28095 and #28778

This PR adds a BooleanArray extension array ("boolean" dtype):

  • It uses the same mask-based approach as the IntegerArray: numpy array for the values + numpy array for the mask
  • it does not yet implement any new NA behaviour as being discussed DISCUSS: boolean dtype with missing value support #28778, but for now follows the current NaN behaviour
  • It is not yet "produced" from other extension arrays (eg a comparison method on IntegerArray, I would leave that for follow-up PRs)
  • There is certainly a lot of code to share between this BooleanArray and IntegerArray (eg in a "BaseMaskedArray" base class), but didn't do that yet here (focusing on the BooleanArray for now)

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label Nov 11, 2019
data[self._mask] = self._na_value
return data
else:
return self._data
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted this compared to IntegerArray to return bool dtype if there are no NaNs (this helps to enable boolean indexing automatically for this case). But we should probably make a consistent choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we want to return object-dtype here if we can avoid it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, a predictable dtype (not depending on the values in the array) is also nice ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as I thought more about it the dtype stability was winning me over. Not sure what's best.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched back to always object dtype for now. There are methods (like astype(bool)) that force you to give bool dtype if possible. We should maybe think about adding a to_numpy to our arrays as well (the same for IntegerArray).

pandas/core/arrays/boolean.py Show resolved Hide resolved
# def test_to_boolean_array_error(values):
# # error in converting existing arrays to BooleanArray
# with pytest.raises(TypeError):
# pd.array(values, dtype="boolean")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently those actually work, as the coercion routine is using numpy coercion: np.array(['a'], dtype=bool) actually works and use bool-ness of the values.
So the question here is if we want to allow this, or rather be more strict (eg only allow actual bools + integer 0/1)? (personally prefer to be strict)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, as long as actual bools and integer 0/1 as accepted then either works for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have a decision here, and get tests for it. I think we can justify either, since np.array([np.nan], dtype=bool) will be array([True]). So we're already deviating from NumPy in this limited case.

Do you have a plan for how to do the actual validation? Will it be somewhat expensive?

What are we allowing in __setitem__? Do we convert the value to bool, or do we require booleans?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So an option could be to allow: bool + int/float of 0's and 1's (the float is needed in case it are 0/1's with np.nan), and raise for the rest.

Do you have a plan for how to do the actual validation? Will it be somewhat expensive?

We can check for actual bool/int/float dtypes (and then for int and float check that the original is equal to the result converted back to numeric; to ensure it were only 0's and 1's). For object we can first do an infer_dtype, and then apply the same logic.

What are we allowing in setitem? Do we convert the value to bool, or do we require booleans?

Currently, we use the same coerce_to_array function in __setitem__ to convert the value(s) to be set to boolean values+mask. So if we restrict like the above, then that follows the same pattern for __setitem__ (but we could also be more strict in __setitem__ if that is preferred)



# @pytest.mark.parametrize("ufunc", [np.add])
# def test_ufuncs_binary(ufunc):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to implement arithmetic ops

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the expected return types? Would you use IntegerArray for things returning an integer dtype?

ndarray[bool] will sometimes return a float dtype...

In [17]: a = np.array([True, False])

In [18]: a / 1
Out[18]: array([1., 0.])

would we return an ndarray there, with NaN for the missing values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess returning a PandasArray is also an option... If we want to stay in the algebra of op(BooleanArray, other) returning an ExtensionArray

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I went with tthe dtype that numpy ops result in, and then if result is bool -> BooleanArray, if is int -> IntegerArray, otherwise if result is float dtype (or something else) return the numpy array.

@jbrockmendel
Copy link
Member

should this be marked as a WIP?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you did not inherit from IntegeryArray (and dtype) and just copied

surely these share much more code and could easily share a bae class at the very least

@jorisvandenbossche
Copy link
Member Author

is there a reason you did not inherit from IntegerArray (and dtype) and just copied

surely these share much more code and could easily share a bae class at the very least

As I said in the top post: yes this needs to share code with IntegerArray through common base class, but initially focussed on a working BooleanArray and deciding on its behaviour.
I am happy (and planning) to do that, here or in a follow-up.

I don't think it should directly inherit IntegerArray, but personally prefer a common base class. I think that will make it clearer what is actually shared.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2019

is there a reason you did not inherit from IntegerArray (and dtype) and just copied
surely these share much more code and could easily share a bae class at the very least

As I said in the top post: yes this needs to share code with IntegerArray through common base class, but initially focussed on a working BooleanArray and deciding on its behaviour.
I am happy (and planning) to do that, here or in a follow-up.

I don't think it should directly inherit IntegerArray, but personally prefer a common base class. I think that will make it clearer what is actually shared.

i agree a common base class is good and as a follow up is fine

@jorisvandenbossche jorisvandenbossche added this to the 1.0 milestone Nov 12, 2019
@jorisvandenbossche
Copy link
Member Author

should this be marked as a WIP?

@jbrockmendel not in the sense of "this is ready to be reviewed", and it also passing all extension array tests (except one, see below).
There are certainly still todo's (eg the arithmetic ops and ufunc machinery), but eg some commented out tests are mainly API questions (what behaviour do we want? for which review is welcome). There are also parts that I am not planning to address in this PR (eg adding a scalar pd.NA) but rather in separate PRs.


The one failure in the base extension array tests, is for concatting mixed dtypes (result = pd.concat([df1, df2])):

(Pdb) df1
       A
0   True
1  False
2   True
(Pdb) df2
   A
0  1
1  2
2  3
(Pdb) result
   A
0  1
1  0
2  1
0  1
1  2
2  3
(Pdb) expected
       A
0   True
1  False
2   True
0      1
1      2
2      3

So it is expecting to return object dtype.
However, on master for numpy dtypes, we also coerce booleans to integer:

In [23]: df1 = pd.DataFrame({'A': [True, False]})  

In [24]: df2 = pd.DataFrame({'A': [1, 2]})  

In [25]: pd.concat([df1, df2])    
Out[25]: 
   A
0  1
1  0
0  1
1  2

What to follow here? (we should probably have some discussion about clearer rules (and document this) for which dtypes coerce with which others. Or does this exist somewhere?)

@jbrockmendel
Copy link
Member

not in the sense of "this is ready to be reviewed",

I'm confused by a potential double-negative. If this is not ready to be reviewed, then I suggest it be marked as WIP so that would-be reviewers focus elsewhere for the time being.

@jorisvandenbossche
Copy link
Member Author

Let me be clearer then: this is ready to be reviewed

@jorisvandenbossche
Copy link
Member Author

Are people OK with first reviewing / getting merged this version of a BooleanArray without using the new NA, and only in a follow-up adapt it to use NA ?
(I prefer that to be able to work on those things in parallel, many aspects of BooleanArray do not involve NA vs np.nan, and those an already be reviewed here).

@jreback
Copy link
Contributor

jreback commented Nov 13, 2019

yes let’s keep BooleanArray independent and merge first

pandas/core/arrays/boolean.py Outdated Show resolved Hide resolved
pandas/core/arrays/boolean.py Outdated Show resolved Hide resolved
else:
mask = mask_values
else:
mask = np.asarray(mask, dtype=bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm when is this case reached? I'm a little confused about the mask | mask_values below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, you can pass a mask manually to this function (and in that case it gets combined with the potential mask of the values). However, this functionality is actually not used right now by BooleanArray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests for this (as I think we want to expose this publicly at some point)

pandas/core/arrays/boolean.py Show resolved Hide resolved
pandas/core/arrays/boolean.py Outdated Show resolved Hide resolved
pandas/core/arrays/boolean.py Outdated Show resolved Hide resolved
@jorisvandenbossche
Copy link
Member Author

Can you check: does this handle indexing a BooleanArray with another BooleanArray well? Does it have to go through object dtype?

@TomAugspurger boolean indexing does not yet work. Well, it did work when BooleanArray coerced to a bool ndarray, and then it worked automatically (for the case that there are no NAs).
But now I changed back to always convert to object ndarray, we will need to handle this case specifically. Since that will probably involve adding a check and proper conversion to numpy array in several places in the indexing code, I was planning on doing that later / separate PR.

@TomAugspurger
Copy link
Contributor

Since that will probably involve adding a check and proper conversion to numpy array in several places in the indexing code, I was planning on doing that later / separate PR.

SGTM. Have we discussed the API on that anywhere? Do missing values propagate, or are they treated as False?

@jorisvandenbossche
Copy link
Member Author

Have we discussed the API on that anywhere? Do missing values propagate, or are they treated as False?

That's where I want feedback, see the discussion in #28778

# may need to fill infs
# and mask wraparound
if is_float_dtype(result):
mask |= (result == np.inf) | (result == -np.inf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is copied from IntegerArray, and should not be an inplace operation. It risks alterting self._mask inplace. xref #27829

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is copied from IntegerArray, and should not be an inplace operation. It risks alterting self._mask inplace. xref #27829

OK, I removed this here altogether, as I don't think we should fill any infs in the BooleanArray division (we should separately also look at this for IntegerArray to see what behaviour we want)

This means that for BooleanArray I follow numpy's behaviour in returning float with inf:

In [19]: True / np.array([True, False, True])
/home/joris/miniconda3/envs/dev/bin/ipython:1: RuntimeWarning: divide by zero encountered in true_divide
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[19]: array([ 1., inf,  1.])

@jorisvandenbossche
Copy link
Member Author

@jreback
Copy link
Contributor

jreback commented Nov 20, 2019

have u tested this is indexing? i assume that would be a follow up

@jorisvandenbossche
Copy link
Member Author

have u tested this is indexing? i assume that would be a follow up

Yep, see #29555 (comment)

@jorisvandenbossche
Copy link
Member Author

So to gather the follow-ups:

  • Use pd.NA instead of np.nan for missing value indicator + update the logical ops to use the pd.NA behaviour
  • Reduce duplication with IntegerArray (implement a BaseMaskedArray base class or something like that). This might depend on the decision on whether we already want to use pd.NA in IntegerArray as well or not.
  • Update the behaviour of any/all reductions with skipna=False (API: any/all in context of boolean dtype with missing values #29686)
  • Enable boolean indexing with BooleanArray (we can start with handling the case of no NAs; the behaviour for with NAs still need to be decided: DISCUSS: boolean dtype with missing value support #28778)

(will open an issue for this)

@jorisvandenbossche
Copy link
Member Author

Are you all fine with almost merging this? (or maybe a final round of review?)
Merging it would make it easier to start working on the follow-ups

@jreback
Copy link
Contributor

jreback commented Nov 21, 2019

let me look one more round

@jorisvandenbossche
Copy link
Member Author

Other question: what do we want pd.array([..]) to return by default (when not specifying a dtype)?

Currently, you get this behavour:

In [9]: pd.array([True, False]) 
Out[9]: 
<PandasArray>
[True, False]
Length: 2, dtype: bool

In [10]: pd.array([True, False, np.nan]) 
Out[10]: 
<PandasArray>
[1.0, 0.0, nan]
Length: 3, dtype: float64

In [12]: pd.array([True, False, pd.NA])
Out[12]: 
<PandasArray>
[True, False, NA]
Length: 3, dtype: object

Do we want to return BooleanArray also when there are no missing values (which currently returns boolean PandasArray)?
Eg for integer, we also do not return the IntegerArray (I think the "rule" now is to basically have the same default behaviour as pandas, so only infer the extensions dtypes that are used by default (datetimetz, period, interval, etc). So if we want to be consistent with that, we should not return BooleanArray.

When a pd.NA is present, that's of course something new, and we could in principle decide to already infer the new dtype in that case (although then the inference of the dtype depends on the presence of a missing value or not ..) Although I am not sure this is technically easy to do, as the infer_dtypes does not distinguish different types of NA/NaN at the moment (and this out of scope of this PR, could be a future enhancement).

@jreback
Copy link
Contributor

jreback commented Nov 21, 2019

I would be +1 on using all of the extension arrays (SA, IA, BA) in pa.array by default. I dont' see any downside here. I would create an issue to update infer_dtype to do this, shouldn't be very hard as that can now return IA (with an opt in flag).

@jorisvandenbossche
Copy link
Member Author

I think our reasoning before was to keep this consistent with the main constructors (eg in Series), so that we could in principle move towards Series(..) being Series(array(..)) at some point.

@TomAugspurger
Copy link
Contributor

Do we want to return BooleanArray also when there are no missing values (which currently returns boolean PandasArray)?

I don't think that we should have value-dependent behavior here.

On the broader issue, I'd be fine with pd.array(..., dtype=None) starting to infer these new EAs. Will make a new issue for that.

@jorisvandenbossche
Copy link
Member Author

@jreback ping

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. some questions, can be done on followup, mostly asking about test coverage. if you want to do a minor pass ok, or can merge.

doc/source/whatsnew/v1.0.0.rst Show resolved Hide resolved
return True


def coerce_to_array(values, mask=None, copy=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type these as much as possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure how to type those (I don't see anything in pandas._typing for "list like" ?)

pandas/core/arrays/boolean.py Show resolved Hide resolved
inferred_dtype = lib.infer_dtype(values_object, skipna=True)
integer_like = ("floating", "integer", "mixed-integer-float")
if inferred_dtype not in ("boolean", "empty") + integer_like:
raise TypeError("Need to pass bool-like values")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests hit here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests hit here?

Yes, there is a test that passes all kinds of non-boolean-like values

In general, I ran locally pytest with coverage, and there is 97% coverage for this file. The main non-covered things are some parts of the ufunc related code, and some length mismatch errors in the ops code.

pandas/core/arrays/boolean.py Show resolved Hide resolved
@classmethod
def _from_sequence(cls, scalars, dtype=None, copy=False):
if dtype:
assert dtype == "boolean"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have tests for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have tests for this?

Do we need that? This is an internal assertion, that should never be raised to the user but is here to help the developer (I don't think we should add tests for those asserts). So the "test" is that this actually never occurs in the tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I can also leave it out (the assert). For BooleanArray, the dtype is not very useful. I think this parameter is mainly used for cases where multiple dtypes are possible per array (eg int64, int32 etc for IntegerArray)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right i think it actually should always be None here, as this is internally passed. however a user might try to pass something that is not None (so this should maybe be a ValueError)

pandas/core/arrays/boolean.py Show resolved Hide resolved
pandas/core/arrays/boolean.py Show resolved Hide resolved
"__rmod__",
):
# combine keeps boolean type
expected = expected.astype("Int8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what numpy does, eg:

In [17]: np.array([True, False]) ** 2                                                                                                                                                                              
Out[17]: array([1, 0], dtype=int8)

So for those ops, I just followed numpy's behaviour with boolean arrays.



class TestComparisonOps(base.BaseComparisonOpsTests):
def check_opname(self, s, op_name, other, exc=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need these overrides here? (you are already doing it above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's to override that there is no exception being raised (will add a comment about that).

(above is for the arithmetic ones)

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

doc/source/whatsnew/v1.0.0.rst Show resolved Hide resolved
return True


def coerce_to_array(values, mask=None, copy=False):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure how to type those (I don't see anything in pandas._typing for "list like" ?)

pandas/core/arrays/boolean.py Show resolved Hide resolved
inferred_dtype = lib.infer_dtype(values_object, skipna=True)
integer_like = ("floating", "integer", "mixed-integer-float")
if inferred_dtype not in ("boolean", "empty") + integer_like:
raise TypeError("Need to pass bool-like values")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests hit here?

Yes, there is a test that passes all kinds of non-boolean-like values

In general, I ran locally pytest with coverage, and there is 97% coverage for this file. The main non-covered things are some parts of the ufunc related code, and some length mismatch errors in the ops code.

pandas/core/arrays/boolean.py Show resolved Hide resolved
raise TypeError(
"mask should be boolean numpy array. Use "
"the 'array' function instead"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only when you actually coerce, not here though

Not fully sure I understand. We don't do any coercing in this __init__ constructor, and a few lines above there are isinstance checks that the input can only be boolean ndarrays.

this is something we should check (here and in IntegerArray) as if you accidently pass a non ndim==1 then it would be an error (can be a followup)

As following your comment, I already added a ndim check; I now raise an error if ndim is not 1 (on the lines below)

@classmethod
def _from_sequence(cls, scalars, dtype=None, copy=False):
if dtype:
assert dtype == "boolean"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have tests for this?

Do we need that? This is an internal assertion, that should never be raised to the user but is here to help the developer (I don't think we should add tests for those asserts). So the "test" is that this actually never occurs in the tests.

@classmethod
def _from_sequence(cls, scalars, dtype=None, copy=False):
if dtype:
assert dtype == "boolean"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I can also leave it out (the assert). For BooleanArray, the dtype is not very useful. I think this parameter is mainly used for cases where multiple dtypes are possible per array (eg int64, int32 etc for IntegerArray)

"__rmod__",
):
# combine keeps boolean type
expected = expected.astype("Int8")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what numpy does, eg:

In [17]: np.array([True, False]) ** 2                                                                                                                                                                              
Out[17]: array([1, 0], dtype=int8)

So for those ops, I just followed numpy's behaviour with boolean arrays.



class TestComparisonOps(base.BaseComparisonOpsTests):
def check_opname(self, s, op_name, other, exc=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's to override that there is no exception being raised (will add a comment about that).

(above is for the arithmetic ones)

@jreback jreback merged commit 7d7f885 into pandas-dev:master Nov 25, 2019
@jreback
Copy link
Contributor

jreback commented Nov 25, 2019

thanks @jorisvandenbossche other things can be done as followups.

@jbrockmendel
Copy link
Member

@jorisvandenbossche for those of us who didn't follow the thread in real-time, can you summarize the design decisions that got made in the end? and any decisions that remain un-decided?

keechongtan added a commit to keechongtan/pandas that referenced this pull request Nov 25, 2019
…ndexing-1row-df

* upstream/master: (185 commits)
  ENH: add BooleanArray extension array (pandas-dev#29555)
  DOC: Add link to dev calendar and meeting notes (pandas-dev#29737)
  ENH: Add built-in function for Styler to format the text displayed for missing values (pandas-dev#29118)
  DEPR: remove statsmodels/seaborn compat shims (pandas-dev#29822)
  DEPR: remove Index.summary (pandas-dev#29807)
  DEPR: passing an int to read_excel use_cols (pandas-dev#29795)
  STY: fstrings in io.pytables (pandas-dev#29758)
  BUG: Fix melt with mixed int/str columns (pandas-dev#29792)
  TST: add test for ffill/bfill for non unique multilevel (pandas-dev#29763)
  Changed description of parse_dates in read_excel(). (pandas-dev#29796)
  BUG: pivot_table not returning correct type when margin=True and aggfunc='mean'  (pandas-dev#28248)
  REF: Create _lib/window directory (pandas-dev#29817)
  Fixed small mistake (pandas-dev#29815)
  minor cleanups (pandas-dev#29798)
  DEPR: enforce deprecations in core.internals (pandas-dev#29723)
  add test for unused level raises KeyError (pandas-dev#29760)
  Add documentation linking to sqlalchemy (pandas-dev#29373)
  io/parsers: ensure decimal is str on PythonParser (pandas-dev#29743)
  Reenabled no-unused-function (pandas-dev#29767)
  CLN:F-string in pandas/_libs/tslibs/*.pyx (pandas-dev#29775)
  ...

# Conflicts:
#	pandas/tests/frame/indexing/test_indexing.py
@TomAugspurger
Copy link
Contributor

@jbrockmendel started at #29556 (comment)

@jorisvandenbossche jorisvandenbossche deleted the boolean-EA branch November 25, 2019 17:15
@jorisvandenbossche
Copy link
Member Author

@jbrockmendel this PR has no NA-related design decisions (it's just adding a BooleanArray that has not yet any specific NA behaviour). All discussions are still in the respective issues (#28095, #28778) and in the open PR to add NA (#29597), and will come in a future PR to add NA to BooleanArray.

@jorisvandenbossche
Copy link
Member Author

@jreback opened two follow-up issues related to improving the conversion / astype (which currently goes through object dtype where it is not always needed): #29838 and #29839

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Nov 25, 2019
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Nov 25, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
@jorisvandenbossche jorisvandenbossche mentioned this pull request Dec 3, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: boolean Extension Array
4 participants