BUG: or, and and not operators not correctly implemented for pd.NA #49828

Open
3 tasks done
vsbits opened this issue Nov 22, 2022 · 4 comments
Labels: Bug; NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@vsbits

vsbits commented Nov 22, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

edit: formatting problem

Reproducible Example

import pandas as pd

# Each statement below raises on its own:
# TypeError: boolean value of NA is ambiguous
not pd.NA
pd.NA and False
pd.NA or True

Issue Description

The or, and, and not operators are not correctly implemented for pd.NA. The result even depends on operand order: with pd.NA on the left, an error is raised:

>>> True and pd.NA
<NA>
>>> False and pd.NA
False
>>> pd.NA and pd.NA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
>>> pd.NA and True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
>>> pd.NA and False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous
>>> True or pd.NA
True
>>> not pd.NA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/missing.pyx", line 413, in pandas._libs.missing.NAType.__bool__
    raise TypeError("boolean value of NA is ambiguous")
TypeError: boolean value of NA is ambiguous

Expected Behavior

The same results as in the R language:

> !NA
[1] NA
> NA | TRUE
[1] TRUE
> NA | FALSE
[1] NA
> NA & TRUE
[1] NA
> NA & FALSE
[1] FALSE

Installed Versions

INSTALLED VERSIONS

commit : 6f90ac3
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0.dev0+719.g6f90ac3b2a.dirty
numpy : 1.23.4
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.57.1
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.6.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.11.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.3
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.9
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.44
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.11.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None

vsbits added the Bug and Needs Triage labels Nov 22, 2022
vsbits changed the title BUG: BUG: or, and and not operators not correctly implemented for pd.NA Nov 22, 2022
lithomas1 added the Numeric Operations and NA - MaskedArrays labels and removed the Needs Triage label Nov 23, 2022
@kostyafarber
Contributor

Might take a look at this one

@kostyafarber
Contributor

kostyafarber commented Dec 6, 2022

Isn't this the expected behaviour of pd.NA, since it is not meant to be used in boolean contexts?

https://pandas.pydata.org/pandas-docs/version/1.0.0/user_guide/missing_data.html#logical-operations:~:text=This%20also%20means,missing%20values%20beforehand.

@vsbits
Author

vsbits commented Dec 6, 2022

Not sure, but I don't think so. Logical operations should follow the rules of three-valued logic.
An error is raised only if pd.NA is the left operand, and using | instead of or works. That seems inconsistent.

>>> True or pd.NA
True
>>> True | pd.NA
True
>>> pd.NA | True
True
>>> pd.NA or True
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas\_libs\missing.pyx", line 382, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

Along the same lines, I think not pd.NA should return pd.NA. I know R works that way, but R has a logical class (TRUE/FALSE/NA), which is different from Python's bool (True/False).
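
For reference, here is a minimal sketch of the three-valued results that are already reachable through the element-wise operators |, & and ~, which match the R table from the original report. It assumes a pandas version that ships pd.NA (1.0 or later); the results dict and its keys are illustrative names only.

```python
import pandas as pd

# |, & and ~ dispatch to methods defined on pd.NA itself, so no bool()
# call is made and the operand order does not matter.
results = {
    "~NA": ~pd.NA,
    "NA | True": pd.NA | True,
    "NA | False": pd.NA | False,
    "NA & True": pd.NA & True,
    "NA & False": pd.NA & False,
    "True | NA": True | pd.NA,
    "False & NA": False & pd.NA,
}

for expr, value in results.items():
    print(f"{expr:>10} -> {value!r}")
```

So ~pd.NA can be used where R would use !NA, at least for scalars. Whether not pd.NA should behave the same way is the open question here.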

@jorisvandenbossche
Member

The core of the issue here is related to multiple things:

  1. We decided to raise an error when pd.NA is evaluated in a boolean context (bool(pd.NA)); see "API: bool(pd.NA)" #38224.
  2. The and and or keywords call bool(..) on the left operand, which can cause this error to bubble up, and there is no way to override this behaviour. So you get inconsistent behaviour depending on whether pd.NA is the left or the right operand.
  3. On the other hand, the | and & operators can be overridden to give custom behaviour on your object (using __or__ and __and__, which is what we do for pd.NA), so we can provide consistent behaviour for those operations.
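
The three points above can be illustrated without pandas. The sketch below uses a hypothetical toy class, Unknown, which is not the pandas implementation and only mimics the behaviour described: __bool__ raises, while __or__, __and__ and __invert__ implement three-valued logic. The helper raises_type_error is also just an illustrative name.

```python
class Unknown:
    """Toy stand-in for a missing value (hypothetical, not pandas code)."""

    def __bool__(self):
        # Called by `and`, `or`, `not` and `if`; cannot return a third value.
        raise TypeError("boolean value of Unknown is ambiguous")

    def __or__(self, other):
        if other is True:
            return True
        if other is False or isinstance(other, Unknown):
            return self
        return NotImplemented

    __ror__ = __or__  # used when Unknown is the right operand of |

    def __and__(self, other):
        if other is False:
            return False
        if other is True or isinstance(other, Unknown):
            return self
        return NotImplemented

    __rand__ = __and__  # used when Unknown is the right operand of &

    def __invert__(self):
        return self

    def __repr__(self):
        return "<Unknown>"


UNKNOWN = Unknown()


def raises_type_error(thunk):
    """Return True if calling thunk() raises TypeError."""
    try:
        thunk()
    except TypeError:
        return True
    return False


# Overridable operators: consistent regardless of operand order.
print(UNKNOWN | True, True | UNKNOWN)
print(UNKNOWN & False, False & UNKNOWN)
print(~UNKNOWN)

# Keywords: only the left operand is passed to bool(), hence the asymmetry.
print(False or UNKNOWN)  # right operand is returned untouched
print(raises_type_error(lambda: UNKNOWN or True))
print(raises_type_error(lambda: not UNKNOWN))
```

Since and, or and not have no corresponding special methods other than __bool__, which must return True or False, the only choices for a left-hand missing value are to pick a truth value or to raise.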
