
BUG: Series.gt (and other comparison methods) can fail with dtype=object #59418

Open
2 of 3 tasks
warwickmm opened this issue Aug 5, 2024 · 12 comments
Labels
Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@warwickmm

warwickmm commented Aug 5, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> 
>>> x = pd.Series([None], dtype=object)
>>> y = pd.Series([0])

# This raises TypeError: '>' not supported between instances of 'NoneType' and 'int'
>>> x.gt(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6300, in gt
    return self._flex_method(
           ^^^^^^^^^^^^^^^^^^
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6246, in _flex_method
    return self._binop(other, op, level=level, fill_value=fill_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6195, in _binop
    result = func(this_vals, other_vals)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'


# This runs without error.
>>> x > y
0    False
dtype: bool


# When converted to DataFrames (the frame built from x keeps its object dtype), DataFrame.gt runs without error:
>>> x.to_frame().gt(y.to_frame())
       0
0  False


# If the Series is cast to dtype=float (None becomes NaN), the comparison runs without error.
>>> x.astype(float).gt(y)
0    False
dtype: bool

Issue Description

When a Series has dtype=object and contains None, comparison methods such as .gt can raise TypeError: '>' not supported between instances of 'NoneType' and 'int' when the other operand is a Series of integers. No error is raised by the > operator, by DataFrame.gt, or when the Series has dtype=float.

Expected Behavior

When the Series has dtype=object, the behavior of Series.gt should be consistent with the > operator and with the DataFrame.gt method.
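To check whether a given install still shows the inconsistency, the four variants from the example can be run side by side, recording which ones raise. This is a small diagnostic sketch; the helper name `try_compare` is hypothetical, and the outcomes for the object-dtype cases depend on the installed pandas version (only the float case is expected to be stable).

```python
import pandas as pd


def try_compare(fn):
    """Hypothetical helper: run a comparison and return the result as a list,
    or the TypeError message if the comparison raises."""
    try:
        return fn().tolist()
    except TypeError as exc:
        return f"TypeError: {exc}"


x = pd.Series([None], dtype=object)
y = pd.Series([0])

results = {
    "x > y": try_compare(lambda: x > y),
    "x.gt(y)": try_compare(lambda: x.gt(y)),
    # Select the single (unnamed, hence 0) column of the resulting DataFrame.
    "DataFrame.gt": try_compare(lambda: x.to_frame().gt(y.to_frame())[0]),
    # After casting to float, None becomes NaN, and NaN > 0 is False.
    "astype(float).gt": try_compare(lambda: x.astype(float).gt(y)),
}

for name, outcome in results.items():
    print(f"{name:>18}: {outcome}")
```

On pandas 2.2.2, per the report above, only the `x.gt(y)` entry would show a TypeError.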

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.12.4.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.10.2-arch1-1
Version               : #1 SMP PREEMPT_DYNAMIC Sat, 27 Jul 2024 16:49:55 +0000
machine               : x86_64
processor             : 
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.2
numpy                 : 2.0.1
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 71.1.0
pip                   : 23.2.1
Cython                : None
pytest                : 8.3.1
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : None
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : 3.9.1
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.14.0
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None
@warwickmm warwickmm added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Aug 5, 2024
@Patsnoop

Patsnoop commented Aug 5, 2024

I would like to work on this

@rhshadrach
Member

Thanks for the report - it seems to me comparing None to e.g. integers should raise. My guess is that x > y succeeding is a result of assuming None is an NA value and hence behaves like np.nan (always false for comparisons). Further investigations are welcome!
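The traceback in the report ends in `result = func(this_vals, other_vals)`, which suggests that `Series.gt` with a Series argument applies the comparison directly to the underlying arrays. The sketch below uses plain NumPy to illustrate the two behaviours being contrasted: a raw elementwise `>` on an object array calls Python's own `>` for each element, so `None > 0` raises, whereas masking missing values first and filling them with `False` reproduces the NaN-like result seen with `x > y`. The helper `na_masked_gt` is hypothetical and is only an illustration of the guess above, not pandas' internal implementation.

```python
import operator

import numpy as np

left = np.array([None, 5], dtype=object)
right = np.array([0, 3])

# Raw elementwise comparison: NumPy's object loop calls Python's `>` per element,
# so the None element raises TypeError just as `None > 0` does.
try:
    operator.gt(left, right)
    raw_outcome = "no error"
except TypeError as exc:
    raw_outcome = f"TypeError: {exc}"


def na_masked_gt(a, b):
    """Hypothetical NA-aware `>`: missing values (None or NaN) compare as False."""
    # `v != v` is True only for NaN; None is checked explicitly.
    missing = np.array([v is None or v != v for v in a], dtype=bool)
    result = np.zeros(len(a), dtype=bool)
    # Compare only the positions that are not missing.
    result[~missing] = a[~missing] > b[~missing]
    return result


print(raw_outcome)
print(na_masked_gt(left, right))  # [False  True]
```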

@rhshadrach rhshadrach added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Aug 5, 2024
@KevsterAmp
Contributor

take

@KevsterAmp KevsterAmp removed their assignment Aug 7, 2024
@KevsterAmp
Contributor

@rhshadrach - Any ideas for a fix? Do we raise an error when "<" is used between Series that contain None?

@rhshadrach
Member

That seems like the correct behavior to me - yes.

@warwickmm
Author

Should DataFrame.gt raise an error as well?

@warwickmm
Author

Also, should one expect the behavior to be consistent across all values for which pd.isna returns True (e.g., None, np.nan, pd.NA, etc.)? Or does one need to be cognizant of how missing values are represented in each instance?
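At the scalar level, the values for which `pd.isna` returns `True` already behave differently under `>`, which is part of why the question arises. The sketch below only records plain scalar comparisons (no Series involved) and should not be read as a statement of how pandas intends object-dtype Series to treat each value.

```python
import numpy as np
import pandas as pd

candidates = {"None": None, "np.nan": np.nan, "pd.NA": pd.NA}

outcomes = {}
for label, value in candidates.items():
    try:
        # repr() makes pd.NA show up as "<NA>" rather than being coerced.
        outcomes[label] = repr(value > 0)
    except TypeError as exc:
        outcomes[label] = f"TypeError: {exc}"
    print(f"{label:>6}  isna={pd.isna(value)}  value > 0 -> {outcomes[label]}")
```

All three are treated as missing by `pd.isna`, yet `None > 0` raises, `np.nan > 0` is `False`, and `pd.NA > 0` propagates as `<NA>`.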

@rhshadrach
Member

My above comments are only regarding Python's None when stored in an object-dtype column or Series.

@warwickmm
Author

Thanks. I'll just note that the following, which compares against a scalar rather than a Series, also currently runs without error. I'm not sure whether that situation needs to be considered as well.

>>> x = pd.Series([None], dtype=object)
>>> x.gt(0)
0    False
dtype: bool

@maushumee
Contributor

Hi @warwickmm! Are you working on this? If not, I would like to take this up.

@warwickmm
Author

I am not.

@maushumee
Contributor

take

Projects
None yet
Development

No branches or pull requests

5 participants