different behaviour of df.isin() in 1.0.5/1.1.0, when df contains None #35565

Open
johny-b opened this issue Aug 5, 2020 · 9 comments
Labels
Bug isin isin method Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Regression Functionality that used to work in a prior pandas version

Comments

johny-b commented Aug 5, 2020

>>> import pandas as pd
>>> x = pd.DataFrame([['foo', 'bar'], [1, None]])
>>> y = x[1].copy()

# PANDAS 1.0.5
>>> x.isin(y)
       0     1
0  False  True
1  False  True

# PANDAS 1.1.0
>>> x.isin(y)
       0      1
0  False   True
1  False  False

Is this change intended?

@simonjayhawkins
Member

Thanks @johny-b for the report.

Is this change intended?

I doubt it. Running git bisect now to investigate.

@simonjayhawkins simonjayhawkins added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Regression Functionality that used to work in a prior pandas version labels Aug 5, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1.1 milestone Aug 5, 2020
@simonjayhawkins
Member

#31296 cc @jbrockmendel

19ae087 is the first bad commit
commit 19ae087
Author: jbrockmendel <jbrockmendel@gmail.com>
Date: Sun Mar 8 09:07:03 2020 -0700

PERF: do DataFrame.op(series, axis=0) blockwise (#31296)

@johny-b
Author

johny-b commented Aug 5, 2020

Also take a look at this, on 1.0.5:

>>> x = pd.DataFrame([['bar', None]])
>>> x.isin(x[1])
       0     1
0  False  True
>>> x = pd.DataFrame([[1, None]])
>>> x.isin(x[1])
       0      1
0  False  False

The 1.0.5 behaviour looks like a bug to me, and the change in 1.1.0 fixes it, or at least improves consistency: on 1.1.0 the result is always False.
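To compare the two cases above on whichever pandas version is installed, a small script along these lines may help. It is only a sketch: the helper name `isin_own_column` is made up for illustration, and no expected output is shown because the result depends on the pandas version.

```python
import pandas as pd


def isin_own_column(frame, column=1):
    """Run DataFrame.isin against one of the frame's own columns.

    Hypothetical helper, not part of pandas; it just wraps the call
    used in the examples above.
    """
    return frame.isin(frame[column])


print("pandas", pd.__version__)

# Same two frames as in the comment above: the only difference is
# whether column 0 holds a string or an integer.
for first_value in ("bar", 1):
    frame = pd.DataFrame([[first_value, None]])
    print(isin_own_column(frame))
```

If the two printed frames differ in column 1, the installed version still has the dtype-dependent behaviour described above.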

@simonjayhawkins
Member

Possibly related to #19356 and #34125.

Tbh, I find the usage with Series and DataFrame non-intuitive anyway, as they are not treated as unordered collections. And if the Series is aligned on labels, I don't understand why isin doesn't have an axis keyword for broadcasting.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html
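The point about label alignment can be seen in a small example. This is a sketch with made-up data (the frame `df` and the Series `values` are not from the report): passing a Series compares each row against the Series entry with the same index label, whereas passing a plain list is a membership test against an unordered collection.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2, 1]})
values = pd.Series([2, 1])  # index labels 0 and 1, same as df's index

# Series argument: row 0 is compared with 2, row 1 with 1.
as_series = df.isin(values)
print(as_series)
#        a     b
# 0  False  True
# 1  False  True

# List argument: every cell is either 1 or 2, so everything matches.
as_list = df.isin(values.tolist())
print(as_list)
#       a     b
# 0  True  True
# 1  True  True

# For this data the Series case gives the same result as an
# element-wise comparison broadcast along the index.
as_eq = df.eq(values, axis=0)
print(as_series.equals(as_eq))  # True
```

So with a Series argument, `isin` behaves more like a row-aligned `eq` than like a set-membership check, which is the non-intuitive part mentioned above.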

@rhshadrach
Member

rhshadrach commented Sep 5, 2020

The result is determined here:

pandas/pandas/_libs/ops.pyx

Lines 163 to 169 in 70c056b

x = left[i]
y = right[i]
if checknull(x) or checknull(y):
    result[i] = False
else:
    result[i] = PyObject_RichCompareBool(x, y, flag)

where checknull is True for both x and y when both are None, so the result is False. This is consistent with e.g. np.nan != np.nan but inconsistent with None == None.
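A pure-Python sketch of the logic quoted above may make the inconsistency easier to see. The names `is_missing` and `compare_elementwise` are hypothetical stand-ins, not pandas functions: `is_missing` only covers None and float NaN (the real checknull recognises more missing values), and only the equality comparison is modelled.

```python
import math
import operator


def is_missing(value):
    """Hypothetical stand-in for checknull: None or float NaN only."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def compare_elementwise(left, right):
    """Mimic the quoted loop for an equality comparison."""
    result = []
    for x, y in zip(left, right):
        if is_missing(x) or is_missing(y):
            # Any missing operand short-circuits to False.
            result.append(False)
        else:
            result.append(bool(operator.eq(x, y)))
    return result


nan = float("nan")
print(compare_elementwise(["bar", None, nan, 1], ["bar", None, nan, 1]))
# [True, False, False, True]

# Plain Python disagrees for None but agrees for NaN.
print(operator.eq(None, None), operator.eq(nan, nan))
# True False
```

The None-versus-None position is False in the sketch but True in plain Python, which is the mismatch described above.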

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020
@simonjayhawkins
Member

Moved off the 1.1.2 milestone (scheduled for this week), as no PRs to fix this are in the pipeline.

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.3, 1.1.4 Oct 5, 2020
@simonjayhawkins
Member

Moved off the 1.1.3 milestone (overdue), as no PRs to fix this are in the pipeline.

@simonjayhawkins simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020
@simonjayhawkins
Member

Moved off the 1.1.4 milestone (scheduled for release tomorrow), as no PRs to fix this are in the pipeline.

@jreback jreback modified the milestones: 1.1.5, Contributions Welcome Nov 25, 2020
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 20, 2020
@simonjayhawkins simonjayhawkins added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Dec 20, 2020
@DriesSchaumont
Member

take

@mroeschke mroeschke added isin isin method and removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Aug 8, 2021
@DriesSchaumont DriesSchaumont removed their assignment Nov 13, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
6 participants