Efficiency of SparseArray.__getitem__(SparseArray[bool]) #23122

Closed
TomAugspurger opened this issue Oct 13, 2018 · 1 comment · Fixed by #44955
Labels: Performance (Memory or execution speed performance), Sparse (Sparse Data Type)
Comments

@TomAugspurger · Contributor

`SparseArray.__getitem__` currently densifies a boolean `SparseArray` key:

```python
# TODO: I think we can avoid densifying when masking a
# boolean SparseArray with another. Need to look at the
# key's fill_value for True / False, and then do an intersection
# on the indices of the sp_values.
if isinstance(key, SparseArray):
    if is_bool_dtype(key):
        key = key.to_dense()
    else:
        key = np.asarray(key)
```
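The TODO hinges on the key's fill_value. As a minimal sketch (the helper name `true_positions` is hypothetical, not pandas API), the selected positions can be read straight from a boolean key's `sp_index` and `sp_values` for either fill_value. Only the `fill_value=True` case has to enumerate every position, because then nearly everything is selected anyway.

```python
import numpy as np
import pandas as pd


def true_positions(key: pd.arrays.SparseArray) -> np.ndarray:
    """Hypothetical helper: sorted positions where a boolean SparseArray is True.

    key.to_dense() is never called; only key.sp_index and key.sp_values are read.
    """
    # Positions that are explicitly stored (a BlockIndex is converted to an IntIndex).
    stored = key.sp_index.to_int_index().indices.astype(np.intp)
    values = np.asarray(key.sp_values, dtype=bool)
    if not key.fill_value:
        # fill_value=False: the True entries are exactly the stored True values.
        return stored[values]
    # fill_value=True: every position is True except the stored False values,
    # so the result is inherently close to dense.
    return np.setdiff1d(np.arange(len(key)), stored[~values], assume_unique=True)
```

With a `False` fill_value the cost scales with the number of stored points, which is the case the issue expects to be much faster.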

I haven't investigated it, but we should be able to apply a boolean mask by
intersecting the indices of the stored values (`sp_values`) of self and key. If key is
SparseDtype[bool, False] (i.e. False is the fill_value), this should be a lot faster.
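For the `SparseDtype[bool, False]` case, a minimal sketch of that intersection could look like the following. `mask_with_sparse_key` is a hypothetical helper, and `IntIndex` is imported from the private `pandas._libs.sparse` module; that import path is an assumption that holds for recent pandas versions but is not public API.

```python
import numpy as np
import pandas as pd
from pandas._libs.sparse import IntIndex  # private module; path assumed from recent pandas


def mask_with_sparse_key(
    arr: pd.arrays.SparseArray, key: pd.arrays.SparseArray
) -> pd.arrays.SparseArray:
    """Hypothetical helper: arr[key] for a Sparse[bool, False] key, without densifying."""
    if len(arr) != len(key):
        raise IndexError("boolean key must have the same length as the array")
    if key.fill_value:
        raise NotImplementedError("this sketch only handles keys whose fill_value is False")

    # Selected positions: the stored entries of key that are True (sorted, unique).
    key_idx = key.sp_index.to_int_index().indices
    selected = key_idx[np.asarray(key.sp_values, dtype=bool)]

    # Positions where arr stores an explicit (non-fill) value.
    arr_idx = arr.sp_index.to_int_index().indices

    # Intersect the two index sets. arr_loc indexes arr.sp_values; out_loc is the
    # position in the output, because output element j comes from selected[j].
    _, arr_loc, out_loc = np.intersect1d(
        arr_idx, selected, assume_unique=True, return_indices=True
    )

    # Every other selected position holds arr's fill_value, so it is not stored.
    return pd.arrays.SparseArray(
        arr.sp_values[arr_loc],
        sparse_index=IntIndex(len(selected), out_loc),
        fill_value=arr.fill_value,
        dtype=arr.dtype.subtype,
    )
```

The work here is driven by the number of stored points in `arr` and `key` (plus a sort inside `np.intersect1d`) rather than by `len(arr)`, which is where the expected speed-up would come from.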

TomAugspurger added the Performance, Sparse, and Difficulty Intermediate labels on Oct 13, 2018
TomAugspurger added this to the Contributions Welcome milestone on Oct 13, 2018
@bdrum · Contributor

bdrum commented Dec 8, 2021

take

bdrum added a commit to bdrum/pandas that referenced this issue Dec 21, 2021
BUG: unary operators for SparseArray doesn't recalc indexes(pandas-dev#44956)
bdrum added a commit to bdrum/pandas that referenced this issue Dec 24, 2021
jreback modified the milestones: Contributions Welcome, 1.4 on Dec 27, 2021