Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: ensure IntervalIndex.left/right are 64bit if numeric #50130

Merged
merged 2 commits into from
Dec 11, 2022

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Dec 8, 2022

Extracted part for #49560, slightly altered.

IntervalIndex depends on IntervalTree which only accepts 64-bit int/uint/floats. This ensures that numpy numeric arrays in IntervalTree will always be 64-bit.

A question is if we should have non-64bit IntervalTrees. I've considered that that is maybe a different question than #49650, and I could write an issue about it and someone could take it in a follow-up, possibly before 2.0?

This PR changes nothing functionality-wise, but is needed when Index can have non-64bit indexes after #49650.

Copy link
Member

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a use-case for non 64-bit arrays?

def _maybe_convert_numeric_to_64bit(self, idx: Index) -> Index:
# IntervalTree only supports 64 bit numpy array
dtype = idx.dtype
if np.issubclass_(idx.dtype.type, np.number):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use dtype.type here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes:)

@topper-123
Copy link
Contributor Author

Is there a use-case for non 64-bit arrays?

Not for me personally, it was more of a question I had when I wrote the code, and we now will support non-64-bit indexes. IMO this can come later, if someone sees a need for it.

@phofl
Copy link
Member

phofl commented Dec 11, 2022

Sounds good to me. What we have to keep in mind a bit: If we compile IntervalTree for all possible dtypes, this will further increase the size of pandas for end users. So as long as there is no use case, I would keep as is. But open to revisit if there is a situation where this would be helpful

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 11, 2022

Yeah, I agree.

I’ve updated the PR.

@phofl phofl added the Interval Interval data type label Dec 11, 2022
@phofl phofl added this to the 2.0 milestone Dec 11, 2022
@phofl phofl merged commit 68537e3 into pandas-dev:main Dec 11, 2022
@phofl
Copy link
Member

phofl commented Dec 11, 2022

Thx @topper-123

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Interval Interval data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants