Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in loc raised Error when non-integer slice was given for MultiIndex #37707

Merged
merged 31 commits into from
Nov 24, 2020

Conversation

phofl
Copy link
Member

@phofl phofl commented Nov 9, 2020

A slice return caused issues everywhere, when this function was used. So I unpacked it right in there.

@phofl phofl added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Nov 9, 2020
@@ -2702,7 +2702,10 @@ def _get_loc_single_level_index(self, level_index: Index, key: Hashable) -> int:
if is_scalar(key) and isna(key):
return -1
else:
return level_index.get_loc(key)
loc = level_index.get_loc(key)
# if isinstance(loc, slice):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is commented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shit, no I tried something out. This has to get back in. Will add it back momentarily

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented it back in. At least my test failed previously :)

@@ -598,3 +598,28 @@ def test_getitem_loc_commutability(multiindex_year_month_day_dataframe_random_da
result = ser[2000, 5]
expected = df.loc[2000, 5]["A"]
tm.assert_series_equal(result, expected)


def test_getitem_loc_datetime():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put partial in the name of the test, move to pandas/tests/indexing/test_partial.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@pep8speaks
Copy link

pep8speaks commented Nov 9, 2020

Hello @phofl! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-24 18:33:22 UTC

@phofl phofl changed the title Bug in loc raised Error when non integer interval was given for MulIndex Bug in loc raised Error when non integer interval was given for MultiIndex Nov 9, 2020
@phofl
Copy link
Member Author

phofl commented Nov 9, 2020

fixing #24263 was quite tricky. Have to return the slice here, because the codes do not equal our start value. I thinkt there is no other way as checking for slice outside of the function

@@ -469,6 +469,7 @@ Indexing
- Bug in :meth:`Index.where` incorrectly casting numeric values to strings (:issue:`37591`)
- Bug in :meth:`Series.loc` and :meth:`DataFrame.loc` raises when numeric label was given for object :class:`Index` although label was in :class:`Index` (:issue:`26491`)
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in :meth:`DataFrame.loc` raised ``TypeError`` when non integer interval was given to select values from :class:`MultiIndex` (:issue:`25165`, :issue:`24263`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"non-integer :class:`Interval`"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, changed

@phofl phofl changed the title Bug in loc raised Error when non integer interval was given for MultiIndex Bug in loc raised Error when non-integer Interval was given for MultiIndex Nov 12, 2020
@jreback jreback added this to the 1.2 milestone Nov 13, 2020
pandas/core/indexes/multi.py Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member

@jreback im about to turn in, pls ping me on these indexing PRs to review in the AM

� Conflicts:
�	doc/source/whatsnew/v1.2.0.rst
@@ -475,6 +475,7 @@ Indexing
- Bug in :meth:`Series.loc` and :meth:`DataFrame.loc` raises when numeric label was given for object :class:`Index` although label was in :class:`Index` (:issue:`26491`)
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
- Bug in :meth:`DataFrame.loc` raised ``TypeError`` when non-integer :class:`Interval` was given to select values from :class:`MultiIndex` (:issue:`25165`, :issue:`24263`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tests dont seem to have anything to do with Intervals. Am i missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you are right. Poor wording on my side. I meant a slice.

@@ -3048,22 +3055,25 @@ def convert_indexer(start, stop, step, indexer=indexer, codes=level_codes):

else:

code = self._get_loc_single_level_index(level_index, key)
idx = self._get_loc_single_level_index(level_index, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side-note: _get_loc_single_level_index looks sketchy. if level_index contains NA values, we should probably be doing get_loc and not returning -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into this as a follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have looked into this. Problem is, we are taking an index level from the MultiIndex, which does not contain the nans, so we will always get a KeyError, even if the original MultiIndex had nan in this level.

@phofl phofl changed the title Bug in loc raised Error when non-integer Interval was given for MultiIndex Bug in loc raised Error when non-integer slice was given for MultiIndex Nov 14, 2020
@phofl
Copy link
Member Author

phofl commented Nov 14, 2020

Hm the test I fixed was wrong previously I think

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments, pls merge master

end = start + section.searchsorted(idx, side="right")
start = start + section.searchsorted(idx, side="left")
else:
if isinstance(idx, slice):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this an if/else and then an else case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm sort of, L2688 should be an elif on L2687

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should be ok now

pandas/core/indexes/multi.py Show resolved Hide resolved
@@ -584,6 +584,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`)
- Bug in :meth:`DataFrame.loc` raised ``TypeError`` when non-integer slice was given to select values from :class:`MultiIndex` (:issue:`25165`, :issue:`24263`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i incorrectly said in another PR to use a participle when I meant a gerund: "raised" -> "raising"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, changed

end = start + section.searchsorted(idx, side="right")
start = start + section.searchsorted(idx, side="left")
else:
if isinstance(idx, slice):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm sort of, L2688 should be an elif on L2687

else:
start = level_codes.searchsorted(idx, side="left")
end = level_codes.searchsorted(idx, side="right")
if start == end:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

expected = expected.swaplevel(0, 1)

result = df2.loc[:, "2019-02":, :]
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if any way to parameterize this would be great

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eval is the only option I can think of, but would want to avoid this. Don't know how to parametrize the changing position of : can be done otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changing position of :

do you mean "2019-02": vs :"2019-02"? Those are slice("2019-02", None) and slice(None, "2019-02"), respectively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this take the same code path? Then this would work for sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not perfect, but way better. Thanks very much

@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

this looks fine, can you rebase and ping on green

� Conflicts:
�	doc/source/whatsnew/v1.2.0.rst
@phofl
Copy link
Member Author

phofl commented Nov 24, 2020

@jreback green

@jreback jreback merged commit 1cc030e into pandas-dev:master Nov 24, 2020
@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

thanks @phofl very nice

@phofl phofl deleted the 25165 branch November 24, 2020 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
4 participants