Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in loc raised Error when non-integer slice was given for MultiIndex #37707

Merged
merged 31 commits into from
Nov 24, 2020
Merged
Show file tree
Hide file tree
Changes from 26 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
c8df1c8
Bug in loc raised Error when non integer interval was given for Multi…
phofl Nov 9, 2020
9e62c50
Adjust tests
phofl Nov 9, 2020
8377a79
Comment it back in
phofl Nov 9, 2020
e6acaac
Run black
phofl Nov 9, 2020
37fe677
Rename test
phofl Nov 9, 2020
550ac88
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 9, 2020
d38c9e7
Change new implementation
phofl Nov 9, 2020
72345a4
Do slicing outside
phofl Nov 9, 2020
04056ae
Add test
phofl Nov 9, 2020
41dbbcc
Change issue number
phofl Nov 9, 2020
07f13f3
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 9, 2020
adaa27b
Delete None
phofl Nov 9, 2020
061468c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 10, 2020
d59e7f7
Change whatsnew
phofl Nov 12, 2020
d9562c4
Add comment
phofl Nov 12, 2020
c382af0
Improve test
phofl Nov 12, 2020
694e469
Change test description
phofl Nov 12, 2020
d78d8ce
Change issue number in test
phofl Nov 13, 2020
d9a4054
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 13, 2020
c13da44
Fix comments
phofl Nov 14, 2020
0a6cab7
Add testcases
phofl Nov 14, 2020
4ffe46e
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 14, 2020
3e9baa6
Fix bugs
phofl Nov 14, 2020
c579c88
Adress review comments
phofl Nov 15, 2020
0be97cd
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 15, 2020
923610e
Fix whatsnew
phofl Nov 16, 2020
fdd170e
Fix blank line and if else
phofl Nov 17, 2020
3f3c133
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 17, 2020
4c17aeb
Parametrize test
phofl Nov 17, 2020
a766f98
Run black
phofl Nov 17, 2020
48424fb
Merge branch 'master' of https://github.com/pandas-dev/pandas into 25165
phofl Nov 24, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -584,6 +584,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`)
- Bug in :meth:`DataFrame.loc` raising ``TypeError`` when non-integer slice was given to select values from :class:`MultiIndex` (:issue:`25165`, :issue:`24263`)

Missing
^^^^^^^
Expand Down
31 changes: 23 additions & 8 deletions pandas/core/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -2676,11 +2676,20 @@ def _partial_tup_index(self, tup, side="left"):
return start + section.searchsorted(loc, side=side)

idx = self._get_loc_single_level_index(lev, lab)
if k < n - 1:
if isinstance(idx, slice) and k < n - 1:
jreback marked this conversation as resolved.
Show resolved Hide resolved
# Get start and end value from slice, necessary when a non-integer
# interval is given as input GH#37707
start = idx.start
end = idx.stop
elif k < n - 1:
end = start + section.searchsorted(idx, side="right")
start = start + section.searchsorted(idx, side="left")
else:
return start + section.searchsorted(idx, side=side)
if isinstance(idx, slice):
jreback marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this an if/else and then an else case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm sort of, L2688 should be an elif on L2687

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should be ok now

idx = idx.start
return start + section.searchsorted(idx, side=side)
else:
return start + section.searchsorted(idx, side=side)

def _get_loc_single_level_index(self, level_index: Index, key: Hashable) -> int:
"""
Expand Down Expand Up @@ -3014,6 +3023,8 @@ def convert_indexer(start, stop, step, indexer=indexer, codes=level_codes):
start = 0
if key.stop is not None:
stop = level_index.get_loc(key.stop)
elif isinstance(start, slice):
stop = len(level_index)
else:
stop = len(level_index) - 1
step = key.step
Expand Down Expand Up @@ -3048,22 +3059,26 @@ def convert_indexer(start, stop, step, indexer=indexer, codes=level_codes):

else:

code = self._get_loc_single_level_index(level_index, key)
idx = self._get_loc_single_level_index(level_index, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side-note: _get_loc_single_level_index looks sketchy. if level_index contains NA values, we should probably be doing get_loc and not returning -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into this as a follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have looked into this. Problem is, we are taking an index level from the MultiIndex, which does not contain the nans, so we will always get a KeyError, even if the original MultiIndex had nan in this level.


if level > 0 or self.lexsort_depth == 0:
# Desired level is not sorted
locs = np.array(level_codes == code, dtype=bool, copy=False)
locs = np.array(level_codes == idx, dtype=bool, copy=False)
if not locs.any():
# The label is present in self.levels[level] but unused:
raise KeyError(key)
return locs

i = level_codes.searchsorted(code, side="left")
j = level_codes.searchsorted(code, side="right")
if i == j:
if isinstance(idx, slice):
jreback marked this conversation as resolved.
Show resolved Hide resolved
start = idx.start
end = idx.stop
else:
start = level_codes.searchsorted(idx, side="left")
end = level_codes.searchsorted(idx, side="right")
if start == end:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# The label is present in self.levels[level] but unused:
raise KeyError(key)
return slice(i, j)
return slice(start, end)

def get_locs(self, seq):
"""
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexes/multi/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -768,7 +768,7 @@ def test_timestamp_multiindex_indexer():
[
pd.date_range(
start="2019-01-02T00:15:33",
end="2019-01-05T02:15:33",
end="2019-01-05T03:15:33",
jreback marked this conversation as resolved.
Show resolved Hide resolved
freq="H",
name="date",
),
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/indexing/multiindex/test_loc.py
Original file line number Diff line number Diff line change
Expand Up @@ -640,3 +640,12 @@ def test_getitem_non_found_tuple():
)
with pytest.raises(KeyError, match=r"\(2\.0, 2\.0, 3\.0\)"):
df.loc[(2.0, 2.0, 3.0)]


def test_get_loc_datetime_index():
# GH#24263
index = pd.date_range("2001-01-01", periods=100)
mi = MultiIndex.from_arrays([index])
# Check if get_loc matches for Index and MultiIndex
assert mi.get_loc("2001-01") == slice(0, 31, None)
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved
assert index.get_loc("2001-01") == slice(0, 31, None)
52 changes: 51 additions & 1 deletion pandas/tests/indexing/multiindex/test_partial.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,14 @@
import numpy as np
import pytest

from pandas import DataFrame, Float64Index, Int64Index, MultiIndex
from pandas import (
DataFrame,
Float64Index,
Int64Index,
MultiIndex,
date_range,
to_datetime,
)
import pandas._testing as tm


Expand Down Expand Up @@ -208,6 +215,49 @@ def test_setitem_multiple_partial(self, multiindex_dataframe_random_data):
expected.loc["bar"] = 0
tm.assert_series_equal(result, expected)

def test_partial_getitem_loc_datetime(self):
# GH: 25165
date_idx = date_range("2019", periods=2, freq="MS")
df = DataFrame(
list(range(4)),
index=MultiIndex.from_product([date_idx, [0, 1]], names=["x", "y"]),
)
expected = DataFrame(
[2, 3],
index=MultiIndex.from_product(
[[to_datetime("2019-02-01")], [0, 1]], names=["x", "y"]
),
)
result = df["2019-2":]
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved
tm.assert_frame_equal(result, expected)
result = df.loc["2019-2":]
tm.assert_frame_equal(result, expected)

result = df.loc(axis=0)["2019-2":]
tm.assert_frame_equal(result, expected)

result = df.loc["2019-2":, :]
tm.assert_frame_equal(result, expected)

df2 = df.swaplevel(0, 1).sort_index()
expected = expected.swaplevel(0, 1)

result = df2.loc[:, "2019-02":, :]
tm.assert_frame_equal(result, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if any way to parameterize this would be great

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eval is the only option I can think of, but would want to avoid this. Don't know how to parametrize the changing position of : can be done otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changing position of :

do you mean "2019-02": vs :"2019-02"? Those are slice("2019-02", None) and slice(None, "2019-02"), respectively.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this take the same code path? Then this would work for sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not perfect, but way better. Thanks very much


expected = df.copy()
result = df[:"2019-2"]
tm.assert_frame_equal(result, expected)

result = df.loc[:"2019-2"]
tm.assert_frame_equal(result, expected)

result = df.loc(axis=0)[:"2019-2"]
tm.assert_frame_equal(result, expected)

result = df.loc[:"2019-2", :]
tm.assert_frame_equal(result, expected)


def test_loc_getitem_partial_both_axis():
# gh-12660
Expand Down