Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Let MultiIndex.set_levels accept any iterable (#23273) #23291

Merged
merged 1 commit into from
Oct 25, 2018
Merged

BUG: Let MultiIndex.set_levels accept any iterable (#23273) #23291

merged 1 commit into from
Oct 25, 2018

Conversation

imankulov
Copy link
Contributor

@imankulov imankulov commented Oct 23, 2018

Quite surprisingly, is_list_like accepts a wide range of iterables including non-subscriptable objects. Later in the code set_levels implicitly supposes that passed instance can return its 0'th element by index, which is not always the case, as provided in the example of the #23273.

To address the issue, a new helper function is_subscriptable is added, and if passed levels look like a list, but are not subscriptable, they are explicltly converted to a list with list(levels)

@pep8speaks
Copy link

Hello @imankulov! Thanks for submitting the PR.

@codecov
Copy link

codecov bot commented Oct 23, 2018

Codecov Report

Merging #23291 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23291      +/-   ##
==========================================
+ Coverage   92.22%   92.22%   +<.01%     
==========================================
  Files         169      169              
  Lines       50911    50913       +2     
==========================================
+ Hits        46954    46956       +2     
  Misses       3957     3957
Flag Coverage Δ
#multiple 90.65% <100%> (ø) ⬆️
#single 42.29% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 95.46% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1c26375...15f3505. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note in bug fixes / indexing

@@ -389,6 +390,9 @@ def set_levels(self, levels, level=None, inplace=False,
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
"""
if is_list_like(levels) and not is_subscriptable(levels):
levels = list(levels)

if level is not None and not is_list_like(level):
if not is_list_like(levels):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just add

elif is_list_like(levels):
   levels = list(levels)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this approach, but unfortunately, it breaks functionality where one passes to set_levels CategoricalIndex (or in general, something else which is more complex than a list).

Specifically, this turns

index.set_levels(CategoricalIndex(list("bac")), 0)

into

index.set_levels(list("bac"), 0)

and breaks test_set_levels_categorical

I can add change this with

if is_list_like(levels) and not isinstance(levels, Index):

but I'm not sure what's better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 2nd would be fine, we don't need/want any more api checkers)

new_sizes = map(int, ['3', '2', '1'])
index = pd.MultiIndex.from_arrays([sizes, colors], names=['size', 'color'])
index2 = index.set_levels(new_sizes, level='size')
assert_matching(index2.get_level_values('size'), [3, 2, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct the appropriate MultiIndex and use tm.assert_index_equal

Check if the object is subscriptable.
String types are not included as sequences here.

Parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not needed

@imankulov
Copy link
Contributor Author

Thanks for the review @jreback. Addressed all comments. Need advice on how to deal better with set_index(CategoricalIndex)

@@ -389,6 +390,9 @@ def set_levels(self, levels, level=None, inplace=False,
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'foo', u'bar'])
"""
if is_list_like(levels) and not is_subscriptable(levels):
levels = list(levels)

if level is not None and not is_list_like(level):
if not is_list_like(levels):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 2nd would be fine, we don't need/want any more api checkers)

index = pd.MultiIndex.from_arrays([sizes, colors], names=['size', 'color'])

new_sizes = map(int, ['3', '2', '1'])
index = index.set_levels(new_sizes, level='size')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result =

use expected =

@jreback jreback added this to the 0.24.0 milestone Oct 25, 2018
@jreback jreback merged commit 6b8e5e8 into pandas-dev:master Oct 25, 2018
@jreback
Copy link
Contributor

jreback commented Oct 25, 2018

thanks!

@imankulov imankulov deleted the gh23273 branch October 25, 2018 20:09
thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018
…ndas

* repo_org/master: (23 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...
thoo added a commit to thoo/pandas that referenced this pull request Oct 27, 2018
…xamples

* repo_org/master: (83 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas.MultiIndex.set_levels does not accept generator
3 participants