Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add Index.to_frame method #17815

Merged
merged 1 commit into from
Oct 9, 2017

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Oct 8, 2017

Title is self-explanatory. Closes #15230.

@gfyoung gfyoung added Enhancement Indexing Related to indexing on series/frames, not to indexes themselves labels Oct 8, 2017
@gfyoung gfyoung added this to the 0.21.0 milestone Oct 8, 2017
@codecov
Copy link

codecov bot commented Oct 8, 2017

Codecov Report

Merging #17815 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17815      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49978    49990      +12     
==========================================
+ Hits        45611    45614       +3     
- Misses       4367     4376       +9
Flag Coverage Δ
#multiple 89.05% <100%> (ø) ⬆️
#single 40.24% <16.66%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.47% <100%> (+0.01%) ⬆️
pandas/core/indexes/datetimes.py 95.61% <100%> (+0.02%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7db7f82...75f11cc. Read the comment docs.

@codecov
Copy link

codecov bot commented Oct 8, 2017

Codecov Report

Merging #17815 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17815      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49980    49992      +12     
==========================================
+ Hits        45613    45616       +3     
- Misses       4367     4376       +9
Flag Coverage Δ
#multiple 89.05% <100%> (ø) ⬆️
#single 40.24% <16.66%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/multi.py 96.39% <ø> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.61% <100%> (+0.02%) ⬆️
pandas/core/indexes/base.py 96.47% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.74% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f0ee53...1c0aaec. Read the comment docs.

@@ -1005,6 +1005,29 @@ def to_series(self, **kwargs):
index=self._shallow_copy(),
name=self.name)

def to_frame(self, index=True):
"""
Create a DataFrame with the columns the levels of the Index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be reworded. Unless I'm missing something this should only return one column for the base Index, which should only have a single level (MultiIndex has it's own method/docstring). So maybe something like "Create a DataFrame with the Index values as a column".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...bad English copied and pasted. Fixed.

Parameters
----------
index : boolean, default True
return the Index as the index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a little confusing to me on first read. Maybe something along the lines of "use this Index as the index of the resulting DataFrame" would be a little more clear?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...bad English copied and pasted. Fixed.

@@ -913,6 +913,45 @@ def to_series(self, keep_tz=False):
index=self._shallow_copy(),
name=self.name)

def to_frame(self, index=True, keep_tz=False):
"""
Create a DataFrame with the columns the levels of the Index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...bad English copied and pasted. Fixed.

Parameters
----------
index : boolean, default True
return the DatetimeIndex as the index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah...bad English copied and pasted. Fixed.


Returns
-------
DataFrame : a DataFrame containing the original Index data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Index" -> "DatetimeIndex"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -51,6 +51,20 @@ def test_to_series(self):
assert s.index is not idx
assert s.name == idx.name

def test_to_frame(self):
idx = self.create_index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

DataFrame : a DataFrame containing the original Index data.
"""

from pandas import DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not do the same thing as .to_series() where the values and the index are the same.
is there a reason you are doing this?

Copy link
Member Author

@gfyoung gfyoung Oct 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where the values and the index are the same.

Not sure I get you here. The implementation I wrote tries to be consistent with what was done with MultiIndex by constructing DataFrame with data and setting the index if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is reasonable, side issue this is a little suspect:

In [4]: pd.MultiIndex.from_product([range(3),list('ab')], names=['foo', 'bar']).to_frame()
Out[4]: 
        bar  foo
foo bar         
0   a     a    0
    b     b    0
1   a     a    1
    b     b    1
2   a     a    2
    b     b    2

In [5]: pd.MultiIndex.from_product([range(3),list('ab')], names=['foo', 'bar']).to_series()
Out[5]: 
foo  bar
0    a      (0, a)
     b      (0, b)
1    a      (1, a)
     b      (1, b)
2    a      (2, a)
     b      (2, b)
dtype: object

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is suspect about it?
The index seems toe same in both examples?

@jreback jreback removed this from the 0.21.0 milestone Oct 8, 2017
@gfyoung
Copy link
Member Author

gfyoung commented Oct 8, 2017

@jreback @jschendel : Addressed all comments, and everything is green. PTAL.

@@ -1010,14 +1010,14 @@ def _to_safe_for_reshape(self):

def to_frame(self, index=True):
"""
Create a DataFrame with the columns the levels of the MultiIndex
Create a DataFrame with the columns and levels of the MultiIndex.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem quite right. Maybe something like "Create a DataFrame with the levels of the MultiIndex as columns"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done.

@jschendel
Copy link
Member

Minor comment, otherwise LGTM.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments!

(I don't feel too enthusiastic about adding this method, but given the consistency between Index and MultiIndex I suppose this is good)

@@ -31,6 +31,7 @@ New features
- Added ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to
support type inference in the presence of missing values (:issue:`17059`).
- :class:`~pandas.Resampler.nearest` is added to support nearest-neighbor upsampling (:issue:`17496`).
- :class:`~pandas.Index~` has added support for a ``to_frame`` method (:issue:`15230`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second ~ should not be there I think

Set the index of the returned DataFrame as self.

keep_tz : optional, defaults False.
return the data keeping the timezone.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have this keyword?
I suppose the reason is for consistency with to_series, but that has this keyword for historical reasons (I think): it dates from before we could have tz aware data inside a series / frame.

Given that, I am not sure it is needed to copy that here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche : Given that you're not 100% sure with this logic, I'd rather stick with consistency. We can always remove (and deprecate) the argument in one sweep.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only think of deprecating it, then it makes no sense to add it IMO.
(you can say the same with "we can always add it later if people want it" :-))


.. versionadded:: 0.20.0

Parameters
----------
index : boolean, default True
return this MultiIndex as the index
Set the index of the returned DataFrame as self.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally would not use "self" in docstrings. For users that are not very familiar with writing classes themselves (and you don't need to do this for using pandas) this will sound very strange.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Fixed.

def test_to_frame(self):
# see gh-15230
idx = self.create_index()
name = idx.name or 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is at least one in the create_index indices that has a name? (or at least one that has no name)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is.

@jreback jreback added this to the 0.21.0 milestone Oct 9, 2017
@jreback jreback merged commit 2ebe085 into pandas-dev:master Oct 9, 2017
@jreback
Copy link
Contributor

jreback commented Oct 9, 2017

thanks. maybe make an issue about this #17815 (comment) if you would

@gfyoung gfyoung deleted the to-frame-index branch October 9, 2017 16:34
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants