Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: #22896, Fixed docstring of to_stata in pandas/core/frame.py #23152

Merged
merged 7 commits into from
Nov 21, 2018

Conversation

ryankarlos
Copy link
Contributor

@ryankarlos ryankarlos commented Oct 14, 2018

@pep8speaks
Copy link

pep8speaks commented Oct 14, 2018

Hello @ryankarlos! Thanks for updating the PR.

Comment last updated on November 20, 2018 at 21:28 Hours UTC

@ryankarlos ryankarlos changed the title To stata docstring DOC: #22896, Fixed docstring of to_stata in pandas/core/frame.py Oct 14, 2018
@ryankarlos
Copy link
Contributor Author

ryankarlos commented Oct 14, 2018

Ive followed the docstring format here https://pandas.pydata.org/pandas-docs/stable/contributing_docstring.html

Ive added in a returns section for to_stata in pandas/core/frame.py, corrected linting issues and modified the examples as some of the variables (mainly for the date example) were not defined and hence failing some of the tests.

For some reason, the tests were throwing up linting errors for an unrelated older pull request - but Ive corrected these as well.

@datapythonista datapythonista added Docs IO Stata read_stata, to_stata labels Oct 14, 2018
@datapythonista
Copy link
Member

Thanks for the work on this @ryankarlos

Can you leave in this PR just the changes in the to_stata docstring please? Things become more complex if we mix things, and I see some blank lines added in unrelated code, and changes to the split function (which are welcome, but in a separate PR).

I also see that there are PEP-8 issues in the examples, and may be they can be simplified a bit.

@codecov
Copy link

codecov bot commented Oct 14, 2018

Codecov Report

Merging #23152 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23152   +/-   ##
=======================================
  Coverage   92.29%   92.29%           
=======================================
  Files         161      161           
  Lines       51493    51493           
=======================================
  Hits        47524    47524           
  Misses       3969     3969
Flag Coverage Δ
#multiple 90.68% <ø> (ø) ⬆️
#single 42.32% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.02% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99df7da...64e02d0. Read the comment docs.

>>> dates_1 = pd.to_datetime(["2016-05-06","2017-05-07", "2018-05-10"])
>>> dates_2 = pd.to_datetime(["2016-11-06","2017-11-07", "2018-11-10"])
>>> df_dates = pd.DataFrame({'date1': dates_1, 'date2': dates_2})
>>> df_dates.to_stata('./date_data_file.dta', {2 : 'tw'})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the {2: 'tw'} means? I know this is part of the original docstring, but I think we should make the examples in a way that they explain what is going on. I don't know much about stata or this function, and this without explanation is not useful to me.

Also, would be great to simplify this. May be just a date column if that makes sense, without creating variables first, and may be using date_range to avoid so much information that does not add value. And if we can create a single DataFrame at the beginning that is useful for all functions, that would be much better.

Finally, if we can use something looking more like a real example. I think it gives a better impression and helps understand than a [1, 2, 3]. We've got some examples with animals in pandas/core/generic.py, I'd use one of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tw is a Stata specific option for displaying time in weeks i believe - i have updated it with a single data frame using the animals examples from pandas/core/generic.py and excluded the separate example specifically for dates (i agree it does not really add any value).


Alternatively you can create an instance of the StataWriter class

>>> writer = StataWriter('./data_file.dta', data)
>>> StataWriter = pd.io.stata.StataWriter
>>> writer = StataWriter('./data_file.dta', df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use instead (assuming you're using the animals data):
>>> writer = pd.io.stata.StataWriter('animals.dta', df)

@ryankarlos
Copy link
Contributor Author

ryankarlos commented Oct 14, 2018

Thanks for the work on this @ryankarlos

Can you leave in this PR just the changes in the to_stata docstring please? Things become more complex if we mix things, and I see some blank lines added in unrelated code, and changes to the split function (which are welcome, but in a separate PR).

I also see that there are PEP-8 issues in the examples, and may be they can be simplified a bit.

@datapythonista Sorry for that, I have undone the changes to the other docstring and only left the to_stata one. I wasn't sure what those PEP-8 issues regarding the escape sequence were referring to (as they didn't show up in the flake8 test )- I have modified and simplified the examples and include - instead of \ between the days, month and year .... seems like that may have fixed the problem.

@ryankarlos
Copy link
Contributor Author

@datapythonista let me know if there are any updates required

@datapythonista
Copy link
Member

Looks good, added some comments.

@ryankarlos
Copy link
Contributor Author

@datapythonista Did you add comments to the latest commit - as I cannot find anything ?

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't have a computer right now, and it's tricky to do the reviews from the phone. I think the comments should be added now.

@@ -1846,12 +1846,15 @@ def to_stata(self, fname, convert_dates=None, write_index=True,
data_label=None, variable_labels=None, version=114,
convert_strl=None):
"""
Export Stata binary dta files.
Converting data frame object to Stata dta format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use DataFrame instead of data frame.

And dta between quotes to be consistent with the next paragraph (or probably better to remove them there).

Export Stata binary dta files.
Converting data frame object to Stata dta format.

Writes the Dataframe to a Stata dataset file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataFrame with capital F

Converting data frame object to Stata dta format.

Writes the Dataframe to a Stata dataset file.
"dta" files contain a Stata dataset.

Parameters
----------
fname : path (string), buffer or path object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types in this line should be Python types. I think in this case should be: str, file descriptor or pathlib.Path

@@ -1896,6 +1899,10 @@ def to_stata(self, fname, convert_dates=None, write_index=True,

.. versionadded:: 0.23.0

Returns
------
Stata (dta) file or StataWriter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the Python returned type of the function. I don't think it returns anything. If this is the case just remove the section.


Examples
--------
>>> data.to_stata('./data_file.dta')
Converting dataframe with date column to Stata dta file
using the to_stata method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea about stata, but I don't think having a date column and indices adds any value here unless we can explain why.

I'd use a much simpler example. And I'd leave the StataWriter example to that docstring.

@ryankarlos
Copy link
Contributor Author

ryankarlos commented Oct 25, 2018

@datapythonista Thanks for the comments - incorporated the changes.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @ryankarlos, I've just seen that I didn't submit the comments of the last review (I did it from the phone, and it's tricky).

Besides those, can you edit ci/code_checks.py and in the part of the doctests, stop excluding this docstring, as they should now pass.

Thanks!

... 'parrot'],
... 'speed': [350, 18, 361, 15]}).set_index(['date',
... 'animal'])
... 'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add # doctest: +SKIP so the test is not run, and the file is not generated and left in the disk.

then the buffer will not be automatically closed after the file
data has been written.
fname : str, file descriptor or pathlib.Path
String or path to file which needs to be converted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the original description more.

pandas.io.stata.StataWriter117 : low-level writer for version 117 files
pandas.read_stata : Import Stata data files.
pandas.io.stata.StataWriter : Writer for Stata data files.
pandas.io.stata.StataWriter117 : Writer for version 117 files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pandas. prefix is unnecessary.

>>> writer.write_file()

With dates:
Converting DataFrame to Stata "dta" file using the to_stata method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is obvious given the context, I'd remove it.

@ryankarlos
Copy link
Contributor Author

@datapythonista added requested changes

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can you add optional to the data_label parameter, and the default to version : {114, 117}, default 114.

I think with that is ready to be merged (if you can also take a look at the CI and make sure it's green, so can merge).

@ryankarlos
Copy link
Contributor Author

@datapythonista Thanks, changes added - all green now

@datapythonista
Copy link
Member

lgtm, just changed a word. I'll wait for for someone else to merge this, so we have an extra pair of eyes on it.

Thanks @ryankarlos

@jreback jreback added this to the 0.24.0 milestone Nov 20, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny comments. @bashtage if you can have a look.

pandas/core/frame.py Outdated Show resolved Hide resolved
@ryankarlos
Copy link
Contributor Author

@jreback All green now

@jreback jreback merged commit 8341cec into pandas-dev:master Nov 21, 2018
@jreback
Copy link
Contributor

jreback commented Nov 21, 2018

thanks @ryankarlos

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Stata read_stata, to_stata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Fix the docstring of to_stata in pandas/core/frame.py
4 participants