Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: Deprecate encoding from stata reader/writer #21400

Merged
merged 1 commit into from
Jun 12, 2018

Conversation

bashtage
Copy link
Contributor

@bashtage bashtage commented Jun 9, 2018

Deprecate the encoding parameter from all Stata reading and writing
methods and classes. The encoding depends only on the file format and
cannot be changed by users.

@bashtage
Copy link
Contributor Author

bashtage commented Jun 9, 2018

For 0.24

@codecov
Copy link

codecov bot commented Jun 9, 2018

Codecov Report

Merging #21400 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21400      +/-   ##
==========================================
+ Coverage   91.89%   91.89%   +<.01%     
==========================================
  Files         153      153              
  Lines       49596    49597       +1     
==========================================
+ Hits        45576    45577       +1     
  Misses       4020     4020
Flag Coverage Δ
#multiple 90.29% <100%> (ø) ⬆️
#single 41.86% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.23% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 415012f...4c02ceb. Read the comment docs.

@gfyoung gfyoung added Deprecate Functionality to remove in pandas IO Stata read_stata, to_stata labels Jun 9, 2018
@gfyoung
Copy link
Member

gfyoung commented Jun 9, 2018

@bashtage : How come your PR is closing #21244, which is already closed?

@@ -45,7 +45,7 @@ Other API Changes
Deprecations
~~~~~~~~~~~~

-
- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated has deprecated ``encoding``. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • encoding --> the encoding argument
  • "The encoding" --> "The encoding" (extra space).

result = encoded.kreis1849[0]

expected = raw.kreis1849[0]
assert result == expected
assert isinstance(result, compat.string_types)

with tm.ensure_clean() as path:
encoded.to_stata(path, encoding='latin-1',
write_index=False, version=version)
encoded.to_stata(path, write_index=False, version=version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure that the DeprecationWarning is issued when we pass in the encoding argument?

@bashtage
Copy link
Contributor Author

bashtage commented Jun 9, 2018 via email

@bashtage bashtage force-pushed the deprecate-stata-encoding branch 2 times, most recently from dc00bd8 to 1e39bb2 Compare June 9, 2018 20:31
@bashtage
Copy link
Contributor Author

bashtage commented Jun 9, 2018

I added a test and fixed the whatsnew

with warnings.catch_warnings(record=True) as w:
warnings.simplefilter("always")
encoded = read_stata(self.dta_encoding, encoding='latin-1')
assert len(w) == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(repost since GitHub hid it due to recent push): Hmmm...I was thinking that you use tm.assert_produces_warning or pytest.warns so that we explicitly ensure that the warning is produced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this can be:

with tm.assert_produces_warning(FutureWarning):
    encoded = read_stata(self.dta_encoding, encoding='latin-1')

(and the same for the other occurrence)

Deprecate the encoding parameter from all Stata reading and writing
methods and classes.  The encoding depends only on the file format and
cannot be changed by users.
Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage : For some reason, I missed the notification that you had patched the warnings setup in the tests. LGTM!

cc @jorisvandenbossche

@jreback jreback added this to the 0.24.0 milestone Jun 12, 2018
@jreback jreback merged commit 66d9b15 into pandas-dev:master Jun 12, 2018
@jreback
Copy link
Contributor

jreback commented Jun 12, 2018

thanks @bashtage

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Deprecate the encoding parameter from all Stata reading and writing
methods and classes.  The encoding depends only on the file format and
cannot be changed by users.
@hmgaudecker
Copy link

I cannot check right now, but I do not think that the sentence " The encoding depends only on the file format and cannot be changed by users." is quite true. If I remember correctly, pre-118 Stata files used, in an undocumented fashion, the native encoding, Latin-1 on Windows and MacRoman on OS X, I don't remember for Linux. Officially, special characters were not supported.

I think there is a case to be made for the encoding keyword, but with old versions dying out I am not sure whether it is worth the effort.

@bashtage bashtage deleted the deprecate-stata-encoding branch September 20, 2018 15:49
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Deprecate the encoding parameter from all Stata reading and writing
methods and classes.  The encoding depends only on the file format and
cannot be changed by users.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas IO Stata read_stata, to_stata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_stata always uses 'utf8'
5 participants