Steps to reproduce
Expected behaviour
Pandas reads the Stata file without raising an error.
Actual behaviour
Pandas raises an error to do with encoding, traceable back to this line:
Diagnosis
The error is caused by the “smart quote” character “, which is encoded in Latin-1 in the Stata .dta file but is not a valid byte sequence in UTF-8.
The error originates in the StataReader class in io/stata.py. Instead of 'utf-8', Pandas should use self._encoding or self._default_encoding, just as other parts of the code do when reading from the input buffer/file. Making the relevant change on my machine makes the issue go away.
Output of pd.show_versions()
pandas: 0.20.3
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
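To illustrate the diagnosis, here is a minimal standalone sketch of the decoding problem. It does not use Pandas or the actual StataReader code. The helper decode_field and its parameters are hypothetical names, and the 'latin-1' default is an assumption based on the encoding named in this report. Note that the byte 0x93 is the left smart quote in Windows-1252, while Python's strict 'latin-1' codec maps it to the control character U+0093, so both are shown.

```python
def decode_field(raw, encoding=None, default_encoding="latin-1"):
    """Decode raw bytes with the configured encoding, else a default.

    Hypothetical helper: it only mirrors the
    `self._encoding or self._default_encoding` pattern from the diagnosis.
    """
    return raw.decode(encoding or default_encoding)


# Bytes as they might appear in a string field containing smart quotes.
raw = b"\x93quoted\x94"

# Decoding as UTF-8 fails, because 0x93 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Falling back to the assumed default encoding never raises,
# since Latin-1 maps every byte to a code point.
latin1_text = decode_field(raw)

# With Windows-1252 the same bytes become real curly quotes.
cp1252_text = decode_field(raw, encoding="cp1252")

print(utf8_ok, repr(latin1_text), repr(cp1252_text))
```

With a fallback like this the read no longer fails, although the quote characters only display as curly quotes if the file is treated as Windows-1252 rather than strict Latin-1.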