pd.read_csv()/dtype and index_col combo #32930

Closed
sbwiecko opened this issue Mar 23, 2020 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request

Comments

@sbwiecko

Code Sample, a copy-pastable example if possible

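import pandas as pd

# The index dtype is NOT str here, despite the dtype mapping:
data = pd.read_csv(file, dtype={'donor': str}, index_col='donor')
# Workaround: read with the dtype first, then set the index:
data = pd.read_csv(file, dtype={'donor': str})
data.set_index('donor', drop=True, inplace=True)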

Problem description

During pd.read_csv(), when I set a dtype for a column and use that same column as the index, the index does not keep that dtype.

Expected Output

When both dtype and index_col refer to the same column, I expect the index to have the dtype specified in the pd.read_csv() call, i.e. str in my example.
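
A minimal, self-contained sketch of the comparison described in this report. The report does not include the input file, so the CSV text here is made up: the `value` column and the IDs `001`/`002` are hypothetical, chosen because leading zeros make a lost str dtype easy to spot. On an affected version such as the 1.0.1 listed below, the first call may show the IDs parsed as integers; other versions may behave differently, so no output is asserted for it.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the report); leading zeros expose a lost str dtype.
CSV_TEXT = "donor,value\n001,1.5\n002,2.5\n"

# Combination from the report: dtype and index_col name the same column.
combined = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"donor": str}, index_col="donor")
print("dtype + index_col:   ", combined.index.tolist())

# Workaround from the report: apply the dtype while reading, then set the index.
workaround = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"donor": str}).set_index("donor")
print("set_index afterwards:", workaround.index.tolist())
```

Comparing the two printed lists shows whether the installed pandas version applies the dtype to the index column; the second call keeps the IDs as strings because the column is converted before it becomes the index.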

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 5.0.1
hypothesis : 4.27.0
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.0.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : 0.12.3
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

@jbrockmendel jbrockmendel added the IO CSV read_csv, to_csv label Apr 1, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Jul 28, 2020
@simonjayhawkins
Member

closing as duplicate of #9435

@simonjayhawkins simonjayhawkins added the Duplicate Report label and removed the Bug and IO CSV labels Aug 25, 2020
@simonjayhawkins simonjayhawkins removed this from the Contributions Welcome milestone Aug 25, 2020