ENH: Support mangle_dupe_cols=False in pd.read_csv() #13262

Closed
gfyoung opened this issue May 23, 2016 · 11 comments
Labels
Enhancement IO CSV read_csv, to_csv

Comments

@gfyoung
Member

gfyoung commented May 23, 2016

#12935 added full support for duplicate column names (in header or in names) by mangling them. While this has been considered acceptable by users, ideally, we would like to not have to mangle them.
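For readers unfamiliar with the term, "mangling" here means renaming repeated column names so that each one is unique (`a`, `a.1`, `a.2`, and so on). The snippet below is a simplified sketch of that naming scheme only; `mangle_names` is a hypothetical helper written for illustration, not pandas' actual implementation, and it ignores edge cases such as a header that already contains `a.1`.

```python
from collections import defaultdict


def mangle_names(names):
    """Rename repeats to 'x.1', 'x.2', ... (hypothetical helper, not pandas code)."""
    seen = defaultdict(int)  # how many times each name has appeared so far
    result = []
    for name in names:
        count = seen[name]
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(f"{name}.{count}" if count else name)
        seen[name] = count + 1
    return result


mangled = mangle_names(["a", "b", "a", "a"])
print(mangled)  # ['a', 'b', 'a.1', 'a.2']
```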

@jreback jreback changed the title ENH: Support 'mangle_dupe_cols=False' in parsers.py ENH: Support mangle_dupe_cols=False in pd.read_csv() May 23, 2016
@jreback jreback added IO CSV read_csv, to_csv Compat pandas objects compatability with Numpy or Python functions Difficulty Intermediate labels May 23, 2016
@jreback jreback added this to the Next Major Release milestone May 23, 2016
@gfyoung
Member Author

gfyoung commented Jul 27, 2017

@jreback: Given what you said in #17060, is this something we should still pursue?

@jorisvandenbossche
Member

Depending on how difficult this is, I would personally still have it as our goal to have mangle_dupe_cols=False implemented some time.

@caniko

caniko commented Sep 18, 2018

What is the ETA on this issue?

@jreback
Contributor

jreback commented Sep 18, 2018

when / if a community pull request happens

@gfyoung
Member Author

gfyoung commented Sep 19, 2018

@caniko2: This is quite a tricky one, given that duplicate column names have unusual behavior in pandas. You are more than welcome to submit a PR to implement it if you like.

@jackzhenguo

jackzhenguo commented May 22, 2019

Could anyone help me check whether the current pandas 0.24.2 supports "mangle_dupe_cols=False"?

I found the docs at http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html, which state: Passing in False will cause data to be overwritten if there are duplicate names in the columns.
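One way to picture the overwriting that the quoted sentence describes: if parsed values are stored in a mapping keyed by column name, a later duplicate replaces the earlier entry. The snippet below uses a plain Python `dict` purely as an illustration; it is an assumption-based sketch of the general hazard, not a description of pandas internals.

```python
header = ["a", "b", "a"]
row = [1, 2, 3]

# Keyed by name, the second "a" silently replaces the first one.
record = dict(zip(header, row))
print(record)  # {'a': 3, 'b': 2}
```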

Thanks so much!

@gfyoung
Member Author

gfyoung commented May 22, 2019

Still no support, as behavior of data handling has proven to be quite non-trivial when there are duplicate column names. You are welcome to give it a shot though!

@grisaitis

> Still no support, as behavior of data handling has proven to be quite non-trivial when there are duplicate column names. You are welcome to give it a shot though!

Is this issue still difficult to resolve?

@hepcat72

Since I cannot set it to False, and I cannot otherwise check for duplicates using df.columns.duplicated() on the dataframe returned by read_excel, how do I raise an exception when a duplicate is found? It definitely causes a problem with other code, and the user needs to rectify it.
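Because the names are already mangled by the time the dataframe is returned, one possible workaround is to inspect the raw header row before handing the file to pandas. The sketch below does this for CSV text using only the standard library; `read_header_strict` is a hypothetical helper name, and the same check would apply to a header row obtained from a spreadsheet by other means.

```python
import csv
import io
from collections import Counter


def read_header_strict(text_stream):
    """Return the header row, raising ValueError if any column name repeats."""
    header = next(csv.reader(text_stream))  # first row only
    duplicates = [name for name, n in Counter(header).items() if n > 1]
    if duplicates:
        raise ValueError(f"Duplicate column names: {duplicates}")
    return header


sample = io.StringIO("a,b,a\n1,2,3\n")
try:
    read_header_strict(sample)
except ValueError as exc:
    message = str(exc)
    print(message)  # Duplicate column names: ['a']
```

If the check passes, the stream can be rewound with `seek(0)` and parsed as usual.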

@Jaakkonen

Jaakkonen commented Mar 21, 2022

The documentation really shouldn't list this option if it doesn't actually exist. Or the docs should say that it is yet to be implemented (and has been in that state for almost 6 years already).

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@phofl
Member

phofl commented Apr 9, 2023

The argument was removed, so closing

@phofl phofl closed this as completed Apr 9, 2023