BUG: read_table error with tabs, dtype and index_col #4363

Closed
brentp opened this issue Jul 25, 2013 · 5 comments
Labels: Bug, Duplicate Report (Duplicate issue or pull request), IO CSV (read_csv, to_csv)

Comments

@brentp

brentp commented Jul 25, 2013

This gist demonstrates the problem:
https://gist.github.com/brentp/6066942

It's discussed in this thread:
https://groups.google.com/forum/#!topic/pydata/hIIZpZqbY5M

As discussed in the thread, there appears to be an interaction between the dtype, index_col, and sep arguments of read_csv.

As the gist shows, things work fine with sep="\s+", but fail on the same data with sep="\t", even though the file is tab-delimited.
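A quick sanity check for the claim above, as a minimal sketch with hypothetical data (the gist's own data is not reproduced here): on a cleanly tab-delimited file, sep="\t" and sep=r"\s+" should parse to the same frame. The broad dtype from the gist is deliberately left out, so this only checks the separator part of the interaction.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: a string index column ("gene") followed by
# integer data columns, separated by real tab characters.
txt = "gene\ts1\ts2\nA\t1\t2\nB\t3\t4\n"

# Parse the same text once with a literal tab and once with a whitespace regex.
df_tab = pd.read_csv(StringIO(txt), sep="\t", index_col=0)
df_ws = pd.read_csv(StringIO(txt), sep=r"\s+", index_col=0)

# On well-formed tab-delimited input the two separators should agree,
# so a difference on real data points at the reader rather than the file.
print(df_tab.equals(df_ws))
```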

@jtratner
Contributor

Could you add a csv/tsv file to that gist that demonstrates it? I know
copy/paste is easiest for IPython, but it'd be great to be able to just
download the raw data file and work with it :)


@brentp
Author

brentp commented Jul 25, 2013

@jtratner
Contributor

I was playing around with this. Weirdly, if you pass the text directly to the reader via StringIO it works, but it fails if you read from the file:

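from io import StringIO
import numpy as np
import pandas as pd
text = StringIO(txt.replace("   ", "\t"))  # txt: the sample data from the gist
df = pd.read_csv(text, dtype=np.int_, sep="\t", index_col=0)
print(r"OK \t")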

Additionally, reading from the file also works if you remove the dtype:

import pandas as pd
df = pd.read_csv('t.txt', sep="\t")
print(r"OK \t+", df.shape)
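Because the StringIO input in the snippet is built with `txt.replace("   ", "\t")`, one thing worth ruling out is that the buffer and `t.txt` do not contain the same bytes. The sketch below, using hypothetical data and a temporary file in place of `t.txt`, reads identical text both ways; if the two results match here but differ on the real data, the inputs themselves are the first thing to compare.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Hypothetical tab-delimited sample (the gist's data is not reproduced here).
txt = "gene\ts1\ts2\nA\t1\t2\nB\t3\t4\n"

# Read once from an in-memory buffer.
df_buf = pd.read_csv(StringIO(txt), sep="\t", index_col=0)

# Write the identical text to a temporary file, close it, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write(txt)
    path = fh.name
try:
    df_file = pd.read_csv(path, sep="\t", index_col=0)
finally:
    os.remove(path)  # clean up the temporary file

# Identical input should give identical frames regardless of the source.
print(df_buf.equals(df_file))
```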

@jtratner
Contributor

Okay, the issue is the index column: everything works if you change the index to integers.
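The observation above suggests the single dtype=np.int_ was also being applied to a string index column. Under that assumption, here is a sketch with hypothetical data: with an integer index column, one integer dtype for every column is harmless; with a string index column, two possible workarounds are to give the integer dtype only to the data columns via a dict, or to read without dtype and cast afterwards. The column names are illustrative only.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical samples: the same integer data, once with an integer
# index column and once with a string index column.
int_index_txt = "id\ts1\ts2\n1\t1\t2\n2\t3\t4\n"
str_index_txt = "gene\ts1\ts2\nA\t1\t2\nB\t3\t4\n"

# Integer index column: one dtype for all columns is fine,
# because the index values are integers too.
df_int = pd.read_csv(StringIO(int_index_txt), sep="\t", index_col=0, dtype=np.int64)

# String index column, workaround 1: integer dtype only for the data columns.
df_dict = pd.read_csv(
    StringIO(str_index_txt),
    sep="\t",
    index_col=0,
    dtype={"s1": np.int64, "s2": np.int64},
)

# String index column, workaround 2: read without dtype, then cast the data.
df_cast = pd.read_csv(StringIO(str_index_txt), sep="\t", index_col=0).astype(np.int64)

print(df_dict.equals(df_cast))
```

Casting with astype converts every data column, so it assumes all of them hold integers; the dict form lets you pick which columns get an integer dtype.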

@jreback
Contributor

jreback commented Apr 26, 2016

dupe of #9435

jreback closed this as completed Apr 26, 2016
jreback added the Duplicate Report label Apr 26, 2016
jreback modified the milestones: No action, Next Major Release Apr 26, 2016