BUG: read_table error with tabs, dtype and index_col #4363

brentp · 2013-07-25T16:06:06Z

This gist demonstrates the problem:
https://gist.github.com/brentp/6066942

It's discussed in this thread:
https://groups.google.com/forum/#!topic/pydata/hIIZpZqbY5M

As discussed in the thread, there seems to be an interaction between dtype, index_col, and sep in read_csv that triggers the error.

As the gist shows, things work fine with sep="\s+" but fail on the same data with sep="\t", even though the file is tab-delimited.
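
A self-contained sketch of the separator comparison the gist makes. It uses a small made-up tab-delimited table; the column names and values are hypothetical and are not the gist's data. It leaves out dtype, so it only exercises the separator. On a file with single tabs and no embedded spaces, both separators should split each line the same way.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-delimited data: a string label column followed by integer counts.
TSV = "name\ts1\ts2\ngeneA\t1\t2\ngeneB\t3\t4\n"


def parse_with(sep):
    """Parse the sample text with the given separator, using the first column as the index."""
    return pd.read_csv(StringIO(TSV), sep=sep, index_col=0)


by_whitespace = parse_with(r"\s+")
by_tab = parse_with("\t")
print(by_whitespace.equals(by_tab))
```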

jtratner · 2013-07-25T21:03:01Z

Could you add a csv/tsv file to that gist that demonstrates it? I know copy/paste is easiest for IPython, but it'd be great to be able to just download the raw data file and work with it :)


brentp · 2013-07-25T21:12:51Z

I added the data file to the gist.

https://gist.github.com/brentp/6066942/raw/95742c7811f89e032194e1f32a849272e0268c15/t.txt

jtratner · 2013-07-25T22:42:20Z

I was playing around with this. Weirdly, if you pass the text directly to the reader in a StringIO, it works, but it fails if you read from the file:

from io import StringIO
import numpy as np
import pandas as pd
text = StringIO(txt.replace("   ", "\t"))  # txt: the gist's data as a string
df = pd.read_csv(text, dtype=np.int_, sep="\t", index_col=0)
print(r"OK \t")
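
To check whether the buffer-versus-file discrepancy described here shows up on a particular pandas version, one could parse the same text once from a StringIO and once from a temporary file and compare the results. This is a hypothetical harness (`read_both` is an illustrative name, not a pandas function). The sample data has an integer index column, which the thread reports as a working case.

```python
import os
import tempfile
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical data with an integer index column.
TSV = "id\ts1\ts2\n10\t1\t2\n20\t3\t4\n"


def read_both(text, **kwargs):
    """Parse `text` once from an in-memory buffer and once from a temporary file."""
    from_buffer = pd.read_csv(StringIO(text), **kwargs)
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
        from_file = pd.read_csv(path, **kwargs)
    finally:
        os.remove(path)
    return from_buffer, from_file


buf_df, file_df = read_both(TSV, sep="\t", dtype=np.int64, index_col=0)
print(buf_df.equals(file_df))
```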

Additionally, if you remove the dtype (this call also leaves out index_col), it works as well:

import pandas as pd
df = pd.read_csv('t.txt', sep="\t")  # no dtype, no index_col
print(r"OK \t+", df.shape)
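
Building on this observation, one possible workaround is to parse without dtype and cast the value columns afterwards. This is a sketch, not the fix pandas eventually adopted. `DataFrame.astype` leaves the index alone, so a string index column survives the cast. The data below is hypothetical.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical tab-delimited data with a non-integer index column.
TSV = "name\ts1\ts2\ngeneA\t1\t2\ngeneB\t3\t4\n"

# Parse without dtype, then cast only the data columns (astype does not touch the index).
df = pd.read_csv(StringIO(TSV), sep="\t", index_col=0).astype(np.int64)
print(df.dtypes.tolist())
```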

jtratner · 2013-07-25T22:50:18Z

Okay, the issue is the index column: everything works if you change the index values to integers.
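
Since the problem is tied to the index column, another possible option is to pass dtype as a dict that names only the value columns. That way no integer conversion is requested for the label column. This is a hedged sketch. The column names are hypothetical, but read_csv does accept a column-to-type mapping for dtype.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical tab-delimited data: string labels plus two integer value columns.
TSV = "name\ts1\ts2\ngeneA\t1\t2\ngeneB\t3\t4\n"

# Request int64 only for the value columns; the "name" column is parsed as strings
# and then used as the index.
value_dtypes = {"s1": np.int64, "s2": np.int64}
df = pd.read_csv(StringIO(TSV), sep="\t", index_col=0, dtype=value_dtypes)
print(df)
```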

jreback · 2016-04-26T17:22:34Z

dupe of #9435

ghost assigned jtratner Sep 9, 2013

jreback mentioned this issue Sep 30, 2013

BUG: wrong index name during read_csv if using usecols #5003 (merged)

mcwitt mentioned this issue Mar 11, 2014

BUG: read_table() ignores dtype argument when multi-character separator is specified #6607 (closed)

jreback unassigned jtratner Mar 30, 2014

jreback modified the milestones: 0.15.0, 0.14.0 Mar 30, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

jreback closed this as completed Apr 26, 2016

jreback added the "Duplicate Report" label (Duplicate issue or pull request) Apr 26, 2016

jreback modified the milestones: No action, Next Major Release Apr 26, 2016
