read_csv: delim_whitespace ignored when skip_footer is set #4880

Closed
RutgerK opened this issue Sep 19, 2013 · 4 comments
Labels
Bug IO CSV read_csv, to_csv


@RutgerK

RutgerK commented Sep 19, 2013

The delim_whitespace option no longer works when skip_footer is set to anything other than zero. I can reproduce the behavior in 0.12 with the following example:

import pandas as pd
from StringIO import StringIO

indata = StringIO("""1.2   5.6   8.5
4.5   6.7   6.4


""")

indata.seek(0)
df = pd.read_csv(indata, delim_whitespace=True, header=None, skip_footer=2)

Which returns:

                 0
0  1.2   5.6   8.5
1  4.5   6.7   6.4

Note that it's only one column instead of three. Changing skip_footer to 0 makes it work as expected.

indata.seek(0)
df = pd.read_csv(indata, delim_whitespace=True, header=None, skip_footer=0)

Returns:

     0    1    2
0  1.2  5.6  8.5
1  4.5  6.7  6.4
2  NaN  NaN  NaN
3  NaN  NaN  NaN

If the above example is used with something like names=['a','b','c'], an (obvious) ValueError is raised: Expected 3 fields in line 1, saw 1.

@jreback
Contributor

jreback commented Oct 11, 2013

pushing to 0.14

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 9, 2014
@jtratner
Contributor

This combination is just not supported at all anymore, so either close this or change it into a request to support both delim_whitespace and skip_footer in one engine and/or to deal with the bug listed at the end.

In [5]: df = pd.read_csv(indata, delim_whitespace=True, header=None, skip_footer=2)
Traceback (most recent call last):
    ...
ValueError: Falling back to the 'python' engine because the 'c' engine does not support skip_footer, but this causes 'delim_whitespace' to be ignored as it is not supported by the 'python' engine.

However, it fails badly if you switch to a regex delimiter:

>>> pd.read_csv(indata, sep=r'\s+', header=None, skip_footer=2)
Empty DataFrame
Columns: [0, 1, 2]
Index: []

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@gfyoung
Member

gfyoung commented Jul 26, 2016

@jreback: this isn't an issue anymore. You can definitely specify both together now.

>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>> data = """1.2   5.6   8.5
... 4.5   6.7   6.4
...
...
... """
>>> read_csv(StringIO(data), delim_whitespace=True, header=None, skip_footer=2,
...          skip_blank_lines=False, engine='python')  # skip_footer not supported in C engine
     0    1    2
0  1.2  5.6  8.5
1  4.5  6.7  6.4
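For readers on a recent pandas release, here is a sketch of the same call in current spelling, assuming pandas 2.x: the `skip_footer` keyword has since been renamed `skipfooter` (the old spelling was deprecated and later removed), and `delim_whitespace` is deprecated as of pandas 2.2 in favor of `sep=r"\s+"`. `skipfooter` still requires the Python engine.

```python
import io

import pandas as pd

data = """1.2   5.6   8.5
4.5   6.7   6.4


"""

df = pd.read_csv(
    io.StringIO(data),
    sep=r"\s+",              # split on runs of whitespace (replaces delim_whitespace=True)
    header=None,             # no header row; columns are labeled 0, 1, 2
    skipfooter=2,            # drop the last two lines of the input
    skip_blank_lines=False,  # keep the trailing blank lines as rows, as in the call above
    engine="python",         # skipfooter is only implemented by the Python engine
)
print(df)
```

Passing `engine="python"` explicitly avoids the fallback warning pandas otherwise emits when it has to switch away from the C engine.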

@jorisvandenbossche
Member

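Indeed. Note that the failure in @jtratner's last comment was due to skip_blank_lines=True being the default: the blank lines were already skipped, so skip_footer was no longer needed (and skip_footer=2 ended up removing the two data rows instead, leaving an empty DataFrame).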
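To make the interaction concrete, here is a hypothetical pure-Python model (`parse_whitespace_table` is illustrative only, not part of pandas). It assumes blank lines are dropped *before* the footer is trimmed, which is the ordering that produced the empty result in the regex-delimiter example; current pandas internals may order these steps differently.

```python
def parse_whitespace_table(text, skipfooter=0, skip_blank_lines=True):
    """Hypothetical model of the behavior discussed above, not pandas' code.

    Blank lines are removed first (when skip_blank_lines is True); only then
    are the last `skipfooter` remaining lines dropped.
    """
    lines = text.splitlines()
    if skip_blank_lines:
        lines = [line for line in lines if line.strip()]
    if skipfooter:
        # Guarded because lines[:-0] would return an empty list.
        lines = lines[:-skipfooter]
    # str.split() with no argument splits on runs of whitespace.
    return [[float(field) for field in line.split()] for line in lines]


data = """1.2   5.6   8.5
4.5   6.7   6.4


"""

# Blank lines are already gone, so the footer swallows both data rows.
print(parse_whitespace_table(data, skipfooter=2))  # []
# Keeping blank lines lets the footer remove exactly those two lines.
print(parse_whitespace_table(data, skipfooter=2, skip_blank_lines=False))
# [[1.2, 5.6, 8.5], [4.5, 6.7, 6.4]]
```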

@jorisvandenbossche jorisvandenbossche modified the milestones: No action, Next Major Release Jul 26, 2016