fread fails with uneven number of columns when max columns in final row (with fill=TRUE and col.names set) #2691

alexdthomas · 2018-03-20T22:03:29Z

This may be related to issue #1812, but as that one does not have a reproducible example to confirm, I thought it would be more appropriate to open a new issue.

When a file with an uneven number of columns has the maximum number of columns in its final row, fread fails with the following error:

Error in fread("foo", header = FALSE, fill = TRUE, sep = ",", col.names = paste("V", :
Expecting 3 cols, but line 9 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

This occurs even with fill = TRUE and with the maximum number of column names passed to col.names.

Here is a small example:

library(data.table)

# Ragged file: the row with the most fields (7) is the last one
text <- "12223, University\n12227, bridge, Sky\n12828, Sunset\n13801, Ground\n14853, Tranceamerica\n14854, San Francisco\n15595, shibuya, Shrine\n16126, fog, San Francisco\n16520, California, ocean, summer, golden gate, beach, San Francisco\n"
cat(text, file = "foo")
# Widest row determines how many column names to supply
max.fields <- max(count.fields("foo", sep = ","))
fread("foo", header = FALSE, fill = TRUE, sep = ",", col.names = paste("V", 1:max.fields, sep = ""))

However, when the row with the maximum number of fields is moved to the middle of the file (in this example row 6), fread behaves as expected.

# Same data, but the 7-field row is now row 6 instead of the last row
text <- "12223, University\n12227, bridge, Sky\n12828, Sunset\n13801, Ground\n14853, Tranceamerica\n16520, California, ocean, summer, golden gate, beach, San Francisco\n14854, San Francisco\n15595, shibuya, Shrine\n16126, fog, San Francisco\n"
cat(text, file = "foo")
max.fields <- max(count.fields("foo", sep = ","))
fread("foo", header = FALSE, fill = TRUE, sep = ",", col.names = paste("V", 1:max.fields, sep = ""))

I included this caveat in my answer to this Stack Overflow question.

Laptop session info:

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_2.2.1        microbenchmark_1.4-4 data.table_1.10.4-3 

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2 scales_0.5.0     compiler_3.4.1   lazyeval_0.2.1   plyr_1.8.4       tools_3.4.1      pillar_1.2.1    
 [8] gtable_0.2.0     tibble_1.4.2     Rcpp_0.12.15     grid_3.4.1       rlang_0.2.0      munsell_0.4.3

Also tested on this machine, with the same results:

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] vegan_2.4-4       lattice_0.20-35   permute_0.9-4     ggplot2_2.2.1     data.table_1.10.4 reshape2_1.4.3   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15     cluster_2.0.6    magrittr_1.5     MASS_7.3-49      munsell_0.4.3    colorspace_1.3-2
 [7] rlang_0.1.6      stringr_1.2.0    plyr_1.8.4       tools_3.4.4      parallel_3.4.4   grid_3.4.4      
[13] gtable_0.2.0     nlme_3.1-131.1   mgcv_1.8-23      digest_0.6.15    yaml_2.1.14      lazyeval_0.2.1  
[19] tibble_1.4.2     Matrix_1.2-11    labeling_0.3     stringi_1.1.6    compiler_3.4.4   pillar_1.1.0    
[25] scales_0.5.0

tlapak · 2021-06-15T20:33:16Z

Both pieces of code produce identical output, up to row order, for me on the current CRAN version (1.14.0).
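The comparison described here can be sketched as follows. This is a minimal sketch, assuming a data.table version in which both examples read successfully (1.14.0 or later, going by this comment). The helper `read_ragged()` and the temporary file are illustrative names, not part of the original report. Each variant is read the same way as in the issue, sorted by the first column, and then compared.

```r
library(data.table)

# The eight narrow rows and the one wide row from the original report
narrow <- c(
  "12223, University", "12227, bridge, Sky", "12828, Sunset",
  "13801, Ground", "14853, Tranceamerica", "14854, San Francisco",
  "15595, shibuya, Shrine", "16126, fog, San Francisco"
)
wide <- "16520, California, ocean, summer, golden gate, beach, San Francisco"

# Hypothetical helper: write the lines to a temp file, read them as in the
# issue, and sort by the first column so that row order does not matter.
read_ragged <- function(lines) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  writeLines(lines, path)
  max_fields <- max(count.fields(path, sep = ","))
  dt <- fread(path, header = FALSE, fill = TRUE, sep = ",",
              col.names = paste0("V", seq_len(max_fields)))
  setorder(dt, V1)
  dt
}

wide_last   <- read_ragged(c(narrow, wide))               # widest row last
wide_middle <- read_ragged(append(narrow, wide, after = 5)) # widest row sixth

print(all.equal(wide_last, wide_middle))
```

On the versions reported in the original post (1.10.4 and 1.10.4-3), the first call is the one expected to raise the error quoted above.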

Rajdeep-689 · 2023-04-24T23:46:43Z

Hi Team,

I have the same problem. I have multiple .csv files in a directory, and I read them by iterating over the list of files. The code and the resulting warnings are below. I have used fill=TRUE, but it does not seem to have any effect. Can someone please guide me?

Code:
setwd('E:/SOH-WORKING/CSV')
content <- rbindlist(
  lapply(
    list.files(path = 'E:/SOH-WORKING/CSV', pattern = "*.csv"),
    fread,
    select = c('#LOCATION', 'DIV_NAME', 'GROUP_NAME', 'DEPT_NAME', 'CLASS_NAME', 'SUB_NAME', 'ITEM_DESC', 'SEASON_DESC', 'STYLE_DESC', 'COLOR_DESC', 'SIZE_DESC', 'AVAILABLE_QTY')
  ),
  use.names = TRUE, fill = TRUE
)

Log:
Warning messages:
1: In FUN(X[[i]], ...) :
Stopped early on line 84. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
2: In FUN(X[[i]], ...) :
Stopped early on line 20. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
3: In FUN(X[[i]], ...) :
Stopped early on line 72. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
4: In FUN(X[[i]], ...) :
Stopped early on line 119. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
5: In FUN(X[[i]], ...) :
Stopped early on line 218. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
6: In FUN(X[[i]], ...) :
Stopped early on line 60. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
7: In FUN(X[[i]], ...) :
Stopped early on line 53. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
8: In FUN(X[[i]], ...) :
Stopped early on line 253. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:
9: In FUN(X[[i]], ...) :
Stopped early on line 214. Expected 96 fields but found 97. Consider fill=TRUE and comment.char=. First discarded non-empty line:

Please help me if there is any workaround.
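One thing visible in the snippet above: fill=TRUE is passed to rbindlist(), not to fread(), so each file is still read without filling, which matches the "Consider fill=TRUE" hint in the warnings. A minimal sketch of forwarding fill to fread through lapply is shown below. The directory, file names, and columns (`id`, `name`, `qty`) are made up for illustration, and the toy files only contain rows that are too short. Whether this also resolves the "Expected 96 fields but found 97" case depends on the actual files and the installed data.table version.

```r
library(data.table)

# Hypothetical miniature of the setup: two CSV files with ragged rows
csv_dir <- tempfile("ragged_csv")
dir.create(csv_dir)
writeLines(c("id,name,qty", "1,apple,3", "2,pear"), file.path(csv_dir, "a.csv"))
writeLines(c("id,name,qty", "3,plum,5", "4,fig"),   file.path(csv_dir, "b.csv"))

# "\\.csv$" is a regular expression matching names that end in .csv;
# full.names = TRUE makes the paths independent of the working directory.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

content <- rbindlist(
  # Extra arguments to lapply are forwarded to fread, so fill reaches the reader
  lapply(files, fread, header = TRUE, select = c("id", "qty"), fill = TRUE),
  # These arguments only control how the already-read tables are stacked
  use.names = TRUE, fill = TRUE
)

print(content)
```

The two fill arguments do different jobs: the one given to fread pads short rows within a file, while the one given to rbindlist pads columns that are missing from some of the tables being combined.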

ben-schwen · 2024-03-21T11:25:43Z

#5119 added the examples as test cases. Both now work with fread(file, fill=TRUE).
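As the title of #5119 suggests, fill can also be given as an integer that serves as a guess for the number of columns. The sketch below assumes data.table 1.16.0 or later, where the fread documentation describes an integer fill as an upper bound on the column count. The temporary file path and the variable names are illustrative.

```r
library(data.table)  # assumption: version 1.16.0 or later

# First example from the report: the widest row (7 fields) is the last one
text <- "12223, University\n12227, bridge, Sky\n12828, Sunset\n13801, Ground\n14853, Tranceamerica\n14854, San Francisco\n15595, shibuya, Shrine\n16126, fog, San Francisco\n16520, California, ocean, summer, golden gate, beach, San Francisco\n"
path <- tempfile(fileext = ".csv")
cat(text, file = path)

# Count fields per line with base R and pass the maximum as the column guess
max_fields <- max(count.fields(path, sep = ","))
dt <- fread(path, header = FALSE, sep = ",", fill = max_fields)

print(dim(dt))
```

Passing the count explicitly avoids relying on fread's own estimate of the column count, at the cost of an extra pass over the file by count.fields.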

st-pasha added bug fread labels Mar 21, 2018

MichaelChirico changed the title ~~fread fails with uneven number of columns when max collumns in final row (with fill=TRUE and col.names set)~~ fread fails with uneven number of columns when max columns in final row (with fill=TRUE and col.names set) Feb 19, 2019

MichaelChirico mentioned this issue Feb 19, 2019

Master list of most-requested issues #3189

Open

76 tasks

MichaelChirico added the High label May 30, 2020

jangorecki removed the High label Jun 3, 2020

ben-schwen mentioned this issue Aug 28, 2021

fread: use fill with integer as ncol guess #5119

Merged

ben-schwen added this to the 1.16.0 milestone Jan 5, 2024

MichaelChirico closed this as completed in #5119 Mar 21, 2024
