Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwanted *printed* output from read_excel #82

Closed
felasa opened this issue Apr 17, 2015 · 11 comments
Closed

Unwanted *printed* output from read_excel #82

felasa opened this issue Apr 17, 2015 · 11 comments

Comments

@felasa
Copy link

felasa commented Apr 17, 2015

read_excel prints lines unrelated to the output:

file to use for example (tst_case.xls):

https://drive.google.com/uc?export=download&id=0B_h1BO3hBg0gX0lhSU5SQXBvd1E:

code:

> foo <- read_excel("./tst_case.xls")
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
@felasa
Copy link
Author

felasa commented Apr 21, 2015

Just noticed that if I save the file as xlsx then the problem goes away.

@eibanez
Copy link

eibanez commented Apr 23, 2015

I think this has to do with the way libxls deals with named ranges (#79)

For future reference, the printf statement that creates this output is in the function xls_parseWorkBook in xls.c, under the line that contains XLS_RECORD_DEFINEDNAME

@almartin82
Copy link

Hi! Found another example of this.

name <- tempfile(pattern = "enr", tmpdir = tempdir(), fileext = ".zip")
tdir <- tempdir()

downloader::download("http://www.nj.gov/education/data/enr/enr11/enr.zip", dest=tname, mode="wb") 

utils::unzip(tname, exdir = tdir)

#read excel
enr_files <- utils::unzip(tname, exdir = ".", list = TRUE)
enr <- readxl::read_excel(paste0(tdir,'\\',enr_files$Name[1]))

outputs:

DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 

Excel name manager does not report and defined names or ranges in that file name - for whatever that's worth!

Super helpful package. Thanks for all your work on this--

@eibanez
Copy link

eibanez commented Apr 29, 2015

It looks like it could be a named range or a global function.

eibanez added a commit to eibanez/readxl that referenced this issue May 5, 2015
@krlmlr
Copy link
Member

krlmlr commented May 8, 2015

I have filed an issue with libxls: https://sourceforge.net/p/libxls/bugs/27/

@eibanez
Copy link

eibanez commented May 8, 2015

I already filed one and implemented it GitHub. They are debating whether to include my changes upstream.

@kendonB
Copy link

kendonB commented Oct 23, 2015

+1 for addressing this

@sheriferson
Copy link

I am still running into this issue with readxl version 0.1.0

I can't find a way to suppress the DEFINEDNAME output without also capturing the output I actually care about (the contents of the Excel sheet).

Is there a temporary workaround that we can implement in our R code, or do we have to wait for an update to libxls?

Update with workaround:
I thought I would contribute my temporary workaround. It's nothing genius, just wrapping any call to a function in readxl in capture.output(..., file = "/dev/null"). If you're on Windows you need to use file = 'NUL'.

e.g. a function read_all_sheets that reads all sheets in an Excel file and returns a list of dataframes.

if (require(readxl)) {
  read_all_sheets <- function(filename, rowsToSkip = 0) {
    capture.output(sheets <- excel_sheets(filename),
                   file = "/dev/null")

    capture.output(  # I need to do this because of the crappy `DEFINEDNAME` output
      x <- lapply(sheets, function(y) read_excel(filename, sheet = y,
                                               skip = rowsToSkip)),

      file = "/dev/null")
    names(x) <- sheets
    return(x)
  }
}

This is a modification on this Stack Overflow answer.

@krlmlr
Copy link
Member

krlmlr commented Dec 23, 2015

The upstream project, libxml, seems to be dormant: No commits for a year, no reaction to the issue I filed.

kbrose added a commit to kbrose/e-coli-beach-predictions that referenced this issue Jan 16, 2016
Using the fix found at
tidyverse/readxl#82 (comment)
to write the output to the null file. Should work on either
Windows, Linux, and Mac OSX at the least.
@t-kalinowski
Copy link

I recently ran into this issue. Similar in spirit to the solution above, I ended up masking the readxl functions in my global env with these silenced versions. Leaving this here in case anyone else runs into the same issue

excel_sheets <- function(path) {
  quiet_excel_sheets <- purrr::quietly(readxl::excel_sheets)
  out <- quiet_excel_sheets(path)
  if(length(c(out[["warnings"]], out[["messages"]])) == 0)
         return(out[["result"]])
  else readxl::excel_sheets(path)
}

read_excel <-  function(...) {
  quiet_read <- purrr::quietly(readxl::read_excel)
  out <- quiet_read(...)
  if(length(c(out[["warnings"]], out[["messages"]])) == 0)
    return(out[["result"]])
  else readxl::read_excel(...)
}

@arvi1000
Copy link

arvi1000 commented Dec 11, 2016

Another +1 for this issue. I'm using the same workaround posted above for now:

quiet_read <- purrr::quietly(readxl::read_excel)
my_data <- quiet_read("my_sheet.xls")$result

I'm reading an .xls (not .xlsx) and my sessionInfo is:

Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.5.6  readxl_0.1.1     magrittr_1.5     data.table_1.9.6 ggplot2_2.2.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      assertthat_0.1   chron_2.3-47     grid_3.3.1       plyr_1.8.4      
 [6] gtable_0.2.0     formatR_1.4      scales_0.4.1     stringi_1.1.1    lazyeval_0.2.0  
[11] tools_3.3.1      stringr_1.1.0    munsell_0.4.3    colorspace_1.2-6 knitr_1.14      
[16] tibble_1.1 ```

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants