Reading xlsx and xls #208

Closed
msgoussi opened this issue Oct 10, 2016 · 4 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@msgoussi

Dear Mr. Hadley,
I have a file downloaded from a database, in both the 2007 (xlsx) and 2003 (xls) formats.
If I use read_excel on the xlsx file, I get an error:
Error: Couldn't find 'xl/styles.xml' in 'D:/My Documents/DRSD/R/EIU/EIU_Data.xlsx'

I think the sheets are XML files combined in the workbook.

If I use read_excel on the xls file, I get an error:
Error in x[needs_ticks] <- paste0("`", gsub("`", "\\\\`", x[needs_ticks]), :
NAs are not allowed in subscripted assignments

I would really appreciate help with this error, and an example of how to read every sheet.

Thanks
xls file
https://drive.google.com/file/d/0B3Z74IvmfSYQQjF4bzhDXy1iTlU/view

xlsx file
https://drive.google.com/file/d/0B3Z74IvmfSYQeWNTemJJUzZ5WkE/view

@MichaelChirico
Contributor

@msgoussi I don't have any problem with the first file:

fl = "EIU_Data.xls"
library(readxl)

all(sapply(excel_sheets(fl),
           function(x) 
             is.data.frame(read_excel(fl, sheet = x, na = "n.a."))))
# [1] TRUE

I'm running version: readxl_0.1.1.9000.

Your xlsx file is not shared.
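
For the "read every sheet" part of the question, the check above can be turned into a small helper that returns each sheet as a data frame in a named list. This is only a sketch: `read_all_sheets` is a hypothetical name, not part of readxl, and the usage lines assume a recent readxl that ships `readxl_example()` and its bundled `datasets.xlsx` workbook.

```r
# Hypothetical helper (not part of readxl): read every sheet of a workbook
# into a list of data frames, named after the sheets.
read_all_sheets <- function(path, ...) {
  sheets <- readxl::excel_sheets(path)
  out <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s, ...))
  setNames(out, sheets)
}

# Usage, guarded so the sketch does nothing where readxl is not installed.
# Assumption: readxl_example() and datasets.xlsx are available in the
# installed readxl version.
if (requireNamespace("readxl", quietly = TRUE)) {
  path <- readxl::readxl_example("datasets.xlsx")
  all_sheets <- read_all_sheets(path)
  print(names(all_sheets))
}
```

For the file in this thread, the call would presumably look like `read_all_sheets("EIU_Data.xls", na = "n.a.")`, since extra arguments are passed through to `read_excel`.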

@donboyd5

donboyd5 commented Dec 8, 2016

Hi,

I have the same problem when trying to read some xls files from the U.S. Bureau of Economic Analysis. Reproducible example below.

Not shown: xlsx::read.xlsx can read the files.

Don

# reproducible example of problem reading xls files from U.S. Bureau of Economic Analysis
library("readxl")
tfile <- tempfile()
download.file("http://www.bea.gov//national/nipaweb/GetCSV.asp?GetWhat=SS_Data/SectionAll_xls.zip&Section=11", tfile, mode="wb")
unzip(tfile, list=TRUE) # verify that we really have the file
tdir <- tempdir()
unzip(tfile, exdir=tdir)
dir(tdir) # verify that the xls files are there
excel_sheets(paste0(tdir, "/Section1all_xls.xls")) # verify that we can identify the file we want
df <- read_excel(paste0(tdir, "/Section1all_xls.xls"), sheet=2)
df
# Results in this error:
# Error in x[needs_ticks] <- paste0("`", gsub("`", "\\\\`", x[needs_ticks]),  : NAs are not allowed in subscripted assignments

@donboyd5

I now see that the above problem is caused by bad column names in the Excel file. It is solved by using col_names=FALSE. It would be great if read_excel could generate a more descriptive warning in a case like this.

Don
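
A possible follow-up to the col_names=FALSE workaround: once the sheet is read without headers, the first row can be promoted to column names, with placeholders filled in for blank or missing cells. The sketch below is an assumption-laden illustration. `promote_header` is a hypothetical helper, and the small `raw` data frame is made-up data standing in for the result of `read_excel(path, sheet = 2, col_names = FALSE)`.

```r
# Hypothetical helper: use the first row as column names, replacing blank
# or NA header cells with generated placeholders such as "col2".
promote_header <- function(df, prefix = "col") {
  # Take the first cell of every column as text.
  header <- vapply(df, function(col) as.character(col[1]), character(1),
                   USE.NAMES = FALSE)
  blank <- is.na(header) | !nzchar(trimws(header))
  header[blank] <- paste0(prefix, which(blank))
  header <- make.unique(header)      # avoid duplicated names
  out <- df[-1, , drop = FALSE]      # drop the row that held the headers
  names(out) <- header
  rownames(out) <- NULL
  out
}

# Made-up stand-in for a sheet read with col_names = FALSE,
# where the second header cell is empty.
raw <- data.frame(
  X1 = c("Line", "1", "2"),
  X2 = c(NA, "alpha", "beta"),
  X3 = c("2015", "10", "20"),
  stringsAsFactors = FALSE
)

clean <- promote_header(raw)
names(clean)
```

All columns stay character after this step, so numeric columns would still need converting.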

@hadley hadley added the bug an unexpected problem or unintended behavior label Jan 3, 2017
@jennybc
Member

jennybc commented Jan 6, 2017

  1. Agree readxl should not return a tibble with NA for 1 or more column names. That is tracked in "Returned tibble has NAs in colnames" (#199).
  2. However, tibble should still be able to print such a thing. I've opened a pull request there: "Backtick NA names in printing; fixes 206" (tibble#207).
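
The error text in this thread can be reproduced in base R, which may help show why an NA column name is enough to trigger it. This is a sketch, not tibble's actual code. Only the shape of the assignment (`x[needs_ticks] <- paste0(...)`) comes from the error message; the vectors and the way `needs_ticks` ends up containing NA are assumptions for illustration.

```r
# Made-up column names, one of them missing.
x <- c("good", "bad name", NA)
# Hypothetical flag vector: the NA name yields an NA flag.
needs_ticks <- c(FALSE, TRUE, NA)

# Assigning several values through an index that contains NA is an error.
msg <- tryCatch({
  x[needs_ticks] <- paste0("`", x[needs_ticks], "`")
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Guarded variant: treat NA flags as FALSE before assigning.
safe <- !is.na(needs_ticks) & needs_ticks
y <- c("good", "bad name", NA)
y[safe] <- paste0("`", y[safe], "`")
y
```

The guarded variant leaves the NA name untouched, so whether it is backticked or replaced with a generated name remains a separate decision.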

@jennybc jennybc closed this as completed Jan 6, 2017
@lock lock bot locked and limited conversation to collaborators Oct 10, 2019