-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reads blank rows after end of data #203
Comments
I have seen same behavior and it's a killer! Deleting rows by selecting the cells and hitting the delete key does not seem to delete the content. It disappears, but readxl finds it somehow! I did find that if you delete rows by selecting rows using their row numbers, right-clicking, and doing delete, that seems to really delete the rows. |
Dealing with the same issue, though it is easy to fix by using |
library(readxl)
#download.file("https://www1.nyc.gov/assets/finance/downloads/pdf/rolling_sales/annualized-sales/2012/2012_bronx.xls", destfile = "2012_bronx.xls")
df <- read_excel("2012_bronx.xls")
#> DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 04 00 04 00 00 00 14 00
#> and so on and so forth
df
#> # A tibble: 25,145 × 21
#> `BRONX ANNUALIZE SALE FOR YEAR 2012. All Sales From Jan 1, 2012 - Dec 31, 2012.`
#> <chr>
#> 1 Sales File as of 03/31/2013 Coop Sales Files as of 03/31/2013
#> 2 Neighborhood Name and Descriptive Data is as of 04/19/13
#> 3 Building Class Category is based on Building Class at Time of Sale.
#> 4 BOROUGH\n
#> 5 2.000000
#> 6 2.000000
#> 7 2.000000
#> 8 2.000000
#> 9 2.000000
#> 10 2.000000
#> # ... with 25,135 more rows, and 20 more variables: `NA` <chr>,
#> # `NA` <chr>, `NA` <chr>, `NA` <chr>, `NA` <chr>, `` <chr>, `` <chr>,
#> # `` <chr>, `` <chr>, `` <chr>, `` <chr>, `` <chr>, `` <chr>, `` <chr>,
#> # `` <chr>, `` <chr>, `` <chr>, `` <chr>, `` <chr>, `` <chr>
dim(df)
#> [1] 25145 21
df2 <- tidyr::drop_na(df)
dim(df2)
#> [1] 5152 21
as.data.frame(tail(df))
#> BRONX ANNUALIZE SALE FOR YEAR 2012. All Sales From Jan 1, 2012 - Dec 31, 2012.
#> 1 <NA>
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 <NA>
#> 6 <NA>
#> NA NA NA NA NA
#> 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#>
#> 1 <NA> <NA> <NA> <NA> <NA> <NA>
#> 2 <NA> <NA> <NA> <NA> <NA> <NA>
#> 3 <NA> <NA> <NA> <NA> <NA> <NA>
#> 4 <NA> <NA> <NA> <NA> <NA> <NA>
#> 5 <NA> <NA> <NA> <NA> <NA> <NA>
#> 6 <NA> <NA> <NA> <NA> <NA> <NA> |
This comes from taking the declared dimensions literally. I can already fix for xlsx, by replacing Line 56 in 3338f92
Verified with xlsx saved from |
With some Excel files,
readxl
seems to read a large number of blank rows after the end of the visible data. For example, when reading this file from the New York City Department of Finance (withskip=4
),readxl
reads 25,141 rows, but 19,990 of them are blank. I'm using the CRAN version (0.1.1). (Sorry if it's verboten to report issues with the CRAN version here; I posted on Stack Overflow but didn't get any useful responses)The text was updated successfully, but these errors were encountered: