-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
readxl skips (and then later appends?) embedded blank lines #671
Comments
I'm not working on readxl right now but I will be soon-ish. Perhaps you can share the actual offending sheet (or something that behaves the same) with me privately at jenny@rstudio.com, the maintainer email for this package. Is there maybe something going on with hidden rows? |
Thanks, Jenny! Here is the link to offending sheet: https://github.com/tidyverse/readxl/files/7281312/test.xlsx |
Do you happen to know how this file was created? It must be some sort of 3rd party tool. It does some strange things in terms of attaching row number or cell address to individual rows or cells. Here is an excerpt (this is mostly a note-to-self, i.e. for me) <row r="23">
<c s="15" t="s">
<v>62</v>
</c>
<c s="17">
<v>44363.240671296298</v>
</c>
</row>
<row>...</row>
<row>...</row>
<row r="27">
<c s="4" t="s">
<v>37</v>
</c>
<c s="2"/>
<c s="4"/>
<c s="2"/>
<c s="2"/>
<c s="2"/>
<c s="2"/>
</row> This is one of those files where, broadly speaking, we have to infer cell location from relative location in the <row>
<c s="7" t="s">
<v>493</v>
</c>
<c s="6">
<v>142.8623</v>
</c>
<c s="16" t="s">
<v>65</v>
</c>
</row>
<row s="3" customFormat="1">
<c r="B40" s="3" t="s">
<v>400</v>
</c>
<c s="3" t="s">...</c>
<c s="3" t="s">...</c>
</row>
If it's any consolation, opening the file in Excel, then doing "Save as" to write out a fresh copy with the Excel application seems to result in a conventional |
Thinking about it more, maybe this would not be too much work to accommodate. I’ll have another look. |
OK fixed! Observations from before, which no longer hold, i.e. the view in R and Excel now match:
library(readxl)
library(tidyverse)
dat <- read_xlsx(
"investigations/iss671-weird-rows.xlsx",
sheet = "Summary", col_names = FALSE
)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> * `` -> ...4
#> * `` -> ...5
#> * ...
dat %>%
select(1:2) %>%
print(n=40)
#> # A tibble: 112 × 2
#> ...1 ...2
#> <chr> <chr>
#> 1 Summary <NA>
#> 2 File Name on Meter 831_Data.057.s
#> 3 File Name on PC 831C_11049-20210616 082213-831_Data.057.ldbin
#> 4 Serial Number 11049
#> 5 Model SoundAdvisor™ Model 831C
#> 6 Firmware Version 04.0.8R0
#> 7 User <NA>
#> 8 Location <NA>
#> 9 Job Description <NA>
#> 10 Note <NA>
#> 11 <NA> <NA>
#> 12 Measurement <NA>
#> 13 Description <NA>
#> 14 Latitude GPS Not Synchronized
#> 15 Longitude GPS Not Synchronized
#> 16 Elevation GPS Not Synchronized
#> 17 Start 44363.348761574103
#> 18 Stop 44363.351539351897
#> 19 Duration 2.7777777777777801E-03
#> 20 Run Time 2.7777777777777801E-03
#> 21 Pause 0
#> 22 <NA> <NA>
#> 23 Pre-Calibration 44363.240671296298
#> 24 Post-Calibration None
#> 25 Calibration Deviation ---
#> 26 <NA> <NA>
#> 27 Overall Settings <NA>
#> 28 RMS Weight A Weighting
#> 29 Peak Weight Z Weighting
#> 30 Detector Fast
#> 31 Preamplifier PRM831
#> 32 Microphone Correction Off
#> 33 Integration Method Linear
#> 34 OBA Range Normal
#> 35 OBA Bandwidth 1/1 and 1/3
#> 36 OBA Frequency Weighting A Weighting
#> 37 OBA Max Spectrum Bin Max
#> 38 Gain 0
#> 39 Overload 142.8623
#> 40 <NA> A
#> # … with 72 more rows Created on 2022-03-22 by the reprex package (v2.0.1.9000) |
I'm having trouble reading a specific .xlsx file, where read_excel() will skip some embedded blank lines and then, sometimes, add them as blank rows. As a result the row order of data frame imported by readxl differs from the spreadsheet viewed in Excel.
The attached test.xlsx is 112 rows, with no leading or trailing whitespace, but the imported data frame is 108 rows. This is because some embedded blank rows are skipped and then, sometimes but not always, added later as blank rows.
For example, the Excel-viewed version of the spreadsheet shows a blank row at rows 11, 22, 26. But the imported data does not include these blank rows until later, in rows 37-39 of the data frame below.
How can I help solve this problem? It appears to be pretty specific to this spreadsheet (I can't seem to come up with a generic example that causes the same problem) but I don't understand why.
The text was updated successfully, but these errors were encountered: