-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dates in numeric columns #263
Comments
Strange, I've never seen that. With a |
Sorry, I didn't make myself very clear. This happens when a column has both text and dates in the same column. library(readxl)
read_excel("example.xls")
#> # A tibble: 3 × 1
#> var1
#> <chr>
#> 1 30-30
#> 2 20-20
#> 3 43018.000000
read_excel("example.xls", col_types = "date")
#> # A tibble: 3 × 1
#> var1
#> <dttm>
#> 1 <NA>
#> 2 <NA>
#> 3 2017-10-10
#> Warning messages:
#> 1: In xls_cols(path, sheet, col_names = col_names, col_types = col_types, :
#> Expecting date in [1, 1] got '30-30'
#> 2: In xls_cols(path, sheet, col_names = col_names, col_types = col_types, :
#> Expecting date in [2, 1] got '20-20'
#> # Or with new "list" type
read_excel("example.xls", col_types = "list")
#> # A tibble: 3 × 1
#> var1
#> <list>
#> 1 <chr [1]>
#> 2 <chr [1]>
#> 3 <dttm [1]> |
@gergness Can you also attach the examples.xls file to this issue for completeness? Is this xls specific? |
Ha sure, sorry for not giving the full reprex. This problem does affect both .xls and .xlsx. I've just realized that github allows .xlsx files (I didn't check after it wouldn't allow my .xls file), so I've attached the same data as a .xlsx: example.xlsx And here's a dropbox link for the .xls: example.xls And perhaps even better, here's using test files from the package (must be run from the Rproject's dir): library(readxl)
test_sheet <- function(fname) rprojroot::find_testthat_root_file("sheets", fname)
types_xlsx <- read_excel(test_sheet("types.xlsx"), col_types = rep("text", 5))
types_xlsx$date
#> [1] "38717" "38717" NA
# Note that the "Unknown type: 517" message is unrelated.
types_xls <- read_excel(test_sheet("types.xls"), col_types = rep("text", 5))
#> Unknown type: 517
types_xls$date
#> [1] "38717.000000" "38717" NA |
Is this an exact duplicate of #259 then? I know about that. |
No, I just grabbed the first example spreadsheet I could find that had date formatting, so once you fix #259, if you read that data in as text it would still read: This issue is about whether readxl should silently promote date cells to a number when they are within text or numeric columns. When a date is encountered in a numeric column, I think that it should give a warning and set the value to NA instead of the numeric representation of the date. This would match the behavior of how it handles unexpected characters in a cell. For a text column, I think that there is the possibility to do a little more, but I don't know how much complexity you'd be willing to add. Some possibilities are:
|
See #118 re: the issue of creating strings from dates. I'm going to use this issue just to track the treatment of date cells within a numeric column. Which I think should become |
When a date is in a column that is either numeric or character, readxl silently includes the numeric representation of the date. Should this give a warning? See some discussion in #256
The text was updated successfully, but these errors were encountered: