Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dates in numeric columns #263

Closed
gergness opened this issue Feb 13, 2017 · 7 comments
Closed

Dates in numeric columns #263

gergness opened this issue Feb 13, 2017 · 7 comments
Labels
datetime 📆 feature a feature request or enhancement

Comments

@gergness
Copy link
Contributor

When a date is in a column that is either numeric or character, readxl silently includes the numeric representation of the date. Should this give a warning? See some discussion in #256

@stla
Copy link

stla commented Feb 13, 2017

Strange, I've never seen that. With a xlsx file, I always get dates as POSIXct.

@gergness
Copy link
Contributor Author

Sorry, I didn't make myself very clear. This happens when a column has both text and dates in the same column.

library(readxl) 
read_excel("example.xls")
#> # A tibble: 3 × 1
#>           var1
#>          <chr>
#> 1        30-30
#> 2        20-20
#> 3 43018.000000

read_excel("example.xls", col_types = "date")
#> # A tibble: 3 × 1
#>         var1
#>       <dttm>
#> 1       <NA>
#> 2       <NA>
#> 3 2017-10-10
#> Warning messages:
#> 1: In xls_cols(path, sheet, col_names = col_names, col_types = col_types,  :
#>   Expecting date in [1, 1] got '30-30'
#> 2: In xls_cols(path, sheet, col_names = col_names, col_types = col_types,  :
#>   Expecting date in [2, 1] got '20-20'
 
#> # Or with new "list" type
read_excel("example.xls", col_types = "list")
#> # A tibble: 3 × 1
#>         var1
#>       <list>
#> 1  <chr [1]>
#> 2  <chr [1]>
#> 3 <dttm [1]>

@jennybc
Copy link
Member

jennybc commented Feb 13, 2017

@gergness Can you also attach the examples.xls file to this issue for completeness? Is this xls specific?

@gergness
Copy link
Contributor Author

Ha sure, sorry for not giving the full reprex. This problem does affect both .xls and .xlsx.

I've just realized that github allows .xlsx files (I didn't check after it wouldn't allow my .xls file), so I've attached the same data as a .xlsx: example.xlsx

And here's a dropbox link for the .xls: example.xls

And perhaps even better, here's using test files from the package (must be run from the Rproject's dir):

library(readxl)
test_sheet <- function(fname) rprojroot::find_testthat_root_file("sheets", fname)

types_xlsx <- read_excel(test_sheet("types.xlsx"), col_types = rep("text", 5))
types_xlsx$date
#> [1] "38717" "38717" NA  

# Note that the "Unknown type: 517" message is unrelated.
types_xls <- read_excel(test_sheet("types.xls"), col_types = rep("text", 5))
#> Unknown type: 517
types_xls$date
#> [1] "38717.000000" "38717"        NA  

@jennybc
Copy link
Member

jennybc commented Feb 14, 2017

Is this an exact duplicate of #259 then? I know about that.

@gergness
Copy link
Contributor Author

No, I just grabbed the first example spreadsheet I could find that had date formatting, so once you fix #259, if you read that data in as text it would still read: [1] "38717.000000" "38717" NA

This issue is about whether readxl should silently promote date cells to a number when they are within text or numeric columns.

When a date is encountered in a numeric column, I think that it should give a warning and set the value to NA instead of the numeric representation of the date. This would match the behavior of how it handles unexpected characters in a cell.

For a text column, I think that there is the possibility to do a little more, but I don't know how much complexity you'd be willing to add. Some possibilities are:

  1. Just give a warning, as above.
  2. Allow the user to specify a way to format the date as text (eg "%m/%d/%y" -> "10/10/2017")
  3. Read from the excel spreadsheet how Excel is formatting the date and convert it the same way.

@jennybc jennybc added datetime 📆 feature a feature request or enhancement labels Feb 17, 2017
@jennybc
Copy link
Member

jennybc commented Mar 1, 2017

See #118 re: the issue of creating strings from dates.

I'm going to use this issue just to track the treatment of date cells within a numeric column. Which I think should become NA and throw a warning.

@jennybc jennybc changed the title Dates in numeric and character columns Dates in numeric columns Mar 1, 2017
@lock lock bot locked and limited conversation to collaborators Oct 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
datetime 📆 feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants