Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

passing dates or datetime.date objects as parameters #198

Closed
rafazaya opened this issue Mar 27, 2018 · 11 comments
Closed

passing dates or datetime.date objects as parameters #198

rafazaya opened this issue Mar 27, 2018 · 11 comments

Comments

@rafazaya
Copy link

Not sure if this is considered a good reprex example for reticulate, but let me try this.

I'm having issues using datetime.date objects as parameters into my python scripts, where they become floats in the process Per the below, a datetime.date object as a default parameter works, passing it in does not, and returning the value returns the integer equivalent. Thoughts on how to pass a date to a function?

test.py:

from datetime import datetime, timedelta, date

def prior_day(edate=date.today()):
    return edate - timedelta(days=1)

def return_param(edate):
	return edate

Then in R...

library(reticulate)
#> Warning: package 'reticulate' was built under R version 3.4.4
conda_path <- "my/path"
env_name <- "envir"
use_condaenv(condaenv = env_name, conda = conda_path)

dt <- import("datetime")
edate <- dt$date(2018L,3L,26L)
edate
#> [1] "2018-03-26"
source_python("test.py")
prior_day()
#> [1] "2018-03-25"
prior_day(edate)
#> Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: unsupported operand type(s) for -: 'float' and 'datetime.timedelta'
#> 
#> Detailed traceback: 
#>   File "<string>", line 4, in prior_day
return_param(edate)
#> [1] 17616

Created on 2018-03-26 by the reprex package (v0.2.0).

@kevinushey
Copy link
Collaborator

Thanks for the bug report! This is primarily because we transform R Date objects into NumPy datetime64 arrays. See e.g.

> library(reticulate)
> r_to_py(Sys.Date())
['2018-03-26T00:00:00.000000000']

I believe this is typically the behaviour you want when manipulating e.g. Dates that live as part of a Pandas DataFrame, or as a vector / list of dates, but it does fail when one attempts to manipulate with plain old Python datetime objects.

I'm still not 100% sure whether we've made the right choice, though. Perhaps Date objects should be coerced into Python datetime objects? Maybe only times (e.g. POSIXt) should be converted into NumPy datetime64 objects?

@rafazaya
Copy link
Author

I'm very biased, but it would be cool to have some way to get to python datetime objects. I haven't played around with reticulate enough to investigate if there's an awkward workaround, which would be sufficient. I could also just write wrappers which converts the strings to dates in python and then passes arguments along within python, which may be best if this is a small use case generally.

I'm thankful for this package and the comment!

@kevinushey
Copy link
Collaborator

kevinushey commented Mar 27, 2018

For context, we make the conversion here:

reticulate/R/conversion.R

Lines 125 to 146 in 897c65d

#' @export
r_to_py.Date <- function(x, convert = FALSE) {
# we prefer datetime64 for efficiency
if (py_module_available("numpy"))
return(r_to_py.POSIXt(as.POSIXct(x)))
# otherwise, fallback to using Python's datetime class
datetime <- import("datetime", convert = convert)
items <- lapply(x, function(item) {
iso <- strsplit(format(x), "-", fixed = TRUE)[[1]]
year <- as.integer(iso[[1]])
month <- as.integer(iso[[2]])
day <- as.integer(iso[[3]])
datetime$date(year, month, day)
})
if (length(items) == 1)
items[[1]]
else
items
}

Perhaps there's a better way we could decide whether we should produce a Python datetime vs. a NumPy datetime64?

Maybe we should only convert dates / times contained within a data frame to a NumPy datetime64, and standalone date vectors should use (lists of) Python datetime objects?

@kevinushey
Copy link
Collaborator

@terrytangyuan I'm curious what you think re: choosing Python datetime vs. NumPy datetime64 here.

@terrytangyuan
Copy link
Contributor

I am all in for more performant conversion (I like the current default and fallback behavior) but I don't think this is 100% what the users might want. Perhaps we should expose this as an option?

@vqv
Copy link

vqv commented Apr 17, 2018

I just ran into this issue. The current choice makes sense if the primary purpose is to convert R data frames into Pandas data frames. However, when calling Python functions it is necessary to have exact control over the resulting type produced by r_to_py().

The current documentation says that single element R vector will be converted to a Python scalar (https://rstudio.github.io/reticulate/articles/calling_python.html). This suggests that a single Date should be converted to a scalar datetime rather than an array (of a single element). However, I don't think it's a good idea for the resulting type to be fundamentally depend on the length of the input, i.e. datetime versus NumPy array of datetime64's. So I don't think it is a good idea for r_to_py() to guess, and there should be a way to get explicit control over the resulting type.

What about exposing explicit converters, e.g. r_to_py_datetime() with the same functionality as here:

reticulate/R/conversion.R

Lines 133 to 145 in 897c65d

datetime <- import("datetime", convert = convert)
items <- lapply(x, function(item) {
iso <- strsplit(format(x), "-", fixed = TRUE)[[1]]
year <- as.integer(iso[[1]])
month <- as.integer(iso[[2]])
day <- as.integer(iso[[3]])
datetime$date(year, month, day)
})
if (length(items) == 1)
items[[1]]
else
items

@kevinushey
Copy link
Collaborator

kevinushey commented May 11, 2018

Thanks for the comments, everyone. I think we should adopt the following convention:

  • R Dates are converted to Python datetime objects and vice versa;
  • R times are converted to NumPy datetime64 objects and vice versa.

Does this sound reasonable? Note that I'm exploring this as part of #259.

Note that doing this also somewhat relieves reticulate from having to provide conversion helpers -- if you want a Python datetime, then you can just convert your R time to a date object, and then request the conversion.

@eddelbuettel
Copy link
Contributor

Just as an aside, we can't fully map to datetime64 as we only have 53 bits in POSIXct. That was one of the reasons I started the nanotime package which lives on top of bit64::integer64.

I think the route via Date is not good either as you'd loose all intra-day components (or maybe I am missing something). In that sense, better to 'just' loose the sub-microsecond part...

@kevinushey
Copy link
Collaborator

I think POSIXct <-> datetime64 as a default is at least sensible, but given that we do risk losing precision we may want to provide some APIs for using nanotime explicitly. Does that make sense?

Maybe we need some extensions to py_to_r() that allow users to define custom converters, that take column(s) of a Pandas DataFrame and produce R objects? Then in theory a user or a package could define a converter that goes from datetime64 <-> nanotime.

A similar argument could probably be made for r_to_py()...

@eddelbuettel
Copy link
Contributor

Yes, a hook to permit extensions seems sensible, as does the suggested default.

The added converters always pull a tail of other packages in: bit64 for representation, RcppCCTZ to parse/convert and nanotime for the actual datetime objects. A handful. Probably not a useful default, but possible Suggests: and optional use.

@rafazaya
Copy link
Author

Following up on this, it seems the (possibly new?) convert=FALSE parameters do the trick. For a scalar date, i can do the below:

library(reticulate)
dt <- import("datetime", convert = FALSE)
dt$date$today()
dt$date(2019L, 4L, 24L)

...and for the more general case, #259 is already merged.

I'll close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants