PERF: Timestamp/Timedelta constructors when passed a Timestamp/Timedelta #30543

Closed
jschendel opened this issue Dec 29, 2019 · 3 comments · Fixed by #30676

Comments

@jschendel
Member

The Timestamp constructor could be faster when it is passed an existing Timestamp object: an isinstance-like check would let it return that object directly instead of rebuilding it:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+1469.ge817ffff3'

In [2]: ts = pd.Timestamp('2020')

In [3]: def timestamp_isinstance_shortcircuit(ts): 
   ...:     if isinstance(ts, pd.Timestamp): 
   ...:         return ts 
   ...:     return pd.Timestamp(ts) 
   ...:

In [4]: %timeit pd.Timestamp(ts)
849 ns ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit timestamp_isinstance_shortcircuit(ts)
121 ns ± 0.279 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Some care is needed in the constructor to check if other arguments have been passed, e.g. tz, where we wouldn't be able to directly return the Timestamp object.
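To illustrate the kind of guard this would need, here is a minimal sketch using a stand-in class rather than pandas itself. `Stamp` and the `_NO_TZ` sentinel are hypothetical names made up for this example; the real constructor takes many more arguments and is implemented differently. The sentinel lets the constructor tell "tz was not passed" apart from an explicit `tz=None`.

```python
_NO_TZ = object()  # hypothetical sentinel: "tz argument was not passed"


class Stamp:
    """Hypothetical stand-in for an immutable Timestamp-like value (not pandas code)."""

    __slots__ = ("value", "tz")

    def __new__(cls, value, tz=_NO_TZ):
        # Fast path: the input is already a Stamp and no other argument
        # was given, so the existing object can be returned as-is.
        if tz is _NO_TZ and isinstance(value, Stamp):
            return value
        # Slow path: an extra argument was passed (or the input is a raw
        # value), so a new object has to be built.
        if isinstance(value, Stamp):
            value = value.value
        self = object.__new__(cls)
        self.value = value
        self.tz = None if tz is _NO_TZ else tz
        return self


a = Stamp(1577836800)
assert Stamp(a) is a                     # shortcut fires: same object back
b = Stamp(a, tz="UTC")
assert b is not a and b.tz == "UTC"      # tz passed: shortcut must not fire
```

Returning the input unchanged is only safe because the object is treated as immutable; any argument that would change the result has to disable the fast path.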

Similar story for the Timedelta constructor (should be done in a separate PR):

In [6]: td = pd.Timedelta('1 day')

In [7]: def timedelta_isinstance_shortcircuit(td): 
   ...:     if isinstance(td, pd.Timedelta): 
   ...:         return td 
   ...:     return pd.Timedelta(td) 
   ...:

In [8]: %timeit pd.Timedelta(td)
800 ns ± 1.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [9]: %timeit timedelta_isinstance_shortcircuit(td)
120 ns ± 0.269 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

xref #30520

jschendel added the Datetime, Performance, and Timedelta labels Dec 29, 2019
jschendel added this to the Contributions Welcome milestone Dec 29, 2019
@AlexKirko
Member

take

@AlexKirko
Member

AlexKirko commented Jan 3, 2020

Seems straightforward: add shortcuts to the constructors, make sure they don't fire when other arguments are passed, and add tests that check that passing a Timestamp / Timedelta object returns the same, unmutated object. I'll create the PRs tomorrow.
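A rough sketch of what such a test could look like, written as a generic helper so it runs without pandas. The helper name `assert_constructor_returns_same_object` is made up for this example. The demo uses `tuple` as a stand-in because, in CPython, `tuple(t)` returns `t` itself for an exact tuple, while `list(l)` always copies.

```python
import copy


def assert_constructor_returns_same_object(ctor, obj):
    """Check that ctor(obj) returns obj itself and leaves it unmutated."""
    before = copy.deepcopy(obj)  # snapshot to compare against afterwards
    result = ctor(obj)
    assert result is obj, "constructor built a new object instead of reusing the input"
    assert obj == before, "constructor mutated its input"


# Stand-in demo (CPython behaviour): tuple() reuses an exact tuple.
t = (1, 2, 3)
assert_constructor_returns_same_object(tuple, t)

# list() always copies, so the identity check is expected to fail here.
try:
    assert_constructor_returns_same_object(list, [1, 2, 3])
except AssertionError:
    pass
else:
    raise RuntimeError("expected the identity check to fail for list()")
```

With the shortcuts in place, the same helper would presumably be called with `pd.Timestamp` or `pd.Timedelta` as `ctor` and an existing instance as `obj`.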

@AlexKirko
Member

Ran into a blocking bug while working on this one and submitted an issue (#30692). Will work on it first to solve the current issue cleanly.

3 participants