-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF-#4929: Compute dtype
when using Series.dt
accessor
#4930
Conversation
Signed-off-by: Myachev <anatoly.myachev@intel.com>
dtype
when using Series.dt
accessor
Codecov Report
@@ Coverage Diff @@
## master #4930 +/- ##
==========================================
+ Coverage 84.82% 89.54% +4.71%
==========================================
Files 268 269 +1
Lines 19701 19984 +283
==========================================
+ Hits 16712 17894 +1182
+ Misses 2989 2090 -899
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Signed-off-by: Myachev <anatoly.myachev@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much speedup are we getting with this change?
These changes improve asynchronous behavior when the next operation needs to know the types. |
dt_date = Map.register(_dt_prop_map("date"), dtypes=np.object_) | ||
dt_time = Map.register(_dt_prop_map("time"), dtypes=np.object_) | ||
dt_timetz = Map.register(_dt_prop_map("timetz"), dtypes=np.object_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels weird, doesn't pandas have a specific dtype for datetime-s? I thought it did
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ser.dt is available on Series with datetime64 dtypes, the ser.dt.foo methods here return the dtypes specified
dt_year = Map.register(_dt_prop_map("year"), dtypes=np.int64) | ||
dt_month = Map.register(_dt_prop_map("month"), dtypes=np.int64) | ||
dt_day = Map.register(_dt_prop_map("day"), dtypes=np.int64) | ||
dt_hour = Map.register(_dt_prop_map("hour"), dtypes=np.int64) | ||
dt_minute = Map.register(_dt_prop_map("minute"), dtypes=np.int64) | ||
dt_second = Map.register(_dt_prop_map("second"), dtypes=np.int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please check this is correct? I mean, wasting int64 to store numbers from 0 to 59 feels a little too much...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those are correct. See: https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.minute.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vnlitvinov agree that it does seem a bit wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could probably be improved upstream. the lower-level calls return int32, get wrapped somewhere in int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked using the following example, everything is so. Do I need to do something else or we can merge?
pd.Series(pd.to_timedelta(np.arange(5), unit='d')).dt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
@vnlitvinov could you merge it or you wait for someone' review? |
@pyrito do you know how to restart docs CI build? |
The only way I've been able to rerun is to amend the commit and force push to retrigger everything. |
I usually try to give at least a day after my approval to let someone else report any issues they see with a PR. |
I see, thanks! |
Signed-off-by: Myachev anatoly.myachev@intel.com
What do these changes do?
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
dtype
when usingSeries.dt
accessor #4929docs/development/architecture.rst
is up-to-date