PERF-#4929: Compute dtype when using Series.dt accessor #4930

Merged
vnlitvinov merged 2 commits into modin-project:master from issue4929 on Sep 9, 2022

@anmyachev (Collaborator) commented Sep 6, 2022

Signed-off-by: Myachev <anatoly.myachev@intel.com>

What do these changes do?

  • commit message follows format outlined here
  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves PERF: compute dtype when using Series.dt accessor #4929
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date
  • added (Issue Number: PR title (PR Number)) and github username to release notes for next major release

Signed-off-by: Myachev <anatoly.myachev@intel.com>
@anmyachev changed the title from "PERF-#4929: compute dtype when using Series.dt accessor" to "PERF-#4929: Compute dtype when using Series.dt accessor" on Sep 6, 2022

codecov bot commented Sep 6, 2022

Codecov Report

Merging #4930 (b01b348) into master (19ac8ff) will increase coverage by 4.71%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4930      +/-   ##
==========================================
+ Coverage   84.82%   89.54%   +4.71%     
==========================================
  Files         268      269       +1     
  Lines       19701    19984     +283     
==========================================
+ Hits        16712    17894    +1182     
+ Misses       2989     2090     -899     
Impacted Files                                         Coverage Δ
...odin/core/storage_formats/pandas/query_compiler.py  96.60% <100.00%> (+0.56%) ⬆️
modin/logging/config.py                                94.59% <0.00%> (-1.30%) ⬇️
modin/experimental/batch/test/test_pipeline.py         90.21% <0.00%> (ø)
modin/pandas/groupby.py                                93.77% <0.00%> (+0.23%) ⬆️
modin/pandas/series.py                                 94.24% <0.00%> (+0.23%) ⬆️
modin/pandas/series_utils.py                           99.43% <0.00%> (+0.56%) ⬆️
modin/core/io/text/excel_dispatcher.py                 94.01% <0.00%> (+0.85%) ⬆️
modin/core/io/column_stores/parquet_dispatcher.py      96.25% <0.00%> (+2.08%) ⬆️
...tations/pandas_on_python/partitioning/partition.py  93.75% <0.00%> (+2.08%) ⬆️
... and 38 more

Signed-off-by: Myachev <anatoly.myachev@intel.com>
@anmyachev anmyachev marked this pull request as ready for review September 6, 2022 11:37
@anmyachev anmyachev requested a review from a team as a code owner September 6, 2022 11:37
@pyrito (Collaborator) left a comment

How much speedup are we getting with this change?

@anmyachev (Collaborator, Author)

> How much speedup are we getting with this change?

These changes improve asynchronous behavior: when the next operation needs to know the result dtype, that dtype is already declared up front and does not have to be computed from the data.
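
As a rough illustration of why a declared dtype helps asynchronous execution, here is a minimal toy sketch. `DeferredSeries` and `submit_map` are hypothetical names invented for this example; they are not Modin classes and do not reflect how Modin implements `Map.register`. The sketch only shows the general idea: if the result dtype is declared when the work is scheduled, reading `.dtype` never has to wait for the background computation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


class DeferredSeries:
    """Hypothetical deferred result; illustrative only, not a Modin class."""

    def __init__(self, future, dtype=None):
        self._future = future
        self._dtype = None if dtype is None else np.dtype(dtype)
        self.waited = False  # records whether reading dtype had to block

    @property
    def dtype(self):
        if self._dtype is None:
            # dtype was not declared: wait for the computation and inspect it.
            self.waited = True
            self._dtype = self._future.result().dtype
        return self._dtype


def submit_map(pool, func, series, dtype=None):
    """Schedule func(series) in the background, optionally declaring its dtype."""
    return DeferredSeries(pool.submit(func, series), dtype=dtype)


dates = pd.Series(pd.date_range("2022-09-06", periods=3, freq="D"))

with ThreadPoolExecutor(max_workers=1) as pool:
    declared = submit_map(pool, lambda s: s.dt.date, dates, dtype=np.object_)
    inferred = submit_map(pool, lambda s: s.dt.date, dates)
    # With a declared dtype the answer is available without waiting;
    # without one, the caller has to wait for the result first.
    print(declared.dtype, declared.waited)
    print(inferred.dtype, inferred.waited)
```

In this toy model both objects report the same dtype; the only difference is whether the caller had to block to learn it.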

Comment on lines +1443 to +1445
dt_date = Map.register(_dt_prop_map("date"), dtypes=np.object_)
dt_time = Map.register(_dt_prop_map("time"), dtypes=np.object_)
dt_timetz = Map.register(_dt_prop_map("timetz"), dtypes=np.object_)
Collaborator

This feels weird: doesn't pandas have a specific dtype for datetimes? I thought it did.

Collaborator

ser.dt is available on Series with datetime64 dtypes; the ser.dt.foo properties here return the dtypes specified.
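
The distinction in the reply above can be checked directly in plain pandas. This is a small sketch, with sample dates chosen arbitrarily for illustration: the input Series has a datetime64 dtype, while `dt.date`, `dt.time`, and `dt.timetz` hold Python `datetime.date` / `datetime.time` objects and therefore report an object dtype.

```python
import pandas as pd

ser = pd.Series(pd.date_range("2022-09-06 10:30", periods=3, freq="D"))

# The input Series itself holds datetime64 values.
print("input:", ser.dtype)

# Each of these properties returns a Series of plain Python objects.
for prop in ("date", "time", "timetz"):
    result = getattr(ser.dt, prop)
    print(prop, result.dtype, type(result.iloc[0]).__name__)
```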

Comment on lines +1446 to +1451
dt_year = Map.register(_dt_prop_map("year"), dtypes=np.int64)
dt_month = Map.register(_dt_prop_map("month"), dtypes=np.int64)
dt_day = Map.register(_dt_prop_map("day"), dtypes=np.int64)
dt_hour = Map.register(_dt_prop_map("hour"), dtypes=np.int64)
dt_minute = Map.register(_dt_prop_map("minute"), dtypes=np.int64)
dt_second = Map.register(_dt_prop_map("second"), dtypes=np.int64)
Collaborator

Could you please check that this is correct? Using int64 to store numbers from 0 to 59 feels like too much.

Collaborator

@vnlitvinov agree that it does seem a bit wasteful.

Collaborator

This could probably be improved upstream: the lower-level calls return int32, and the result gets wrapped in int64 somewhere.

@anmyachev (Collaborator, Author) commented Sep 7, 2022

I checked with the following example, and the dtypes match. Do I need to do anything else, or can we merge?

pd.Series(pd.to_timedelta(np.arange(5), unit='d')).dt
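
A slightly fuller version of that check might look like the sketch below. It is an illustration, not the author's actual test: the sample dates and the choice of properties are assumptions. It prints the dtype pandas reports for each component property so it can be compared with the `dtypes=` value registered in the diff. The exact integer width is not asserted here because it can differ between pandas versions, so the pandas version is printed alongside.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.date_range("2022-09-06 10:30:15", periods=3, freq="D"))
deltas = pd.Series(pd.to_timedelta(np.arange(3), unit="D"))

# Collect the dtype pandas reports for each component property.
datetime_dtypes = {
    name: getattr(dates.dt, name).dtype
    for name in ("year", "month", "day", "hour", "minute", "second")
}
timedelta_dtypes = {
    name: getattr(deltas.dt, name).dtype
    for name in ("days", "seconds", "microseconds", "nanoseconds")
}

print("pandas", pd.__version__)
print(datetime_dtypes)
print(timedelta_dtypes)
```

If the reported dtypes differ from the registered ones on a given pandas version, the registered `dtypes=` values would need to be revisited for that version.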

@vnlitvinov (Collaborator) left a comment

LGTM, thanks!

@anmyachev (Collaborator, Author)

@vnlitvinov could you merge it, or are you waiting for someone else's review?

@anmyachev (Collaborator, Author)

@pyrito do you know how to restart the docs CI build?

@pyrito (Collaborator) commented Sep 9, 2022

> @pyrito do you know how to restart the docs CI build?

The only way I've been able to rerun it is to amend the commit and force-push to retrigger everything.

@vnlitvinov (Collaborator)

> @vnlitvinov could you merge it, or are you waiting for someone else's review?

I usually try to wait at least a day after my approval so that others can report any issues they see with a PR.
Seems that no one is objecting this time, so merging 😺

@vnlitvinov vnlitvinov merged commit b09dd42 into modin-project:master Sep 9, 2022
@anmyachev (Collaborator, Author)

> > @vnlitvinov could you merge it, or are you waiting for someone else's review?
>
> I usually try to wait at least a day after my approval so that others can report any issues they see with a PR. Seems that no one is objecting this time, so merging 😺

I see, thanks!

@anmyachev anmyachev deleted the issue4929 branch September 9, 2022 17:16
Linked issue: PERF: compute dtype when using Series.dt accessor #4929
4 participants