Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut/paste (most) remaining tz funcs to tslibs/timezones #17526

Merged
merged 5 commits into from
Sep 15, 2017

Conversation

jbrockmendel
Copy link
Member

We're very nearly done with tslibs.timezones. I had hoped to bring over the remaining functions in smaller chunks, but this turns out to be the smallest independent subset that contains _get_dst_info and maybe_get_tz.

Getting _get_dst_info and maybe_get_tz separated is a milestone because it allows us to move tseries.frequencies.Resolution and related functions into cython without having a dependency on tslib. This in turn gets the dependency on khash out of tslib. A few more nice things become feasible.

This is almost pure cut/paste. The only change I made was replacing isinstance(tz, string_types) with is_string_object(tz). If requested, I'll do a follow-up to de-privatize the names.

timezones is within spitting distance of being valid python. Getting it over that hump would allow linting and coverage that is tough as it is.

Note that if/when #17363 is merged, a bunch of residual imports in tslib can be cleaned up.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. need to get passing of course. also run some asv for timezones to confirm no changes.

@@ -254,7 +252,7 @@ def period_ordinal(int y, int m, int d, int h, int min,
return get_period_ordinal(y, m, d, h, min, s, us, ps, freq)


cpdef int64_t period_ordinal_to_dt64(int64_t ordinal, int freq) nogil:
cpdef period_ordinal_to_dt64(int64_t ordinal, int freq) nogil:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a mistake-- and apparently the cause of the build errors. Just pushed a fix.

@jreback jreback self-assigned this Sep 14, 2017
@jreback jreback added the Internals Related to non-user accessible pandas implementation label Sep 14, 2017
@jbrockmendel
Copy link
Member Author

asv continuous -f 1.1 -E virtualenv master HEAD -b tz -b timezone
[...]
       before           after         ratio
     [fa557f73]       [30c3fa6b]
-          1.58ms           1.07ms     0.68  timeseries.DatetimeIndex.time_reset_index_tz
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@jreback
Copy link
Contributor

jreback commented Sep 14, 2017

can you run benches for -b timeseries

@jbrockmendel
Copy link
Member Author

asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
          before           after         ratio
     [fa557f73]       [109d605e]
+         234±3μs         527±70μs     2.25  timeseries.SemiMonthOffset.time_begin_decr
+        789±10μs      1.33±0.08ms     1.69  timeseries.Offsets.time_custom_bmonthend_incr_n
+         241±3μs         405±30μs     1.68  timeseries.SemiMonthOffset.time_begin_decr_n
+           348μs            509μs     1.46  timeseries.DatetimeIndex.time_unique
+      6.80±0.2ms       9.74±0.5ms     1.43  timeseries.ResampleDataFrame.time_mean_string
+     6.99±0.07ms       9.94±0.8ms     1.42  timeseries.ResampleDataFrame.time_min_numpy
+          13.0ms       18.3±0.4ms     1.41  timeseries.ToDatetime.time_iso8601_nosep
+           2.13s            2.87s     1.35  timeseries.SemiMonthOffset.time_begin_apply_index
+      25.0±0.7ms       33.2±0.9ms     1.33  timeseries.TimeSeries.time_add_irregular
+      7.24±0.2ms       9.60±0.5ms     1.33  timeseries.ResampleDataFrame.time_min_string
+           1.75s            2.22s     1.27  timeseries.SemiMonthOffset.time_end_incr_rng
+     6.70±0.04ms       8.44±0.2ms     1.26  timeseries.TimeSeries.time_large_lookup_value
+         250±5μs          313±7μs     1.25  timeseries.TimeSeries.time_sort_index_monotonic
+      15.2±0.1ms           19.0ms     1.25  timeseries.ToDatetime.time_format_YYYYMMDD
+           1.78s            2.19s     1.23  timeseries.SemiMonthOffset.time_end_apply_index
+      37.3±0.9ms         45.7±2ms     1.23  timeseries.Iteration.time_iter_datetimeindex_preexit
+         149±4μs          182±3μs     1.22  timeseries.Offsets.time_timeseries_day_incr
+      7.61±0.2ms         9.21±2ms     1.21  timeseries.ResampleDataFrame.time_mean_numpy
+         143±4μs          173±4μs     1.21  timeseries.Offsets.time_timeseries_day_apply
+      17.7±0.5ms       21.0±0.1ms     1.19  timeseries.TimeDatetimeConverter.time_convert
+      84.7±0.6μs          100±4μs     1.18  timeseries.Offsets.time_custom_bday_apply
+           892ms            1.05s     1.18  timeseries.ToDatetime.time_iso8601_tz_spaceformat
+      14.9±0.1ms         17.5±1ms     1.17  timeseries.ResampleSeries.time_timestamp_downsample_mean
+           2.01s            2.32s     1.15  timeseries.SemiMonthOffset.time_begin_decr_rng
+       234±0.9μs         266±10μs     1.13  timeseries.SemiMonthOffset.time_end_decr
+        96.6±3μs          109±3μs     1.13  timeseries.Offsets.time_custom_bday_apply_dt64
+        48.0±1μs         54.0±2μs     1.12  timeseries.Offsets.time_timeseries_year_apply
+           3.72s            4.17s     1.12  timeseries.SeriesArithmetic.time_add_offset_slow
+      19.9±0.5ms       22.0±0.9ms     1.11  timeseries.TimeSeries.time_sort_index_non_monotonic
+           1.64s            1.81s     1.10  timeseries.SeriesArithmetic.time_add_offset_fast
+          1.61ms           1.78ms     1.10  timeseries.DatetimeIndex.time_reset_index_tz
-           2.20s            1.98s     0.90  timeseries.SemiMonthOffset.time_begin_incr_rng
-      9.25±0.2ms       8.11±0.1ms     0.88  timeseries.ResampleDataFrame.time_max_string
-          56.4ms           49.4ms     0.88  timeseries.DatetimeIndex.time_infer_freq_none
-          15.5ms       13.6±0.4ms     0.87  timeseries.ToDatetime.time_iso8601_format
-          9.38ms           8.14ms     0.87  timeseries.DatetimeIndex.time_normalize
-          8.87ms           7.55ms     0.85  timeseries.DatetimeIndex.time_infer_dst
-      34.1±0.9ms       28.1±0.3ms     0.82  timeseries.ResampleSeries.time_period_downsample_mean
-        305±10μs          244±1μs     0.80  timeseries.TimeSeries.time_timeseries_slice_minutely
-          18.2ms           13.7ms     0.76  timeseries.DatetimeIndex.time_dti_factorize
-          22.9ms           16.4ms     0.72  timeseries.DatetimeIndex.time_dti_tz_factorize
-          1.82ms           1.30ms     0.71  timeseries.DatetimeIndex.time_reset_index
-          5.19ms           2.97ms     0.57  timeseries.DatetimeIndex.time_add_offset_delta
-          23.3ms           12.5ms     0.54  timeseries.DatetimeIndex.time_add_offset_fast
-        38.1±1ms       18.4±0.7ms     0.48  timeseries.AsOfDataFrame.time_asof_nan_single
-          6.16ms           2.96ms     0.48  timeseries.DatetimeIndex.time_add_timedelta
-      19.1±0.2ms       8.98±0.3ms     0.47  timeseries.DatetimeAccessor.time_dt_accessor_normalize
-           2.54s            1.19s     0.47  timeseries.DatetimeIndex.time_add_offset_slow
-      23.1±0.8ms       10.8±0.2ms     0.47  timeseries.AsOf.time_asof_nan
-        36.4±2ms       16.9±0.4ms     0.46  timeseries.AsOfDataFrame.time_asof_single
-      12.5±0.8ms       5.72±0.1ms     0.46  timeseries.AsOf.time_asof_nan_single
-         139±4μs         63.2±2μs     0.46  timeseries.AsOf.time_asof_single
-      25.2±0.9ms       11.0±0.3ms     0.44  timeseries.AsOf.time_asof
-        143±10ms        61.7±10ms     0.43  timeseries.AsOfDataFrame.time_asof_nan
-        128±20ms        48.7±10ms     0.38  timeseries.AsOfDataFrame.time_asof
-        149±10μs         50.9±1μs     0.34  timeseries.AsOf.time_asof_single_early
-        547±50μs          166±7μs     0.30  timeseries.AsOfDataFrame.time_asof_single_early
-         153±4μs         44.4±1μs     0.29  timeseries.DatetimeAccessor.time_dt_accessor
-      14.0±0.5ms      2.62±0.09ms     0.19  frame_methods.frame_assign_timeseries_index.time_frame_assign_timeseries_index

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@jreback
Copy link
Contributor

jreback commented Sep 14, 2017

those are wildly inconsitent. do are using affinity?

@jbrockmendel
Copy link
Member Author

Did the previous asv run on the laptop. New one on the desktop:

asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
     [fa557f73]       [109d605e]
+         184±1ms          204±5ms     1.11  timeseries.SeriesArithmetic.time_add_offset_delta
-        202±10ms        182±0.8ms     0.90  timeseries.SeriesArithmetic.time_add_offset_fast
-         653±5ms          586±1ms     0.90  timeseries.SeriesArithmetic.time_add_offset_slow
-     9.57±0.04μs      7.68±0.04μs     0.80  timeseries.Offsets.time_timeseries_year_apply

@jreback
Copy link
Contributor

jreback commented Sep 15, 2017

ok that looks good. something failing.ping when green.

@codecov
Copy link

codecov bot commented Sep 15, 2017

Codecov Report

Merging #17526 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17526      +/-   ##
==========================================
+ Coverage   91.18%    91.2%   +0.02%     
==========================================
  Files         163      163              
  Lines       49545    49606      +61     
==========================================
+ Hits        45179    45245      +66     
+ Misses       4366     4361       -5
Flag Coverage Δ
#multiple 88.99% <ø> (+0.04%) ⬆️
#single 40.19% <ø> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/dtypes/concat.py 98.26% <0%> (-0.03%) ⬇️
pandas/core/indexes/datetimes.py 95.53% <0%> (ø) ⬆️
pandas/core/indexes/category.py 98.55% <0%> (ø) ⬆️
pandas/tseries/offsets.py 97.18% <0%> (+0.03%) ⬆️
pandas/core/indexes/period.py 92.77% <0%> (+0.03%) ⬆️
pandas/core/categorical.py 95.57% <0%> (+0.04%) ⬆️
pandas/core/common.py 91.98% <0%> (+0.32%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa557f7...0147c24. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 15, 2017

Codecov Report

Merging #17526 into master will increase coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17526      +/-   ##
==========================================
+ Coverage   91.18%    91.2%   +0.02%     
==========================================
  Files         163      163              
  Lines       49545    49606      +61     
==========================================
+ Hits        45179    45245      +66     
+ Misses       4366     4361       -5
Flag Coverage Δ
#multiple 88.99% <ø> (+0.04%) ⬆️
#single 40.19% <ø> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/dtypes/concat.py 98.26% <0%> (-0.03%) ⬇️
pandas/core/indexes/datetimes.py 95.53% <0%> (ø) ⬆️
pandas/core/indexes/category.py 98.55% <0%> (ø) ⬆️
pandas/tseries/offsets.py 97.18% <0%> (+0.03%) ⬆️
pandas/core/indexes/period.py 92.77% <0%> (+0.03%) ⬆️
pandas/core/categorical.py 95.57% <0%> (+0.04%) ⬆️
pandas/core/common.py 91.98% <0%> (+0.32%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa557f7...0147c24. Read the comment docs.

@jbrockmendel
Copy link
Member Author

ping.

@jreback jreback added Clean Timezones Timezone data dtype labels Sep 15, 2017
@jreback jreback added this to the 0.21.0 milestone Sep 15, 2017
@jreback jreback merged commit 328c7e1 into pandas-dev:master Sep 15, 2017
@jreback
Copy link
Contributor

jreback commented Sep 15, 2017

thanks; would take a followup for _ renaming as well.

@jbrockmendel jbrockmendel deleted the tslibs-timezones5 branch September 15, 2017 15:30
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants