Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: de-duplicate DST tzconversion code #35077

Closed
wants to merge 46 commits into from

Conversation

jbrockmendel
Copy link
Member

This implements TZConvertInfo to de-duplicate a bunch of tzconversion code. This uses the new pattern on get_resolution. The other usages I intend to update in individual follow-ups in which I'll check that we have good asv coverage the affected functions.

Sits on top of #35075 (actually, that was broken off of this)

Perf-improving according to the newly implemented asvs:

$ asv continuous -E virtualenv -f 1.01 master HEAD -b resolution
[...]
       before           after         ratio
     [559189ad]       [8114413b]
-     1.89±0.08μs      1.64±0.05μs     0.87  tslibs.resolution.TimeResolution.time_get_resolution('s', 1, None)

L50-L57 in libresolution is going to end up being repeated, should become a helper function. Doing that separately since it tentatively looks like that may have a perf hit if not done carefully.

xref 652a3de#r30437889

@jreback jreback added Clean Timezones Timezone data dtype labels Jul 2, 2020
@jreback
Copy link
Contributor

jreback commented Jul 2, 2020

also rebase

@jbrockmendel
Copy link
Member Author

Pretty much every variant of this I can come up with segfaults on Linux but not on Mac. cc @chris-b1 any thoughts?

@WillAyd
Copy link
Member

WillAyd commented Jul 6, 2020

Any chance you've tried to debug the segfault using the steps outlined in #35100? Might be helpful to step through the debugger (replace lldb with gdb for linux most likely)

@WillAyd
Copy link
Member

WillAyd commented Jul 7, 2020

@jbrockmendel which test(s) are segfaulting? I don't see any in CI

@jbrockmendel
Copy link
Member Author

which test(s) are segfaulting? I don't see any in CI

They're not segfaulting ATM because they are being caught by debugging assertions first. e.g. https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=38537&view=logs&j=a3a13ea8-7cf0-5bdb-71bb-6ac8830ae35c&t=add65f64-6c25-5783-8fd6-d9aa1b63d9d4&l=160 the assertion that raises is https://github.com/pandas-dev/pandas/pull/35077/files?file-filters%5B%5D=.pxd&file-filters%5B%5D=.pyx#diff-03511a52390e3160aa5d428a7b45a4f4R57, but if we get to that line, then we should have already executed https://github.com/pandas-dev/pandas/pull/35077/files?file-filters%5B%5D=.pxd&file-filters%5B%5D=.pyx#diff-4260fb485bcfabb1da9cc5a85ed6d639R265, which isnt raising despite asserting the same thing.

Locally I just got a segfault inside tests.frame.test_to_csv, but running the tests on just that file generally doesnt produce the segfault (well, 2 runs out of the last 5)

@jbrockmendel
Copy link
Member Author

This seems to confirm the suspicion that this is a self.utcoffsets[self.positions[i]] lookup issue

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fffcfef50eb in __pyx_f_6pandas_5_libs_6tslibs_6arrays_2TZ_get_local_timestamp (__pyx_v_i=0, __pyx_v_utc_value=946886400000000000, __pyx_v_self=0x7ffedc7374c0) at pandas/_libs/tslibs/arrays.c:3530
3530	    __pyx_v_local_val = (__pyx_v_utc_value + (__pyx_v_self->utcoffsets[(__pyx_v_self->positions[__pyx_v_i])]));

@WillAyd
Copy link
Member

WillAyd commented Jul 7, 2020

My latest thought is that the positions array is getting freed once get_tzconverter exits, and subsequent access of positions is returning either strange data or segfaulting

@WillAyd
Copy link
Member

WillAyd commented Jul 7, 2020

^ expanding on the above, if within get_tzconverter I change the line that reads:

info.positions = <intp_t*>cnp.PyArray_DATA(pos)

to

from libc.string cimport memcpy
from cpython.mem cimport PyMem_Malloc

info.positions = <intp_t *>PyMem_Malloc(n * sizeof(intp_t))
memcpy(info.positions, pos.data, n * sizeof(intp_t))

Then the tests pass. This is probably not the best way to code this and leaks memory as is, but I think confirms the suspicion that memory management in get_tzconverter is the root problem

@jbrockmendel
Copy link
Member Author

This is probably not the best way to code this and leaks memory as is, but I think confirms the suspicion that memory management in get_tzconverter is the root problem

This seems right, good call.

In a different branch where I use a cdef class instead of a struct, retaining the pos ndarray as an attribute prevents the refcount from going to zero, and fixes the problem. (In this branch that uses a struct, it won't allow an ndarray for a struct member)

@jbrockmendel
Copy link
Member Author

Updated in a way that passes locally on linux, I think addresses the memory release issue @WillAyd pinpointed. Running asvs now. If this doesnt hurt perf, then ill want to use this pattern to de-verbose other places, xref #35168

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool looks good. yea this approach probably works better with automatic memory management

int noffsets
int64_t* utcoffsets
intp_t* positions
ndarray positions_arr # needed to avoid segfault
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need both positions_arr and positions or can we just use the former?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could just use the former, but i think we get a perf boost from indexing on the latter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can see a difference then sure, but if not would definitely be cleaner to just stick with the ndarray

@jreback jreback added this to the 1.1 milestone Jul 7, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, when you are happy @WillAyd

@jbrockmendel
Copy link
Member Author

ATM getting a non-trivial perf hit; I think I didnt see this with the struct version

time asv continuous -E virtualenv -f 1.01 master HEAD -b resolution
[...]
       before           after         ratio
     [031fb168]       [bb9c87f8]
     <master>         <ref-maybe_get_tz>
+        1.32±0μs      1.82±0.01μs     1.38  tslibs.resolution.TimeResolution.time_get_resolution('s', 1, datetime.timezone.utc)
+     1.32±0.01μs      1.79±0.04μs     1.36  tslibs.resolution.TimeResolution.time_get_resolution('m', 1, datetime.timezone.utc)
+     1.33±0.02μs      1.78±0.03μs     1.34  tslibs.resolution.TimeResolution.time_get_resolution('h', 1, datetime.timezone.utc)
+     1.33±0.01μs      1.78±0.05μs     1.34  tslibs.resolution.TimeResolution.time_get_resolution('D', 1, datetime.timezone.utc)
+     1.35±0.02μs      1.80±0.03μs     1.33  tslibs.resolution.TimeResolution.time_get_resolution('us', 1, datetime.timezone.utc)
+     1.34±0.02μs      1.79±0.02μs     1.33  tslibs.resolution.TimeResolution.time_get_resolution('ns', 1, datetime.timezone.utc)
+     1.41±0.02μs      1.76±0.02μs     1.24  tslibs.resolution.TimeResolution.time_get_resolution('ns', 1, None)
+     1.44±0.02μs      1.78±0.02μs     1.24  tslibs.resolution.TimeResolution.time_get_resolution('h', 1, None)
+     1.43±0.01μs         1.76±0μs     1.23  tslibs.resolution.TimeResolution.time_get_resolution('D', 1, None)
+     1.42±0.02μs      1.75±0.01μs     1.23  tslibs.resolution.TimeResolution.time_get_resolution('s', 1, None)
+     1.45±0.02μs      1.78±0.02μs     1.23  tslibs.resolution.TimeResolution.time_get_resolution('us', 1, None)
+     7.94±0.02μs      8.62±0.02μs     1.09  tslibs.resolution.TimeResolution.time_get_resolution('s', 100, datetime.timezone.utc)
+     7.86±0.02μs      8.53±0.02μs     1.09  tslibs.resolution.TimeResolution.time_get_resolution('ns', 100, datetime.timezone.utc)
+     8.05±0.02μs      8.72±0.02μs     1.08  tslibs.resolution.TimeResolution.time_get_resolution('h', 100, datetime.timezone.utc)
+     8.00±0.02μs      8.64±0.02μs     1.08  tslibs.resolution.TimeResolution.time_get_resolution('m', 100, datetime.timezone.utc)
+     8.38±0.04μs      9.04±0.06μs     1.08  tslibs.resolution.TimeResolution.time_get_resolution('us', 100, datetime.timezone.utc)
+     8.02±0.04μs      8.62±0.03μs     1.07  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, datetime.timezone.utc)
+     8.51±0.02μs      9.11±0.02μs     1.07  tslibs.resolution.TimeResolution.time_get_resolution('us', 100, None)
+     7.97±0.03μs      8.51±0.03μs     1.07  tslibs.resolution.TimeResolution.time_get_resolution('ns', 100, None)
+     8.04±0.03μs      8.58±0.02μs     1.07  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, None)
+     8.11±0.02μs      8.64±0.02μs     1.07  tslibs.resolution.TimeResolution.time_get_resolution('m', 100, None)
+      13.2±0.4μs       14.0±0.3μs     1.06  tslibs.resolution.TimeResolution.time_get_resolution('m', 100, datetime.timezone(datetime.timedelta(0, 3600)))
+     8.14±0.01μs      8.66±0.03μs     1.06  tslibs.resolution.TimeResolution.time_get_resolution('h', 100, None)
+     8.09±0.05μs      8.58±0.04μs     1.06  tslibs.resolution.TimeResolution.time_get_resolution('s', 100, None)
+      13.1±0.4μs       13.9±0.1μs     1.06  tslibs.resolution.TimeResolution.time_get_resolution('h', 100, datetime.timezone(datetime.timedelta(0, 3600)))
+      13.2±0.5μs      13.8±0.03μs     1.05  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, datetime.timezone(datetime.timedelta(0, 3600)))
+      13.6±0.4μs      14.3±0.05μs     1.05  tslibs.resolution.TimeResolution.time_get_resolution('us', 100, datetime.timezone(datetime.timedelta(0, 3600)))
+      13.2±0.4μs      13.8±0.07μs     1.05  tslibs.resolution.TimeResolution.time_get_resolution('ns', 100, datetime.timezone(datetime.timedelta(0, 3600)))
+       717±0.9μs          749±3μs     1.04  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, datetime.timezone(datetime.timedelta(0, 3600)))
+         666±1μs          691±1μs     1.04  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, datetime.timezone.utc)
+         667±1μs        690±0.9μs     1.04  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, None)
+         674±2μs          697±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('h', 10000, datetime.timezone(datetime.timedelta(0, 3600)))
+         663±1μs          684±4μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('D', 10000, None)
+         658±2μs          677±2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('ns', 10000, datetime.timezone.utc)
+         674±3μs          694±4μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('h', 10000, datetime.timezone.utc)
+         686±2μs          706±2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, datetime.timezone(datetime.timedelta(0, 3600)))
+         684±3μs          703±2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('s', 10000, datetime.timezone(datetime.timedelta(0, 3600)))
+         666±2μs        685±0.2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('s', 10000, None)
+         656±2μs          675±2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('ns', 10000, None)
+         678±2μs          697±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('ns', 10000, datetime.timezone(datetime.timedelta(0, 3600)))
+      67.0±0.2ms       68.8±0.1ms     1.03  tslibs.resolution.TimeResolution.time_get_resolution('h', 1000000, datetime.timezone.utc)
+      66.8±0.1ms      68.5±0.04ms     1.03  tslibs.resolution.TimeResolution.time_get_resolution('m', 1000000, None)
+         664±2μs        682±0.8μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('D', 10000, datetime.timezone.utc)
+         683±1μs          701±1μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('D', 10000, datetime.timezone(datetime.timedelta(0, 3600)))
+         890±5μs          913±4μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+       668±0.9μs          685±1μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('s', 10000, datetime.timezone.utc)
+         707±1μs        724±0.3μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, None)
+         708±2μs        725±0.2μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, datetime.timezone.utc)
+         799±3μs          818±2μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         673±4μs        688±0.5μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('h', 10000, None)
+         761±3μs          776±1μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('D', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+       857±0.7μs          870±4μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         766±1μs          776±2μs     1.01  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         753±2μs          763±2μs     1.01  tslibs.resolution.TimeResolution.time_get_resolution('ns', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))

@jbrockmendel
Copy link
Member Author

If I can't get the perf figured out here, we can at least get some mileage de-duplication-wise by sharing+inlining some helpers following #35168

@WillAyd
Copy link
Member

WillAyd commented Jul 8, 2020

If the struct version is faster you could also memcpy the bytes needed; would just need a cleanup function to free them whenever done with the struct

@jbrockmendel
Copy link
Member Author

The struct version is a little bit faster, but still a non-trivial perf hit:

       before           after         ratio
     [031fb168]       [f154a31f]
     <master>         <ref-maybe_get_tz>
+      5.21±0.2μs       6.32±0.2μs     1.21  tslibs.resolution.TimeResolution.time_get_resolution('s', 1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      5.27±0.2μs       6.34±0.3μs     1.20  tslibs.resolution.TimeResolution.time_get_resolution('D', 1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+     5.31±0.09μs       6.37±0.3μs     1.20  tslibs.resolution.TimeResolution.time_get_resolution('ns', 1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      5.61±0.1μs       6.70±0.5μs     1.19  tslibs.resolution.TimeResolution.time_get_resolution('s', 1, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     5.31±0.09μs       6.26±0.2μs     1.18  tslibs.resolution.TimeResolution.time_get_resolution('h', 1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      5.39±0.3μs       6.24±0.2μs     1.16  tslibs.resolution.TimeResolution.time_get_resolution('us', 1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      5.69±0.2μs       6.55±0.2μs     1.15  tslibs.resolution.TimeResolution.time_get_resolution('m', 1, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+      5.36±0.1μs       6.15±0.1μs     1.15  tslibs.resolution.TimeResolution.time_get_resolution('m', 1, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      5.72±0.1μs       6.56±0.3μs     1.15  tslibs.resolution.TimeResolution.time_get_resolution('ns', 1, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+      5.65±0.1μs       6.45±0.2μs     1.14  tslibs.resolution.TimeResolution.time_get_resolution('h', 1, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+      5.75±0.2μs       6.56±0.2μs     1.14  tslibs.resolution.TimeResolution.time_get_resolution('D', 1, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+      13.2±0.1μs       14.6±0.2μs     1.11  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      14.0±0.2μs       15.5±0.1μs     1.11  tslibs.resolution.TimeResolution.time_get_resolution('us', 100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      13.6±0.2μs       15.0±0.1μs     1.11  tslibs.resolution.TimeResolution.time_get_resolution('h', 100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      13.7±0.3μs       15.1±0.1μs     1.10  tslibs.resolution.TimeResolution.time_get_resolution('ns', 100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      13.0±0.2μs      14.3±0.08μs     1.10  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     12.9±0.06μs       14.2±0.2μs     1.10  tslibs.resolution.TimeResolution.time_get_resolution('h', 100, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     13.1±0.06μs       14.4±0.2μs     1.10  tslibs.resolution.TimeResolution.time_get_resolution('ns', 100, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+      13.7±0.2μs       15.0±0.2μs     1.10  tslibs.resolution.TimeResolution.time_get_resolution('m', 100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      13.8±0.2μs       15.1±0.3μs     1.09  tslibs.resolution.TimeResolution.time_get_resolution('s', 100, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+      5.88±0.1μs      6.41±0.09μs     1.09  tslibs.resolution.TimeResolution.time_get_resolution('us', 1, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     13.0±0.06μs       14.1±0.1μs     1.08  tslibs.resolution.TimeResolution.time_get_resolution('s', 100, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     13.1±0.09μs       14.1±0.3μs     1.08  tslibs.resolution.TimeResolution.time_get_resolution('m', 100, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+     1.40±0.03μs      1.48±0.03μs     1.06  tslibs.resolution.TimeResolution.time_get_resolution('m', 1, datetime.timezone.utc)
+       796±0.8μs          829±3μs     1.04  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         762±3μs          790±3μs     1.04  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+       754±0.8μs          780±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('h', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         853±1μs          882±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('ns', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         862±2μs          890±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('h', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+       890±0.7μs          919±1μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('us', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         756±2μs          779±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('ns', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         765±3μs          788±2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('s', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         861±5μs        885±0.9μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('s', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         864±2μs          887±3μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('m', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+         765±3μs          785±2μs     1.03  tslibs.resolution.TimeResolution.time_get_resolution('D', 10000, tzfile('/usr/share/zoneinfo/Asia/Tokyo'))
+         817±2μs          836±4μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('D', 10000, <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>)
+     13.2±0.01μs      13.4±0.04μs     1.02  tslibs.resolution.TimeResolution.time_get_resolution('D', 100, datetime.timezone(datetime.timedelta(0, 3600)))

@jbrockmendel
Copy link
Member Author

Closing, will revisit following #35168

@jbrockmendel jbrockmendel deleted the ref-maybe_get_tz branch November 20, 2021 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants