Bitesize offsets #17318

jbrockmendel · 2017-08-23T19:07:13Z

This is the first of several PRs cleaning up tseries.offsets. The ultimate goals of this series of PRs are:

Fix slow implementation of DateOffset.__eq__, as that gets called by Period.__eq__.
Make DateOffset immutable, since it is attached to a Period object which is supposed to be immutable (TODO: fill in the appropriate GH issue)
Move tseries.offsets into cython so that _libs.period and _libs.tslib can import it guilt-free and not need to do run-time imports. See TODO comment in _libs.__init__:

# TODO
# period is directly dependent on tslib and imports python
# modules, so exposing Period as an alias is currently not possible

The biggest impediment to the immutability goal is the kwds attribute, which is just a dict. The first couple of steps in this sequence are focused on whittling down the number of attributes set at runtime.
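To make the kwds problem concrete, here is a minimal sketch. `ToyOffset` is a hypothetical stand-in, not the pandas `DateOffset`; it only mimics the pattern of keeping constructor keywords in a mutable dict and comparing on them.

```python
class ToyOffset(object):
    """Hypothetical stand-in for an offset that keeps its kwargs in a dict."""

    def __init__(self, n=1, **kwds):
        self.n = n
        self.kwds = kwds  # mutable: anyone holding the object can change it

    def _params(self):
        # Everything that defines equality, rebuilt on every call.
        return (type(self).__name__, self.n,
                tuple(sorted(self.kwds.items())))

    def __eq__(self, other):
        return (isinstance(other, ToyOffset) and
                self._params() == other._params())

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self._params())


a = ToyOffset(2, weekday=0)
b = ToyOffset(2, weekday=0)
equal_before = (a == b)

# Mutating the dict after construction silently changes what `a` equals.
a.kwds['weekday'] = 3
equal_after = (a == b)
```

In this toy, equality has to rebuild the parameter tuple on every comparison because nothing guarantees the dict is unchanged, which is one plausible reason an `__eq__` of this shape ends up slow.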

This PR is mainly fixing typos and removing redundant methods.

passes git diff upstream/master -u -- "*.py" | flake8 --diff

pep8speaks · 2017-08-23T19:46:55Z

Hello @jbrockmendel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 23, 2017 at 16:21 Hours UTC

jreback · 2017-08-24T12:51:01Z

pandas/tseries/offsets.py

+            plural = 's'
+        else:
+            plural = ''
+


this whole thing should just be moved to a separate function, much more clear this way

Not sure what you're referring to as "this whole thing".

this formatting function (e.g. repr should just call it, passing parameters in).
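A rough sketch of what such a standalone formatting function could look like. The name `format_offset_repr`, the pluralization condition, and the `n * ` prefix are assumptions for illustration; only the `plural = 's'` / `plural = ''` branches appear in the diff.

```python
def format_offset_repr(class_name, n, attrs=''):
    """Hypothetical helper: build a repr string such as '<2 * Days>'."""
    # Assumed rule: pluralize unless the multiple is exactly +1 or -1.
    plural = 's' if abs(n) != 1 else ''
    # Assumed rule: omit the multiplier when n == 1.
    n_str = '' if n == 1 else '%s * ' % n
    return '<%s%s%s%s>' % (n_str, class_name, plural, attrs)


class ToyDay(object):
    """Hypothetical offset-like class whose __repr__ delegates to the helper."""

    def __init__(self, n=1):
        self.n = n

    def __repr__(self):
        return format_offset_repr(type(self).__name__, self.n)
```

With this shape, `__repr__` only passes parameters in, and the formatting rules live in one place.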

jreback · 2017-08-24T12:51:53Z

pandas/tseries/offsets.py

@@ -642,9 +623,6 @@ def get_str(td):
        else:
            return '+' + repr(self.offset)

-    def isAnchored(self):


pls don't simply remove things. instead in a separate PR deprecate them.

The method is still available. It's inherited verbatim from the parent class.
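The point about inheritance can be shown with a small sketch. The classes and the body of `isAnchored` below are hypothetical; they only illustrate that deleting an override identical to the parent's method leaves the public method available.

```python
class ToyBase(object):
    def __init__(self, n=1):
        self.n = n

    def isAnchored(self):
        # Hypothetical body, used only for illustration.
        return self.n == 1


class ChildWithOverride(ToyBase):
    def isAnchored(self):
        # Verbatim copy of the parent implementation: redundant.
        return self.n == 1


class ChildWithoutOverride(ToyBase):
    # No override: attribute lookup falls through to ToyBase.isAnchored.
    pass


same_behaviour = all(
    ChildWithOverride(n).isAnchored() == ChildWithoutOverride(n).isAnchored()
    for n in (1, 2, -1)
)
```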

codecov · 2017-08-28T15:22:25Z

Codecov Report

Merging #17318 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17318      +/-   ##
==========================================
+ Coverage   91.01%   91.02%   +0.01%     
==========================================
  Files         162      162              
  Lines       49558    49564       +6     
==========================================
+ Hits        45105    45116      +11     
+ Misses       4453     4448       -5

Flag	Coverage Δ
#multiple	`88.8% <100%> (+0.02%)`	⬆️
#single	`40.26% <66.66%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/tseries/offsets.py	`97.28% <100%> (+0.13%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.72% <0%> (-0.1%)`	⬇️
pandas/errors/__init__.py	`100% <0%> (ø)`	⬆️
pandas/plotting/_converter.py	`65.05% <0%> (+1.81%)`	⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bec750...c66b842. Read the comment docs.

codecov · 2017-08-28T15:22:28Z

Codecov Report

Merging #17318 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17318      +/-   ##
==========================================
- Coverage   91.22%   91.21%   -0.01%     
==========================================
  Files         163      163              
  Lines       49655    49673      +18     
==========================================
+ Hits        45296    45308      +12     
- Misses       4359     4365       +6

Flag	Coverage Δ
#multiple	`89% <100%> (+0.01%)`	⬆️
#single	`40.19% <64%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
pandas/tseries/frequencies.py	`96.11% <ø> (ø)`	⬆️
pandas/tseries/offsets.py	`97.14% <100%> (+0.14%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.77% <0%> (-0.1%)`	⬇️
pandas/core/series.py	`94.92% <0%> (ø)`	⬆️
pandas/core/generic.py	`91.99% <0%> (ø)`	⬆️
pandas/core/indexes/datetimes.py	`95.53% <0%> (+0.09%)`	⬆️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2757a2...e52a791. Read the comment docs.

jbrockmendel · 2017-09-14T16:16:21Z

I think this test error is unrelated. Pls confirm.

jreback · 2017-09-14T16:22:16Z

this was a while ago, rebase and we will see

jreback · 2017-09-15T01:39:57Z

needs a rebase

jreback · 2017-09-15T10:14:08Z

pandas/tseries/offsets.py

+            plural = 's'
+        else:
+            plural = ''
+


this formatting function (e.g. repr should just call it, passing parameters in).

jreback · 2017-09-15T10:15:03Z

pandas/tseries/offsets.py

        return fstr

+    def _offset_str(self):


try not to add new things which just clutter up

_offset_str is currently a method of BusinessDay, which has its own implementation of freqstr. Adding this dummy method lets us get rid of the duplicate freqstr method. Same idea with __repr__/_repr_attrs. Strictly less clutter.
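A minimal sketch of the idea, using hypothetical toy classes rather than the real pandas ones: the base class owns `freqstr` and calls a hook that returns an empty string by default, so a subclass only overrides the hook instead of duplicating `freqstr`. The `+3H` suffix format is an assumption made up for this example.

```python
class ToyOffset(object):
    code = 'D'

    def __init__(self, n=1):
        self.n = n

    @property
    def freqstr(self):
        # Single shared implementation for every subclass.
        fstr = self.code if self.n == 1 else '%d%s' % (self.n, self.code)
        return fstr + self._offset_str()

    def _offset_str(self):
        # Default hook: most offsets have nothing extra to append.
        return ''


class ToyBusinessDay(ToyOffset):
    code = 'B'

    def __init__(self, n=1, offset_hours=0):
        ToyOffset.__init__(self, n)
        self.offset_hours = offset_hours

    def _offset_str(self):
        # Only the suffix logic lives in the subclass.
        return '+%dH' % self.offset_hours if self.offset_hours else ''
```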

jreback · 2017-09-15T10:15:52Z

pandas/tseries/offsets.py

@@ -710,6 +689,7 @@ def __init__(self, **kwds):
        kwds['end'] = self._validate_time(kwds.get('end', '17:00'))
        self.kwds = kwds
        self.offset = kwds.get('offset', timedelta(0))
+        self._offset = self.offset  # alias for backward compat


why don't you just define .offset as a property to return ._offset?

I definitely tried this; I don't remember offhand why it didn't work. Maybe when the caffeine kicks in.

ok, let's try that type of refactor again.
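For reference, the property-based version being suggested might look roughly like this. `ToyBusinessHour` is a hypothetical stand-in; only the `kwds.get('offset', timedelta(0))` default is taken from the diff above.

```python
from datetime import timedelta


class ToyBusinessHour(object):
    """Hypothetical stand-in: store the value privately, expose it read-only."""

    def __init__(self, **kwds):
        self.kwds = kwds
        self._offset = kwds.get('offset', timedelta(0))

    @property
    def offset(self):
        """The extra timedelta applied by this offset (read-only)."""
        return self._offset


bh = ToyBusinessHour(offset=timedelta(hours=1))
```

Because the property has no setter, `bh.offset = ...` raises `AttributeError`, which also moves a step toward the immutability goal.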

jreback · 2017-09-17T21:22:50Z

pandas/tseries/offsets.py

@@ -710,6 +689,7 @@ def __init__(self, **kwds):
        kwds['end'] = self._validate_time(kwds.get('end', '17:00'))
        self.kwds = kwds
        self.offset = kwds.get('offset', timedelta(0))
+        self._offset = self.offset  # alias for backward compat


ok, let's try that type of refactor again.

jreback

can you run the freq asv's to make sure nothing has changed here.

jreback · 2017-09-22T21:42:31Z

pandas/tseries/offsets.py

+        if 'offset' in state:
+            # Older versions have offset attribute instead of _offset
+            assert '_offset' not in state, list(state.keys())
+            state['_offset'] = state['offset']


state['_offset'] = state.pop('offset')

can you remove the assert?

Good call on the pop. Change assert to ValueError if both keys are (somehow) there?

you may want to add a 0.20.3 pickle that adds things for every frequency.

in separate PR.
You can use pandas/tests/io/generate_legacy_storage_files.py: update it to add all of the offsets, then generate the pickle and add it to the repo. All tests should pass. (This is all with 0.20.3: make the modification locally, but run it with the older python.)

I'll make a note to do this once other fixes to offsets are done.
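A sketch of the unpickling shim as discussed in this thread, with `pop` and a `ValueError` instead of the assert. `ToyOffset` and the error message are hypothetical; the real `__setstate__` in pandas may do more than this.

```python
class ToyOffset(object):
    """Hypothetical class whose older pickles stored `offset`, not `_offset`."""

    def __init__(self, offset=0):
        self._offset = offset

    def __setstate__(self, state):
        state = dict(state)  # work on a copy, leave the caller's dict alone
        if 'offset' in state:
            # Older versions have offset attribute instead of _offset
            if '_offset' in state:
                raise ValueError('state has both `offset` and `_offset`')
            state['_offset'] = state.pop('offset')
        self.__dict__.update(state)


# Simulate restoring state written by an older version.
restored = ToyOffset.__new__(ToyOffset)
restored.__setstate__({'offset': 5})
```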

jreback · 2017-09-22T21:43:01Z

pandas/tseries/offsets.py

-        return out
+    @property
+    def offset(self):
+        # Alias for backward compat


better comment on what this attribute is (it's not for backward compat, rather it's the API).

Not sure I understand. It explicitly is for backward compat since we are trying to standardize on _offset.

offset is a user API, the _offset is merely an implementation detail (e.g. how we implement it). add a doc-string on what this returns.

OK. Though to the extent that we can lock down what constitutes the user-facing API, offset probably doesn't belong in it.

jreback · 2017-09-22T21:43:20Z

pandas/tseries/offsets.py

@@ -507,8 +513,18 @@ def freqstr(self):
        else:
            fstr = code

+        try:
+            if self._offset:


is this still needed?

Yes. Not all subclasses define the _offset attribute. That is something I intend to standardize, but this PR is explicitly intended to be limited in scope.
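The guard can be sketched as follows. The diff only shows the `try:` and the `if self._offset:` lines, so the `except AttributeError` branch, the helper name `offset_suffix`, and the suffix format are assumptions.

```python
from datetime import timedelta


def offset_suffix(obj):
    """Hypothetical helper: return a suffix only for a truthy `_offset`."""
    try:
        has_offset = bool(obj._offset)
    except AttributeError:
        # Subclasses that never define `_offset` get no suffix.
        has_offset = False
    return '+' + str(obj._offset) if has_offset else ''


class NoOffsetAttr(object):
    pass


class ZeroOffset(object):
    _offset = timedelta(0)  # falsy, so treated as "no offset"


class TwoHourOffset(object):
    _offset = timedelta(hours=2)
```

Once every subclass defines `_offset`, the `try`/`except` can be dropped and only the truthiness check remains.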

jbrockmendel · 2017-09-22T23:39:03Z

can you run the freq asv's to make sure nothing has changed here.

Begrudgingly...

$ asv continuous -f 1.1 -E virtualenv master HEAD -b freq
[...]
       before           after         ratio
     [2bec750b]       [d1e9161b]
+          72.9ms            126ms     1.73  timeseries.DatetimeIndex.time_infer_freq_none
-          57.9ms           45.0ms     0.78  timeseries.DatetimeIndex.time_infer_freq_business
-          44.7ms           34.3ms     0.77  timeseries.DatetimeIndex.time_infer_freq_daily

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

$ asv continuous -f 1.1 -E virtualenv master HEAD -b freq
[...]
     [2bec750b]       [d1e9161b]
+          31.7ms           43.8ms     1.38  timeseries.DatetimeIndex.time_infer_freq_business
+     1.52±0.04μs      1.99±0.05μs     1.30  timestamp.TimestampProperties.time_freqstr
+          43.1ms           50.4ms     1.17  timeseries.DatetimeIndex.time_infer_freq_none

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

$ asv continuous -f 1.1 -E virtualenv master HEAD -b freq
[...]
     [2bec750b]       [d1e9161b]
+          25.6ms           30.7ms     1.20  timeseries.DatetimeIndex.time_infer_freq_business
-          27.2ms           24.3ms     0.89  timeseries.DatetimeIndex.time_infer_freq_daily

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

$ asv continuous -f 1.1 -E virtualenv master HEAD -b freq
[...]
      before           after         ratio
     [2bec750b]       [d1e9161b]
-          60.6ms           53.9ms     0.89  timeseries.DatetimeIndex.time_infer_freq_none
-          34.6ms           28.7ms     0.83  timeseries.DatetimeIndex.time_infer_freq_business
-          30.9ms           23.4ms     0.76  timeseries.DatetimeIndex.time_infer_freq_daily
-      2.87±0.3μs      2.14±0.07μs     0.75  timestamp.TimestampProperties.time_freqstr

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
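As a quick check on how to read these four runs, the ratios for `time_infer_freq_business` can be recomputed from the before/after timings above. This sketch assumes `-f 1.1` flags a benchmark when the ratio is above 1.1 or below 1/1.1; the `classify` helper is hypothetical and not part of asv.

```python
# (before_ms, after_ms) for time_infer_freq_business, one pair per run above.
runs = [(57.9, 45.0), (31.7, 43.8), (25.6, 30.7), (34.6, 28.7)]
FACTOR = 1.1  # assumed meaning of `-f 1.1`


def classify(before, after, factor=FACTOR):
    """Return (ratio, label) for one before/after pair."""
    ratio = after / before
    if ratio > factor:
        return ratio, 'slower'
    if ratio < 1.0 / factor:
        return ratio, 'faster'
    return ratio, 'unchanged'


results = [classify(before, after) for before, after in runs]
ratios = [round(ratio, 2) for ratio, _ in results]
labels = [label for _, label in results]
```

The same pair of commits comes out faster in two runs and slower in the other two, which suggests the run-to-run variation here is larger than the 1.1 threshold.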

jreback · 2017-09-22T23:49:25Z

This is repeatable.

    before     after       ratio
  [f797c1dc] [d1e9161b]
+  529.00μs    12.36ms     23.36  timeseries.DatetimeIndex.time_infer_freq_business
+   20.68ms    24.21ms      1.17  timeseries.DatetimeIndex.time_infer_freq_none

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

jreback · 2017-09-22T23:50:12Z

[2bec750b] is a pretty old commit, FYI

jbrockmendel · 2017-09-23T01:01:46Z

After updating, I get the same timings too. But it looks like this may be due to a bug in master. Manually stepping through the benchmark, timeseries.DatetimeIndex includes the code

rng8 = date_range(start='1/1/1700', freq='B', periods=100000)

which for me (py27, ubuntu...) is giving:

OverflowError: Python int too large to convert to C long
Exception OverflowError: 'Python int too large to convert to C long' in 'pandas._libs.tslib._delta_to_nanoseconds' ignored

and the resulting rng8 object is a DatetimeIndex object with 5 entries. Before anything else, can you reproduce this?

jbrockmendel · 2017-09-23T01:06:22Z

Same on py35

jbrockmendel · 2017-09-23T01:27:43Z

git bisect tells me the first bad commit was b59f107

jreback · 2017-09-23T01:31:07Z

yeah looks like this was a bug introduced there, hmm.

jreback · 2017-09-23T01:33:40Z

let me see if I can work around: #17637

jbrockmendel · 2017-09-23T16:13:51Z

After #17637, benchmarks are now unaffected.

jreback · 2017-09-23T16:15:15Z

After #17637, benchmarks are now unaffected.

does this mean they are the same (worse) or the same as master?

jbrockmendel · 2017-09-23T16:19:29Z

asv continuous -f 1.1 -E virtualenv master HEAD -b freq
[...]
BENCHMARKS NOT SIGNIFICANTLY CHANGED.

jbrockmendel · 2017-09-23T16:22:11Z

Just pushed. Besides the requested docstring, had to edit the asv to avoid the new OverflowError.

jreback · 2017-09-23T16:44:55Z

asv_bench/benchmarks/timeseries.py

@@ -56,7 +56,7 @@ def setup(self):
        self.no_freq = self.rng7[:50000].append(self.rng7[50002:])
        self.d_freq = self.rng7[:50000].append(self.rng7[50000:])

-        self.rng8 = date_range(start='1/1/1700', freq='B', periods=100000)
+        self.rng8 = date_range(start='1/1/1700', freq='B', periods=75000)
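A back-of-the-envelope calculation suggests why 75000 periods avoids the error while 100000 does not. It assumes the overflow comes from expressing the whole start-to-end span as int64 nanoseconds (consistent with the `_delta_to_nanoseconds` traceback quoted earlier) and approximates business days as 5 out of every 7 calendar days; `approx_span_ns` is a hypothetical helper.

```python
INT64_MAX = 2 ** 63 - 1
NS_PER_DAY = 86400 * 10 ** 9


def approx_span_ns(business_days):
    """Approximate calendar span, in nanoseconds, of a run of business days."""
    calendar_days = business_days * 7 // 5
    return calendar_days * NS_PER_DAY


span_100k = approx_span_ns(100000)  # roughly 383 years
span_75k = approx_span_ns(75000)    # roughly 287 years

overflows_100k = span_100k > INT64_MAX
overflows_75k = span_75k > INT64_MAX
```

Under these assumptions the 100000-period span is about 1.2e19 ns, above the int64 limit of about 9.2e18, while the 75000-period span is about 9.1e18 ns and just fits.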


jreback · 2017-09-23T16:45:56Z

lgtm. ping on green.

side note, I believe you can make _params readonly_cached if i read it correctly (another PR of course :>)

jbrockmendel · 2017-09-23T17:08:54Z

side note, I believe you can make _params readonly_cached if i read it correctly (another PR of course :>)

That is pretty much the original motivation here. There are a couple more steps between here and there, since until we get kwds locked down (see WIP #17458) and make relevant attrs immutable, cache invalidation is a PITA.
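A small sketch of the cache-invalidation problem being described. `ToyOffset` and its hand-rolled cache are hypothetical; the sketch only shows why caching `_params` is unsafe while `kwds` can still be mutated.

```python
class ToyOffset(object):
    """Hypothetical offset that caches its parameter tuple on first access."""

    def __init__(self, n=1, **kwds):
        self.n = n
        self.kwds = kwds
        self._cache = {}

    @property
    def _params(self):
        # Computed once and reused: only correct if n and kwds never change.
        if '_params' not in self._cache:
            self._cache['_params'] = (
                self.n, tuple(sorted(self.kwds.items())))
        return self._cache['_params']


off = ToyOffset(1, weekday=0)
first = off._params

off.kwds['weekday'] = 4  # a mutation the cache knows nothing about
stale = off._params      # still reports weekday=0
```

With immutable attributes there is nothing to invalidate, so a cached property of this kind becomes safe.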

jreback · 2017-09-23T17:36:31Z

thanks!

jbrockmendel added 4 commits August 23, 2017 11:46

Fix typos

3822f99

unify __repr__ and freqstr implementations

309dd54

Remove methods identical to those of parent class

87f659b

alias ._offset-->.offset so freqstr compat

3d1e6f8

gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Aug 23, 2017

flake8 fixup

0f5b2a6

jbrockmendel mentioned this pull request Aug 23, 2017

offsets pickle issues #17313

Closed

jreback reviewed Aug 24, 2017

jbrockmendel mentioned this pull request Aug 25, 2017

Remove property that re-computed microsecond #17331

Merged

4 tasks

dummy commit to force CI

c66b842

Dummy commit to force CI

f016855

jreback added the Frequency DateOffsets label Sep 14, 2017

Merge branch 'master' of https://github.com/pandas-dev/pandas into bi…

1fbe61d

…tesize_offsets

Merge branch 'master' of https://github.com/pandas-dev/pandas into bi…

030afde

…tesize_offsets

jreback requested changes Sep 15, 2017

jreback requested changes Sep 17, 2017

jbrockmendel added 3 commits September 19, 2017 20:08

Merge branch 'master' of https://github.com/pandas-dev/pandas into bi…

2ca3bf2

…tesize_offsets

Make offset a property to alias _offset for backward compat

1f4b5cd

Rename offset attribute in unpickled instance of older version

d1e9161

jreback requested changes Sep 22, 2017

reviewer comments, pop key, dont assert

3d3c2a6

jreback mentioned this pull request Sep 23, 2017

BUG: overflow in Timedelta arithmetic #17637

Closed

jbrockmendel added 2 commits September 23, 2017 09:00

Merge branch 'master' of https://github.com/pandas-dev/pandas into bi…

24f6eee

…tesize_offsets

Reduce periods for B freq to avoid OverflowError

c22ee24

docstring per reviewer request

e52a791

jreback reviewed Sep 23, 2017

jreback added this to the 0.21.0 milestone Sep 23, 2017

jreback approved these changes Sep 23, 2017

jreback merged commit 2eb568a into pandas-dev:master Sep 23, 2017

jbrockmendel deleted the bitesize_offsets branch October 30, 2017 16:23

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

Bitesize offsets (pandas-dev#17318)

5ad3039

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Bitesize offsets (pandas-dev#17318)

825ebf7
