ENH: Add numba engine for rolling apply #30151

mroeschke · 2019-12-09T06:07:08Z

closes Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pandas/core/window/rolling.py

TomAugspurger · 2019-12-09T14:53:48Z

Random question: do you have benchmarks on this anywhere? It would be good to give guidance on when this is helpful (data sizes, type of user-defined function, ratio of time spent in windowing vs. the UDF, etc.).

jreback · 2019-12-10T12:44:03Z

pandas/core/window/rolling.py

-
-    *args, **kwargs
-        Arguments and keyword arguments to be passed into func.
+    args : tuple, default None


args, kwargs should be at the end

jreback · 2019-12-10T12:44:32Z

pandas/core/window/rolling.py

+        raw=False,
+        args=None,
+        kwargs=None,
+        engine="cython",


type new arguments

pandas/core/window/rolling.py

mroeschke · 2019-12-11T07:28:34Z

@TomAugspurger performance testing is lightly described in twosigma#29, but good point to have a dedicated section in the documentation describing when the numba engine would be advantageous

jreback · 2019-12-11T22:06:53Z

pandas/core/window/numba_.py

+
+
+def _generate_numba_apply_func(
+    args: Tuple, kwargs: Dict, func: Callable, engine_kwargs: Optional[Dict]


can you add some comments here

pandas/tests/window/test_api.py

alimcmaster1 · 2019-12-24T20:52:17Z

Code check failures are related to https://github.com/pandas-dev/pandas/pull/30370/files#r361222880
I can submit a PR to fix this

Check for non-standard imports
##[error]pandas/io/stata.py:12:from collections.abc import Iterator
##[error]pandas/io/sas/sas7bdat.py:16:from collections.abc import Iterator
##[error]pandas/io/sas/sas_xport.py:10:from collections.abc import Iterator
##[error]pandas/io/common.py:5:from collections.abc import Iterator
##[error]pandas/io/parsers.py:6:from collections.abc import Iterator
##[error]pandas/io/json/_json.py:2:from collections.abc import Iterator

PR here: #30455

WillAyd

Few more comments. I don't want to go overboard on annotations so those in particular aren't blockers, but generally if subtypes can be added for anything they would be preferred

WillAyd · 2019-12-24T23:40:51Z

pandas/core/window/numba_.py

+from pandas.compat._optional import import_optional_dependency
+
+
+def make_rolling_apply(func, args, nogil, parallel, nopython):


Not a blocker here since this is large enough, but would be nice to annotate this in a follow up

WillAyd · 2019-12-24T23:41:58Z

pandas/core/window/rolling.py

@@ -92,6 +93,7 @@ def __init__(
        self.win_freq = None
        self.axis = obj._get_axis_number(axis) if axis is not None else None
        self.validate()
+        self._numba_func_cache: Dict = dict()


I think I missed this in a previous review, but similar comment as the rest: not a blocker, but subtypes make annotations much more insightful to the reader
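For illustration only (not code from this PR): the thread does not show what the cache actually stores, so the key shape `CacheKey`, the class `WindowStub` and the method `get_or_compile` below are hypothetical. The sketch just shows how a subtyped annotation tells the reader what lives in the dict, where a bare `Dict` does not.

```python
from typing import Callable, Dict, Tuple

# Hypothetical key: the user function plus the engine name.
CacheKey = Tuple[Callable, str]


class WindowStub:
    """Hypothetical stand-in for the rolling window object."""

    def __init__(self) -> None:
        # A bare ``Dict`` says nothing about the contents; the subtypes
        # document what is stored and let a type checker verify its use.
        self._numba_func_cache: Dict[CacheKey, Callable] = {}

    def get_or_compile(self, func: Callable, compile_func: Callable) -> Callable:
        """Return the cached compiled function, compiling it on first use."""
        key = (func, "numba")
        if key not in self._numba_func_cache:
            self._numba_func_cache[key] = compile_func(func)
        return self._numba_func_cache[key]
```

With this shape, a second call with the same function returns the cached object instead of compiling again.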

WillAyd · 2019-12-24T23:42:32Z

pandas/core/window/rolling.py

@@ -442,6 +444,7 @@ def _apply(
        floor: int = 1,
        is_weighted: bool = False,
        name: Optional[str] = None,
+        use_numba_cache: Optional[bool] = False,


This can just be annotated as bool no? Or do we need to explicitly handle None?
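A small sketch of the distinction being asked about, using two hypothetical functions (`apply_optional` and `apply_plain` are not pandas functions): `Optional[bool]` tells callers that `None` is an accepted value, so the body has to give `None` a meaning, whereas plain `bool` with a `False` default does not.

```python
from typing import Optional, get_type_hints


def apply_optional(use_numba_cache: Optional[bool] = False) -> bool:
    # Optional[bool] admits None, so the body must decide what None means.
    return bool(use_numba_cache)


def apply_plain(use_numba_cache: bool = False) -> bool:
    # Only True/False are valid here; no None handling is needed.
    return use_numba_cache


hints_optional = get_type_hints(apply_optional)
hints_plain = get_type_hints(apply_plain)
```

If `None` is never passed deliberately, the plain `bool` annotation is the more accurate description of the parameter.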

WillAyd · 2019-12-24T23:44:06Z

pandas/tests/window/test_api.py

@@ -342,3 +342,32 @@ def test_multiple_agg_funcs(self, func, window_size, expected_vals):
        )

        tm.assert_frame_equal(result, expected)
+
+
+class TestEngine:


I think this class would be more logically placed in test_apply

WillAyd · 2019-12-24T23:45:42Z

pandas/tests/window/test_apply.py

+
+
+def test_all_apply(engine, raw):
+    if engine == "numba":


Can we just skip or xfail instead? I think that tells the reader more clearly that it is an invalid combination than re-assigning does

could make engine_and_raw fixture using the pattern in #30456 if this is going to come up repeatedly

WillAyd · 2019-12-24T23:46:50Z

pandas/tests/window/test_numba.py

+import pandas.util.testing as tm
+
+
+@td.skip_if_no("numba", "0.46.0")


Can just keep as skip_if_no("numba") since the minimum is already enforced by requirements files

jbrockmendel · 2019-12-25T00:13:19Z

doc/source/whatsnew/v1.0.0.rst

+We've added an ``engine`` keyword to :meth:`~Rolling.apply` that allows the user to execute the
+routine using `Numba <https://numba.pydata.org/>`__ instead of Cython. Using the Numba engine
+can yield significant performance gains if the apply function can operate on numpy arrays and
+the data set is larger. For more details, see :ref:`rolling apply documentation <stats.rolling_apply>`


"the data set is larger" here is pretty vague. is the perf gain a function of the array size or more about the user-defined function?

mroeschke · 2019-12-25

Somewhat both, but it is more obvious with the data size. I can make this more specific.
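To make the data-size point concrete, here is a pure-Python sketch of the per-window loop that a rolling apply has to run (the traceback later in this thread shows a line of the same shape, `result[i] = numba_func(window, *args)`). The names `rolling_apply` and `mean_plus` are hypothetical and this is not the pandas implementation; it only assumes fixed-size windows and NaN until a window is full.

```python
import math
from typing import Callable, List, Sequence


def rolling_apply(
    values: Sequence[float], window: int, func: Callable[..., float], *args
) -> List[float]:
    """Call ``func`` once per full window; NaN where the window is incomplete."""
    result = []
    for end in range(1, len(values) + 1):
        start = end - window
        if start < 0:
            result.append(math.nan)  # not enough observations yet
            continue
        result.append(func(values[start:end], *args))
    return result


def mean_plus(window, offset):
    # Example user-defined function taking one extra positional argument.
    return sum(window) / len(window) + offset


out = rolling_apply([1.0, 2.0, 3.0, 4.0], 2, mean_plus, 0.5)
```

The user function is called once per row, so the interpreter overhead of the loop grows with the length of the data. That is the part a compiled engine can remove, which is consistent with the gain being more visible on larger inputs.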

jbrockmendel · 2019-12-25T00:13:40Z

doc/source/whatsnew/v1.0.0.rst

@@ -428,6 +439,8 @@ Optional libraries below the lowest tested version may still work, but are not c
 +-----------------+-----------------+---------+
 | matplotlib      | 2.2.2           |         |
 +-----------------+-----------------+---------+
+| numba           | 0.46.0          |         |


add an X for this

pandas/core/window/rolling.py

jbrockmendel · 2019-12-25T00:21:08Z

This looks really nice. way to go

mroeschke · 2019-12-26T05:59:53Z

Thanks for the review everyone. Think I addressed all the comments.

jreback

lgtm. small questions.

cc @jorisvandenbossche @TomAugspurger @jbrockmendel

if any comments

pandas/core/window/common.py

jreback · 2019-12-26T13:12:03Z

pandas/core/window/numba_.py

+    parallel: bool,
+    nopython: bool,
+):
+    numba = import_optional_dependency("numba")


can you add a doc-string that says what this function does (the parameters are already documented elsewhere, maybe just mention that)

pandas/core/window/numba_.py

pandas/core/window/rolling.py

WillAyd

lgtm

pandas/core/window/common.py

mroeschke · 2019-12-27T17:21:34Z

Thanks for another round of review. Addressed all the comments.

TomAugspurger · 2019-12-27T18:28:33Z

Looks good.

jreback · 2019-12-27T19:27:48Z

thanks @mroeschke very nice!

jbrockmendel · 2019-12-27T20:46:11Z

@mroeschke heads up that I'm now seeing this in test output locally:

pandas/tests/window/test_numba.py::TestApply::test_cache[False-True-False-False]
  /Users/bmendel/Desktop/pd/dtmean/pandas/core/window/numba_.py:75: NumbaPerformanceWarning: 
  The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
  
  To find out why, try turning on parallel diagnostics, see http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.
  
  File "pandas/core/window/numba_.py", line 59:
  
              def impl(window, *_args):
              ^
  
    result[i] = numba_func(window, *args)

mroeschke · 2019-12-27T20:48:00Z

I'll make a follow-up to suppress this warning (it has no bearing on what is being tested).
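One way such a warning can be silenced is with the standard `warnings` filters. The sketch below is an assumption about the general technique, not the change made in the follow-up PR: `StandInPerformanceWarning` and `noisy_compile` are hypothetical stand-ins so the example runs without numba installed.

```python
import warnings


class StandInPerformanceWarning(UserWarning):
    """Hypothetical stand-in for numba's NumbaPerformanceWarning."""


def noisy_compile() -> str:
    # Mimics a compile step that warns when parallel=True cannot be applied.
    warnings.warn(
        "'parallel=True' was specified but no transformation was possible.",
        StandInPerformanceWarning,
    )
    return "compiled"


def compile_quietly():
    """Run the compile step and return its result plus any recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        # Added last, so it is checked first and wins for this category.
        warnings.simplefilter("ignore", category=StandInPerformanceWarning)
        result = noisy_compile()
    return result, caught


result, caught = compile_quietly()
```

In a test suite the same filter is usually declared once (for example in the test configuration) rather than around each call.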

…ndexing-1row-df

* upstream/master: (333 commits)
  CI: troubleshoot Web_and_Docs failing (pandas-dev#30534)
  WARN: Ignore NumbaPerformanceWarning in test suite (pandas-dev#30525)
  DEPR: camelCase in offsets, get_offset (pandas-dev#30340)
  PERF: implement scalar ops blockwise (pandas-dev#29853)
  DEPR: Remove Series.compress (pandas-dev#30514)
  ENH: Add numba engine for rolling apply (pandas-dev#30151)
  [ENH] Add to_markdown method (pandas-dev#30350)
  DEPR: Deprecate pandas.np module (pandas-dev#30386)
  ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405)
  BUG: The setting xrot=0 in DataFrame.hist() doesn't work with by and subplots pandas-dev#30288 (pandas-dev#30491)
  CI: Fix GBQ Tests (pandas-dev#30478)
  Bug groupby quantile listlike q and int columns (pandas-dev#30485)
  ENH: Add ignore_index for df.sort_values and series.sort_values (pandas-dev#30402)
  TYP: Typing hints in pandas/io/formats/{css,csvs}.py (pandas-dev#30398)
  BUG: raise on non-hashable Index name, closes pandas-dev#29069 (pandas-dev#30335)
  Replace "foo!r" to "repr(foo)" syntax pandas-dev#29886 (pandas-dev#30502)
  BUG: preserve EA dtype in transpose (pandas-dev#30091)
  BLD: add check to prevent tempita name error, clsoes pandas-dev#28836 (pandas-dev#30498)
  REF/TST: method-specific files for test_append (pandas-dev#30503)
  marked unused parameters (pandas-dev#30504)
  ...

mroeschke · 2023-05-23T22:45:40Z

pandas/core/window/numba_.py

+        numba_func = func
+    else:
+
+        @numba.generated_jit(nopython=nopython, nogil=nogil, parallel=parallel)


@stuartarchibald sorry for the ping, but I see that generated_jit has been deprecated in numba 0.57. IIRC you helped me add this a while back, and I am lost on how to write this in terms of overload

stuartarchibald · 2023-06-09

@mroeschke no problem, I can try and help with this. I think it needs to look a bit like this. For reference, this is untested, I am just guessing from the context! Also, the pandas variant is obviously wrapped to close over some configuration which I've omitted, so consider this as the function body of make_rolling_apply. I've left comments inline to try and explain what's going on:

import types

import numpy as np
from numba.extending import overload, is_jitted
from numba import njit
import numba


# this provides a local definition to overload
def overload_target(window, *_args):
    # If JIT is disabled, this function will run, so write the implementation here!
    pass


nopython = True
nogil = True
parallel = False


# pretend this is an arg to `make_rolling_apply`
def func(window, *args):
    return window * 2 + args[0]


@overload(overload_target, jit_options={'nopython': nopython, 'nogil': nogil, 'parallel': parallel})
def ol_overload_target(window, *_args):
    # This function "overloads" `overload_target`, whenever the Numba compiler
    # "sees" `overload_target` it will use this function.

    # Using `is_jitted` to avoid `isinstance` on
    # `numba.targets.registry.CPUDispatcher` as that may be considered an
    # internal Numba detail.
    if is_jitted(func):
        # it's already JIT compiled so just reference it
        overload_target_impl = func
    elif getattr(np, func.__name__, False) is func or isinstance(
        func, types.BuiltinFunctionType
    ):
        # it's a NumPy function or builtin so just reference it
        overload_target_impl = func
    else:
        # it's a Python function, so register it as JIT compilable and reference
        # that
        overload_target_impl = numba.jit(func, nopython=nopython, nogil=nogil)

    # This is the Numba implementation of the overload, it will just be JIT
    # compiled whenever the compiler "sees" a reference to "overload_target" in
    # code it is compiling.
    def impl(window, *_args):
        return overload_target_impl(window, *_args)

    return impl


# demo
@njit
def roll_apply(window, *_args):
    return overload_target(window, *_args)


print(roll_apply(np.arange(10.), 1.23))

@overload is basically saying to Numba "when you see this specific python function (the one in the first argument in the @overload decorator) use this implementation". The concept about there being a "typing" part that can be used to dispatch different variants based on type is exactly the same as in @generated_jit. The largest difference is what happens if the JIT compiler is turned off. In the case of @overload the python function being overloaded will run, i.e., the code just executes as would be expected in the interpreter. Whereas in the case of @generated_jit, because the pure python implementation and the Numba implementation are the same function, if you turn the JIT compiler off it will just break (the value returned when calling a @generated_jit function is a function implementing the Numba specialisation). Essentially, @generated_jit is like doing @overload but the function being decorated is also the function being overloaded.

Hope this helps?
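The difference described above between `@overload` and `@generated_jit` when the JIT is turned off can be mimicked with a toy registry in plain Python. Everything here (`toy_overload`, `toy_call`, `double`, `ol_double`) is hypothetical and is not Numba's API or implementation; it only models the idea that an overload builder receives types and returns an implementation, while the overloaded Python function remains callable on its own.

```python
from typing import Callable, Dict

# Toy registry mapping a Python function to its "overload builder".
# Purely illustrative; Numba's real machinery is far more involved.
_REGISTRY: Dict[Callable, Callable] = {}


def toy_overload(target: Callable) -> Callable:
    """Register the decorated builder as the overload of ``target``."""

    def decorator(builder: Callable) -> Callable:
        _REGISTRY[target] = builder
        return builder

    return decorator


def toy_call(target: Callable, *args, jit_enabled: bool = True):
    """Call ``target`` roughly the way it would run with or without a JIT."""
    if jit_enabled and target in _REGISTRY:
        # "Typing" step: the builder sees argument types and returns an impl.
        impl = _REGISTRY[target](*(type(a) for a in args))
        return impl(*args)
    # JIT disabled: the plain Python function simply runs.
    return target(*args)


def double(x):
    # Pure-Python implementation, used when the JIT is off.
    return x * 2


@toy_overload(double)
def ol_double(x_type):
    if x_type is int:
        return lambda x: x << 1  # specialised variant for ints
    return lambda x: x * 2  # generic variant


with_jit = toy_call(double, 21)  # dispatches through the builder
without_jit = toy_call(double, 1.5, jit_enabled=False)  # runs `double` itself

# The generated_jit-style pitfall: the builder *is* the function, so calling
# it directly without a compiler yields a function, not a result.
pitfall = ol_double(int)
```

In the toy model, `double` still works with the "JIT" disabled, while calling the builder directly hands back a function, which is the failure mode described for `@generated_jit`.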

mroeschke · 2023-06-09

Thanks for the reply! We had a PR recently that refactored this to use extending.register_jitable. Would that be a sufficient alternative? https://github.com/pandas-dev/pandas/pull/53455/files

stuartarchibald · 2023-06-09

No problem! I just took a look at the patch above. I think it would work, but it might lose some of the dispatch ability offered by generated_jit/overload. As I understand it, the original code would have let a NumPy function or a built-in be passed in as the "user function", whereas I think the register_jitable version requires a user-defined Python function. It may be that the register_jitable version is a sufficient alternative for the use cases that arise in practice, in which case it seems appropriate.

mroeschke · 2023-06-09

Great, thanks for the context! Yeah, this function should expect a custom UDF, so thanks for the confirmation

stuartarchibald · 2023-06-09

Glad to get this resolved, thanks for confirming too! It sounds like the replacement above is appropriate. If there are any more issues/queries feel free to open issues on the Numba issue tracker (or ping here!).

kartiksubbarao · 2023-09-29

@stuartarchibald I'm running into a rolling apply issue with pandas 2.1.1 and numba 0.58 that might be related. Discussion is here:
https://numba.discourse.group/t/pandas-source-of-old-style-error-capturing-warning/2169/8

Matt Roeschke added 6 commits December 8, 2019 11:54

Add numba to import_optional_dependencies

3b9bff8

Start adding keywords

9a302bf

Modify apply for numba and cython

0e9a600

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

36a77ed

Add numba as optional dependency

dbb2a9b

Add premil tests

f0e9a4d

mroeschke added the Dependencies (required and optional dependencies), Enhancement, and Window (rolling, ewma, expanding) labels Dec 9, 2019

mroeschke added this to the 1.0 milestone Dec 9, 2019

jreback requested changes Dec 9, 2019

pandas/core/window/rolling.py (outdated, resolved)

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

1250aee

jreback requested changes Dec 10, 2019

Matt Roeschke added 3 commits December 10, 2019 22:40

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

4e7fd1a

Add numba to requirements-dev, type and reorder signature in apply

cb976cf

Move numba routines to its own file

45420bb

Matt Roeschke added 3 commits December 10, 2019 23:30

Adjust signature in top level function as well

17851cf

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

20767ca

Generate requirements-dev.txt using script

9619f8d

jreback reviewed Dec 11, 2019

Matt Roeschke added 8 commits December 12, 2019 22:04

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

66fa69c

Add skip test decorator, add numba to a few builds

b8908ea

black

135f2ad

don't rejit a user's jitted function

34a5687

Add numba/cython comparison test

6da8199

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

123f77e

Remove typing for now

54e74d1

Remove sub description for doc failures?

04d3530

Type Callable in generate_numba_apply_func

af3fe50

Matt Roeschke added 3 commits December 24, 2019 14:25

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

eb7b5e1

use ellipsis, cannot specify np.ndarray as well

f7dfcf4

Remove trailing whitespace in apply docstring

a42a960

WillAyd requested changes Dec 24, 2019

jbrockmendel reviewed Dec 25, 2019

pandas/core/window/rolling.py (resolved)

Matt Roeschke added 2 commits December 24, 2019 20:03

Address Will's and Brock's comments

d019830

Fix typing

29d145f

jreback approved these changes Dec 26, 2019

WillAyd approved these changes Dec 26, 2019

topper-123 requested changes Dec 26, 2019

pandas/core/window/common.py (outdated, resolved)

Matt Roeschke added 2 commits December 26, 2019 10:00

Merge remote-tracking branch 'upstream/master' into numba_rolling_apply

248149c

Address followup comments

a3da51e

jreback merged commit a9fcdc5 into pandas-dev:master Dec 27, 2019

mroeschke deleted the numba_rolling_apply branch December 27, 2019 19:28

mroeschke mentioned this pull request Dec 28, 2019

WARN: Ignore NumbaPerformanceWarning in test suite #30525

Merged

AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Dec 29, 2019

ENH: Add numba engine for rolling apply (pandas-dev#30151)

7bfa883

mroeschke commented May 23, 2023

stuartarchibald mentioned this pull request Oct 3, 2023

BUG: NumbaPendingDeprecationWarning with numba 0.58.0 #55247

Closed

3 tasks
