PERF: support parallel calculation of nancorr #24795

noamher · 2019-01-16T08:47:15Z

This is a proposal to use OpenMP to speed up the nancorr function (used by pd.DataFrame.corr).

If this is useful, the same approach can probably be applied to other Cython algorithms in algos.pyx.

Also, the interface has to be decided on: how to choose whether to use parallelization or not, how many CPUs, the schedule strategy for the prange, etc.

I am not sure what the implications of adding OpenMP to compilation and linkage are in terms of portability.

Using 4 CPUs I got a ~60% speedup on pd.DataFrame.corr for a DataFrame of size 20000 x 1300.

pep8speaks · 2019-01-16T08:47:19Z

Hello @noamher! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 16, 2019 at 08:57 Hours UTC

gfyoung · 2019-01-16T10:53:56Z

@noamher : Thanks for opening this PR! While the performance increase sounds appealing from the outset, a couple of points could improve this proposal:

Given the uncertainty you express about your own implementation, we generally prefer that you open an issue PRIOR TO the PR. The issue is a better place to iron out those uncertainties before making a concrete proposal to change the code.
You should run the ASV benchmarks (under the asv_bench directory) to provide more concrete evidence that this is indeed a performance speedup. If the existing benchmarks don't show any speedup, try adding a new benchmark to provide that evidence.
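
For reference, an ASV benchmark is an ordinary Python class: asv calls `setup` before timing each `time_*` method. The sketch below is a minimal illustration of what such a benchmark for `DataFrame.corr` could look like. The class name, the matrix shape, and the NaN fraction are assumptions chosen for illustration; they are not taken from the PR or from the existing pandas benchmark suite.

```python
import numpy as np
import pandas as pd


class CorrWithNaNs:
    """ASV-style benchmark sketch (hypothetical name, not from the pandas suite)."""

    def setup(self):
        # Shape is an illustrative assumption; the PR description
        # reports timings for a 20000 x 1300 frame.
        rng = np.random.default_rng(0)
        values = rng.standard_normal((2000, 50))
        # Blank out roughly 10% of the entries so the NaN handling is exercised.
        values[rng.random(values.shape) < 0.1] = np.nan
        self.df = pd.DataFrame(values)

    def time_corr(self):
        # asv reports the wall-clock time of this call.
        self.df.corr()
```

Running the benchmark against both the base branch and the PR branch would give the before/after comparison asked for above.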

TomAugspurger · 2019-01-16T12:17:54Z

I don't think we're quite ready for this. We need to do some more discussion on when and how we'll use parallelism.

jbrockmendel · 2019-01-16T14:48:42Z

pandas/_libs/algos.pyx


 from libc.stdlib cimport malloc, free
 from libc.string cimport memmove
 from libc.math cimport fabs, sqrt
+from cpython cimport bool


does cython treat this differently from built-in bool?

I believe this is the type Cython uses to declare a Python boolean (an object, which means many operations on it are only possible while the GIL is held).

jbrockmendel · 2019-01-16T14:49:08Z

pandas/_libs/algos.pyx

        int64_t nobs = 0
        float64_t vx, vy, sumx, sumy, sumxx, sumyy, meanx, meany, divisor
+        int64_t blah = 0


What's this?

Unnecessary, will be removed.

jbrockmendel · 2019-01-16T14:50:30Z

pandas/_libs/algos.pyx

+                nancorr_single_row(mat, N, K, result, xi, mask, minpv, cov)
+    else:
+        with nogil:
+            for xi in range(K):


could this be collapsed with something like range_func = range(K) if not parallel else prange(K, schedule='dynamic')?

AFAIK no: prange is not really a function; it is converted into a #pragma of sorts in the generated C code.

OK. I think there is an issue on Cython's GH about making a maybe_nogil that would be semantically similar to maybe_prange. Might be worth commenting there

jbrockmendel · 2019-01-16T14:51:45Z

setup.py

@@ -677,10 +678,11 @@ def srcpath(name=None, suffix='.pyx', subdir='src'):
    obj = Extension('pandas.{name}'.format(name=name),
                    sources=sources,
                    depends=data.get('depends', []),
-                    include_dirs=include,
+                    include_dirs=include + [numpy.get_include()],


is this different from the pkg_resources version that is already in there?

jbrockmendel · 2019-01-16T14:52:33Z

setup.py

                    language=data.get('language', 'c'),
                    define_macros=data.get('macros', macros),
-                    extra_compile_args=extra_compile_args)
+                    extra_compile_args=['-fopenmp'] + extra_compile_args,
+                    extra_link_args=['-fopenmp'])


is this platform-dependent in any way? i.e. do we need to check that it is available?

I believe this is the gcc flag to link against OpenMP. I think clang uses the same flag, but I'm not sure.
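
One way to answer the availability question at build time is to probe the compiler before adding the flag. The sketch below is an assumption about how that could be done, not something this PR contains: `compiler_supports_openmp` is a hypothetical helper that tries to build a tiny OpenMP program and reports whether it succeeded. It only covers gcc-style command lines; MSVC spells the option `/openmp` and would need a separate branch.

```python
import os
import shlex
import shutil
import subprocess
import tempfile

# Minimal C program that only compiles and links when OpenMP is usable.
_OPENMP_PROBE = """
#include <omp.h>
int main(void) {
    #pragma omp parallel
    { (void)omp_get_num_threads(); }
    return 0;
}
"""


def compiler_supports_openmp(compiler=None, flag="-fopenmp"):
    """Return True if the compiler can build a tiny OpenMP program with `flag`.

    Hypothetical helper: defaults to $CC, falling back to `cc`.
    """
    cmd = shlex.split(compiler or os.environ.get("CC") or "cc")
    if not cmd or shutil.which(cmd[0]) is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "omp_probe.c")
        exe = os.path.join(tmp, "omp_probe")
        with open(src, "w") as fh:
            fh.write(_OPENMP_PROBE)
        try:
            proc = subprocess.run(
                cmd + [flag, src, "-o", exe],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            return False
        return proc.returncode == 0


# Only pass the flag when the probe succeeds; otherwise build serially.
openmp_args = ["-fopenmp"] if compiler_supports_openmp() else []
```

In setup.py, `openmp_args` could then be appended to both `extra_compile_args` and `extra_link_args` instead of hard-coding the flag.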

jbrockmendel · 2019-01-16T14:53:52Z

pandas/_libs/algos.pyx

@@ -230,14 +232,15 @@ def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric:

 @cython.boundscheck(False)
 @cython.wraparound(False)
-def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
+def nancorr(float64_t[:, :] mat, bint cov=0, minp=None, bool parallel=False):


+1 on changing to memoryviews where viable. If we don't alter mat in place, we might want to add the const modifier.

jbrockmendel · 2019-01-16T14:54:23Z

pandas/_libs/algos.pyx

+                             bint minpv,
+                             bint cov=0) nogil:
+    for yi in range(xi + 1):
+        nancorr_single(mat, N, K, result, xi, yi, mask, minpv, cov)


if nancorr_single isn't used elsewhere, I'd inline it here
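
To make the structure under review easier to follow, here is a pure-Python sketch of the row-wise decomposition: each task handles one row `xi` of the lower triangle and loops over `yi in range(xi + 1)`. This is an illustration under stated assumptions, not the Cython code from the PR: `corr_pair`, `corr_row`, and `nancorr_sketch` are hypothetical names, the pairwise formula is the textbook two-pass one, and a thread pool stands in for `prange` only to mirror the shape of the loop (Python threads will not give the OpenMP speedup because of the GIL).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def corr_pair(mat, xi, yi, minp=1, cov=False):
    """Correlation (or covariance) of columns xi and yi over rows where both are non-NaN."""
    pairs = [(row[xi], row[yi]) for row in mat
             if not math.isnan(row[xi]) and not math.isnan(row[yi])]
    nobs = len(pairs)
    if nobs < minp:
        return math.nan
    meanx = sum(x for x, _ in pairs) / nobs
    meany = sum(y for _, y in pairs) / nobs
    sumxy = sum((x - meanx) * (y - meany) for x, y in pairs)
    if cov:
        return sumxy / (nobs - 1) if nobs > 1 else math.nan
    sumxx = sum((x - meanx) ** 2 for x, _ in pairs)
    sumyy = sum((y - meany) ** 2 for _, y in pairs)
    divisor = math.sqrt(sumxx * sumyy)
    return sumxy / divisor if divisor != 0 else math.nan


def corr_row(mat, xi, result, minp=1, cov=False):
    """Fill row xi of the lower triangle and mirror it into the upper triangle."""
    for yi in range(xi + 1):
        result[xi][yi] = result[yi][xi] = corr_pair(mat, xi, yi, minp, cov)


def nancorr_sketch(mat, minp=1, cov=False, workers=None):
    """Serial when workers is None, otherwise one task per row xi."""
    k = len(mat[0])
    result = [[math.nan] * k for _ in range(k)]
    if workers is None:
        for xi in range(k):
            corr_row(mat, xi, result, minp, cov)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for all rows and re-raises any worker exception.
            list(pool.map(lambda xi: corr_row(mat, xi, result, minp, cov), range(k)))
    return result
```

Each output cell is written by exactly one row task, so the tasks do not need a lock. Row `xi` performs `xi + 1` pair computations, so the work per row is uneven, which is presumably why a dynamic schedule came up in the review.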

jreback · 2019-01-16T15:15:16Z

I agree, this is out of scope for now w/o further API discussion. We don't have a generic parallelism story ATM. This would have to be threaded down to the function, and we would need the ability to generically turn parallelism on/off at a much higher level for other frameworks.

@noamher if you would, open an issue for discussion and try to reference a couple of related open issues (pls do a search).
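
As one possible shape for such a higher-level switch, the sketch below keeps a single setting that a context manager can change and that low-level routines consult. Everything here is hypothetical and only meant to seed the discussion: `parallel`, `get_num_threads`, and `corr_dispatch` are made-up names, and no such pandas API is implied.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical setting; None means "run serially".
_num_threads = contextvars.ContextVar("num_threads", default=None)


def get_num_threads():
    """Return the currently requested thread count, or None for serial."""
    return _num_threads.get()


@contextmanager
def parallel(num_threads):
    """Temporarily request `num_threads` threads for code run inside the block."""
    token = _num_threads.set(num_threads)
    try:
        yield
    finally:
        # Restore whatever value was active before entering the block.
        _num_threads.reset(token)


def corr_dispatch():
    """Stand-in for a high-level method choosing which low-level path to call."""
    n = get_num_threads()
    return "serial" if n is None or n <= 1 else f"parallel({n})"
```

With this layout, `with parallel(4): corr_dispatch()` would take the parallel path, while calls outside the block stay serial, so an outer framework could keep parallelism off globally.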

jreback · 2019-01-16T15:15:37Z

@noamher the memory view changes are ok, would take a PR on those.

noamher · 2019-01-16T15:27:43Z

@jreback I'll open a relevant issue and see if I can find other relevant issues.

I'll do the memory view changes.

Support parallel calculation of nancorr

04a384c

noamher force-pushed the nh-parallel-nancorr branch from 2f182a9 to 04a384c Compare January 16, 2019 08:57

noamher changed the title ~~Support parallel calculation of nancorr~~ PERF: Support parallel calculation of nancorr Jan 16, 2019

noamher changed the title ~~PERF: Support parallel calculation of nancorr~~ PERF: support parallel calculation of nancorr Jan 16, 2019

gfyoung added the Build, Performance, Algos, and Needs Discussion labels Jan 16, 2019

gfyoung requested review from jreback and jbrockmendel January 16, 2019 10:54

jbrockmendel reviewed Jan 16, 2019

jreback closed this Jan 16, 2019

noamher mentioned this pull request Jan 25, 2019

CLN: Refactor cython to use memory views #24932

Merged
