ENH: add signature argument to vectorize for vectorizing like generalized ufuncs #8054

shoyer · 2016-09-15T08:15:04Z

I brought this up on the mailing list today and received some support, so I
thought I would share a preliminary patch.

~~Still needs tests and more extensive documentation.~~

Example usage (from the docstring):

Vectorized calculation of Pearson correlation coefficient and its p-value:

>>> import scipy.stats
>>> pearsonr = np.vectorize(scipy.stats.pearsonr,
...                         signature='(n),(n)->(),()')
>>> pearsonr([[0, 1, 2, 3]], [[1, 2, 3, 4], [4, 3, 2, 1]])
(array([ 1., -1.]), array([ 0.,  0.]))

Vectorized convolution:

>>> convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)')
>>> convolve(np.eye(4), [1, 2, 1])
array([[ 1.,  2.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  2.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  2.,  1.]])

A note on speed: a vectorize with signature has about 6x the fixed overhead of vectorize (60us vs 10us) and also a much larger overhead per loop (2us vs 100ns). I don't think these numbers will make any difference for the intended use case (only trivial Python functions runs in less than 10us) but I thought I would toss them out there anyways. I have yet to make any attempts at optimization.

# run on python 3.5.2
import numpy as np
import operator

vectorize_add = np.vectorize(operator.add, otypes='i')
guvectorize_add = np.vectorize(operator.add, signature='(),()->()', otypes='i')

x = np.ones(1000000, dtype=int)

%timeit vectorize_add(1, 1)
#100000 loops, best of 3: 10.6 µs per loop

%timeit guvectorize_add(1, 1)
#10000 loops, best of 3: 63.4 µs per loop

%timeit vectorize_add(x, x)
#10 loops, best of 3: 112 ms per loop

%timeit guvectorize_add(x, x)
#1 loop, best of 3: 2.22 s per loop

Also fixes #5868

shoyer · 2016-09-26T03:44:21Z

doc/source/reference/c-api.generalized-ufuncs.rst

@@ -1,3 +1,5 @@
+.. _c-api.generalized-ufuncs:


I'd like to add a reference to this doc page from the docstring for guvectorize. Do we have a standard way to do that? I guess a URL under "References" is probably most robust, though I was thinking a sphinx reference might work...

shoyer · 2016-09-26T03:50:31Z

I've added tests, so this is ready for review.

juliantaylor

looks like a nice feature

do we need a new api name guvectorize? could this be put into vectorize with an optional signature argument?

juliantaylor · 2016-09-29T13:48:25Z

numpy/lib/function_base.py

+    core_dims : Tuple[str, ...]
+        Core dimensions for this argument.
+    """
+    if core_dims:


if not core_dims: return instead of an extra indent level for the whole function might be nicer

agreed, done

juliantaylor · 2016-09-29T13:49:42Z

numpy/lib/function_base.py

+        self.pyfunc = pyfunc
+        self.signature = signature
+
+        if not re.match(_SIGNATURE, signature):


this part is just _parse_gufunc_signature?

yes, indeed, good catch

juliantaylor · 2016-09-29T13:53:17Z

numpy/lib/function_base.py

+_SIGNATURE = '^%s->%s$' % (_ARGUMENT_LIST, _ARGUMENT_LIST)
+
+
+def _parse_gufunc_signature(signature):


shouldn't these functions just be private methods of the guvectorize class?

I prefer writing functions for isolated functionality. Among other virtues, it makes things easier to test.

you could use @classmethod, can be tested in isolation of creating the object.

Sure, could use @classmethod or @staticmethod here, but my feeling is that it's poor style to use a class merely as a namespace. (It's actually against our internal style guide.)

shoyer · 2016-09-30T01:31:24Z

do we need a new api name guvectorize? could this be put into vectorize with an optional signature argument?

@juliantaylor This is a good point. A signature argument would indeed be a cleaner way to do this. I will give that a try.

Example usage (from the docstring): Vectorized convolution: >>> convolve = np.vectorize(np.convolve, signature='(n),(m)->(k)') >>> convolve(np.eye(4), [1, 2, 1]) array([[ 1., 2., 1., 0., 0., 0.], [ 0., 1., 2., 1., 0., 0.], [ 0., 0., 1., 2., 1., 0.], [ 0., 0., 0., 1., 2., 1.]])

shoyer · 2016-09-30T16:12:30Z

This is ready for further review.

I rewrote this to simply add a signature argument to vectorize instead of writing a whole new function. It is indeed much cleaner this way.

mhvk · 2016-10-03T04:05:12Z

numpy/lib/function_base.py

+    Tuple of input and output core dimensions parsed from the signature, each
+    of the form List[Tuple[str, ...]].
+    """
+    if not re.match(_SIGNATURE, signature):


Does this test carry much overhead? My always-try-to-optimize the normal path temptation would be to put the return statement inside a try/except clause like

try: return ... except Exception: if not re.match... raise ValueError(...) else: raise

the parsing is not really performance relevant, it is done on the object creation, the time consuming part is executing it on the data
but if you want to improve it, precompiling the regex might be worthwhile

I agree with @juliantaylor that this is unlikely to be a performance bottleneck, given that it is done on construction. My timing shows it takes around 10us, which is really in micro-benchmark territory.

My understanding is that in most cases it isn't necessary to worry about precompiling a regex because the re module maintains its own cache: https://docs.python.org/2/library/re.html#re.compile

mhvk · 2016-10-03T04:07:20Z

numpy/lib/function_base.py

@@ -2242,17 +2243,116 @@ def disp(mesg, device=None, linefeed=True):
    return


+# See http://docs.scipy.org/doc/numpy/reference/c-api.generalized-ufuncs.html
+_DIMENSION_NAME = r'\w+'
+_CORE_DIMENSION_LIST = '(?:%s(?:,%s)*)?' % (_DIMENSION_NAME, _DIMENSION_NAME)


Here and below: being relative new to python, I got used to new-style formats immediately, and always found the old-style ones a bit clunky. Would it be an idea to try to use new-style?

Sure, that's a little better. Neither is especially readable for building regular expressions, to be honest.

mhvk · 2016-10-03T04:15:22Z

This looks really nice. Only two trivial comments, which you can feel free to ignore.

shoyer · 2016-10-07T17:16:52Z

Note to self: this would be a good time to fix #5868, by adding a sensible error message if otypes is not provided and the output is empty.

shoyer · 2016-10-07T17:20:47Z

CC @mforbes (the last person to significantly update vectorize)

shoyer · 2016-10-07T17:23:04Z

numpy/lib/tests/test_function_base.py

+        r = f([0, 3, 6, 9], [1, 3, 5, 7])
+        assert_array_equal(r, [1, 6, 1, 2])
+
+    def test_siganture_mean_last(self):


siganture is a mispelling

shoyer · 2016-10-11T17:45:37Z

I added the fix for size 0 inputs, in case anyone wants to take another look.

shoyer · 2016-10-17T16:27:22Z

I'd love to merge this in time for v1.12.0.

@mhvk @juliantaylor would one of you mind taking another look at my revisions? Thanks.

mhvk · 2016-10-17T17:21:29Z

numpy/lib/function_base.py

+                            % (len(input_core_dims), len(args)))
+        args = tuple(asanyarray(arg) for arg in args)
+
+        # consider checking for size 0 inputs?


Is this comment still necessary? You do seem to check for size 0 things at the end.

Indeed, this is outdated. Removed.

mhvk · 2016-10-17T17:22:14Z

@shoyer - yes, indeed, this should be in 1.2, it is a very nice addition! I looked over the code fairly carefully again, and only had a a comment about a comment...

charris · 2016-10-17T22:51:35Z

@shoyer Sounds like time to cleanup the commit history.

shoyer · 2016-10-17T22:56:36Z

@charris how do you feel about GitHub's "Squash and merge" option? I think that can result in something that looks very similar to what I would make by hand here.

charris · 2016-10-17T23:31:08Z

@shoyer Don't like it, it squashes, then fast forwards the commit to master, so there is no record of the merge or the PR #. Some folks dislike the "bubble" git history resulting from lots of small merges, but for us I think it is a plus as both the authors and committers are documented along with the PR #, which makes it easy to automate part of the release documentation and link to the relevant PR. Maybe we could fix up the github function at some point if there is a hook. Don't remember what happens with the documentation.

shoyer · 2016-10-17T23:39:03Z

@charris OK, no problem I will rebase.

I disabled "Allow squash merging" and "Allow rebase merging" in GitHub settings so those buttons no longer tempt me :).

I also disabled force pushing to master while I was at it, since I'm pretty sure we don't want to allow that and this avoids accidental force pushes (which can happen).

charris · 2016-10-17T23:41:42Z

That said, I only ran the experiment once, so I might have missed some options along the way. Might be worth creating a test repo and experimenting with the buttons.

shoyer · 2016-10-17T23:47:24Z

"Squash and merge" puts everything in one commit, which by default has the PR number in the title, e.g., for this PR:

ENH: add signature argument to vectorize for vectorizing like generalized ufuncs (#8054)

The bodies for each commit get combined into the body of the squashed commit.

I like it because figuring out how to rebase to cleanup git history can be a burden for new committers.

charris · 2016-10-18T00:04:14Z

Hmm, so the commit message comes from the PR title rather than the first commit? What about the commit message body, was that an option to add?

shoyer · 2016-10-18T00:06:22Z

@charris You should try clicking the "Squash and merge" button once (I just re-enabled it). It doesn't merge until you click a second time to confirm, and it will show you an editable preview of the commit message.

charris · 2016-10-18T00:12:39Z

I think there is documentation saying that issue numbers should be part to the commit message summary line. We should change that if we allow squashing so we can reliably pull out the PR number as opposed to issues. See the tools/announce.py script to see how that done now by searching merges only. I'd still like to document the committer as well as the contributor. If we could disable the fast-forwarding I'd be happy. I think...

charris · 2016-10-18T00:22:11Z

A quick grep on the commit log exposes what looks like three uses of the squash button. One is mine, I suppose the other two are yours.

charris · 2016-10-18T00:36:56Z

Another solution would be to not bother squashing. My thinking about this is colored by trying to generate useful logs automatically as well as tracking committers. But I agree that the squashed commit message has much to recommend it and loss of merge tracking might not be so bad.

charris · 2016-10-18T01:33:15Z

Giving it a shot. I think the button needs to be used with discrimination.

charris · 2016-10-18T01:40:52Z

OK, one thing that is off is that the summary line of the commit, taken from the PR title, is too long at 88 characters. So one thing to do when using this option is to edit the title line to fix it up. Note that the messages from the consolidated commits also get indented an extra four spaces. All in all, I think manually squashing is a better option at this point but using the squash button is a possiblity.

shoyer force-pushed the guvectorize branch from 9d49774 to f3f5893 Compare September 15, 2016 08:16

charris added 01 - Enhancement component: numpy.lib labels Sep 16, 2016

shoyer mentioned this pull request Sep 23, 2016

New function for applying vectorized functions for unlabeled arrays to xarray objects pydata/xarray#964

Merged

shoyer force-pushed the guvectorize branch from 4a882cc to b2c9495 Compare September 26, 2016 03:42

shoyer commented Sep 26, 2016

View reviewed changes

shoyer mentioned this pull request Sep 26, 2016

MAINT: fix assert_raises_regex when used as a context manager #8093

Merged

shoyer force-pushed the guvectorize branch from a9ae09c to 43a176f Compare September 26, 2016 18:33

juliantaylor reviewed Sep 29, 2016

View reviewed changes

shoyer force-pushed the guvectorize branch from 43a176f to 77e7ff8 Compare September 30, 2016 03:43

shoyer changed the title ~~ENH: add guvectorize for vectorizing like generalized ufuncs~~ ENH: add signature argument to vectorize for vectorizing like generalized ufuncs Sep 30, 2016

mhvk reviewed Oct 3, 2016

View reviewed changes

Use str.format rather than %

1678a72

shoyer added this to the 1.12.0 release milestone Oct 3, 2016

shoyer commented Oct 7, 2016

View reviewed changes

shoyer added 2 commits October 11, 2016 09:31

Fix spelling typo

7d14629

BUG: fix np.vectorize for size 0 inputs

0498f28

shoyer mentioned this pull request Oct 11, 2016

vectorize fails for zero-dimensioned arrays #5868

Closed

DOC: add vectorize to 1.12.0 release notes

6d2a998

mhvk reviewed Oct 17, 2016

View reviewed changes

[ci-skip] Remove outdated comment

138cbd3

charris merged commit 0a02bb6 into numpy:master Oct 18, 2016

shoyer deleted the guvectorize branch October 18, 2016 17:43

mgeier mentioned this pull request Feb 12, 2018

vectorizing functions with multiple return values? pybind/pybind11#763

Open

shoyer mentioned this pull request Mar 4, 2018

numpy 1.11 support for apply_ufunc pydata/xarray#1956

Closed

		_SIGNATURE = '^%s->%s$' % (_ARGUMENT_LIST, _ARGUMENT_LIST)


		def _parse_gufunc_signature(signature):

ENH: add signature argument to vectorize for vectorizing like generalized ufuncs #8054

ENH: add signature argument to vectorize for vectorizing like generalized ufuncs #8054

Conversation

shoyer commented Sep 15, 2016 • edited Loading

Choose a reason for hiding this comment

shoyer commented Sep 26, 2016

juliantaylor left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shoyer commented Sep 30, 2016

shoyer commented Sep 30, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhvk commented Oct 3, 2016

shoyer commented Oct 7, 2016

shoyer commented Oct 7, 2016

Choose a reason for hiding this comment

shoyer commented Oct 11, 2016

shoyer commented Oct 17, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhvk commented Oct 17, 2016

charris commented Oct 17, 2016

shoyer commented Oct 17, 2016

charris commented Oct 17, 2016

shoyer commented Oct 17, 2016 • edited Loading

charris commented Oct 17, 2016

shoyer commented Oct 17, 2016 • edited Loading

charris commented Oct 18, 2016

shoyer commented Oct 18, 2016

charris commented Oct 18, 2016

charris commented Oct 18, 2016

charris commented Oct 18, 2016

charris commented Oct 18, 2016

charris commented Oct 18, 2016

shoyer commented Sep 15, 2016 •

edited

Loading

shoyer commented Oct 17, 2016 •

edited

Loading

shoyer commented Oct 17, 2016 •

edited

Loading