Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorizing POD types #2258

Open
chaitan94 opened this issue Jun 21, 2020 · 13 comments
Open

Vectorizing POD types #2258

chaitan94 opened this issue Jun 21, 2020 · 13 comments

Comments

@chaitan94
Copy link
Contributor

chaitan94 commented Jun 21, 2020

Issue description

In the documentation for vectorization, this is mentioned:

Only arithmetic, complex, and POD types passed by value or by const & reference are vectorized; all other arguments are passed through as-is.

However, it seems that if we use a POD type, py::vectorize does not work as expected. When I check the tests, it seems we do test that non-POD types pass through without vectorization, but there are no tests/examples for how to vectorize POD types.

Reproducible example code

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

struct PODClass {
    uint32_t value;
};

PYBIND11_MODULE(example, m) {
    pybind11::class_<PODClass>(m, "PODClass")
        .def(pybind11::init<>())
        .def_readwrite("value", &PODClass::value);

    PYBIND11_NUMPY_DTYPE(PODClass, value);

    m.def("pod_passthrough", pybind11::vectorize(
        [](PODClass a) {
            return a;
        }
    ));

    static_assert(std::is_pod<PODClass>::value);  // To ensure we actually do have a POD type
}

This compiles. When we try to use in python, however:

import example
import numpy as np

object_1 = example.PODClass()
object_1.value = 10
object_2 = example.PODClass()
object_2.value = 20
vec = np.array([object_1, object_2])
example.pod_passthrough(vec)
>       example.pod_passthrough(a)
E       TypeError: pod_passthrough(): incompatible function arguments. The following argument types are supported:
E           1. (arg0: numpy.ndarray[pybind11_tests.numpy_vectorize.PODClass]) -> object
E       
E       Invoked with: array([<pybind11_tests.numpy_vectorize.PODClass object at 0x7f662db9de30>,
E              <pybind11_tests.numpy_vectorize.PODClass object at 0x7f6638a3d970>],
E             dtype=object)

Not sure if this is expected or I am doing something wrong.

@EricCousineau-TRI
Copy link
Collaborator

If you're looking for custom dtypes, I've enumerated what (I think) the options are here:
#2259

You may want dtype=object, as that's the easiest to deal with: #1152

@chaitan94
Copy link
Contributor Author

Not sure if I understood you correctly. Do you mean I should try the following?

vec = np.array([object_1, object_2], dtype=np.object)

If so, this doesn't work either. In fact, I think it is equivalent to my original snippet, as in the error it seems that pybind was already inferring the dtype=object part.

@chaitan94
Copy link
Contributor Author

From #1152 it seems support for dtype=object is still a WIP (via the introduction of PYBIND11_NUMPY_OBJECT_DTYPE), is that correct?

If so, the current documentation about vectorize, to some extent, is a bit misleading I feel:

POD types passed by value or by const & reference are vectorized

I will try and deep dive on #1152 further soon, and can try to contribute if possible - if it is in the project's plan to in fact add that feature.

@EricCousineau-TRI
Copy link
Collaborator

EricCousineau-TRI commented Jun 21, 2020

Ah, actually, for the POD type, have you tried structured dtypes?
https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html#structured-types

(I mentioned it as option (1) in #2259, but I should've put that at the forefront here - sorry!)

@EricCousineau-TRI
Copy link
Collaborator

So really, I think the docs mean "structured dtype-compatible POD types"

@EricCousineau-TRI
Copy link
Collaborator

Oh, derp, sorry x 2! You already did by way of PYBIND11_NUMPY_DTYPE...
Lemme see if I can repro.

@EricCousineau-TRI
Copy link
Collaborator

@chaitan94 I wasn't able to reproduce your problem; I modified the unittest here, and it seemed to work fine?
#2260

@EricCousineau-TRI
Copy link
Collaborator

Oh, triple derp, and sorry x 3. It's b/c of the mixture of object and the structured dtype.
Will expand on PR to denote the diffs.

@EricCousineau-TRI
Copy link
Collaborator

Just pushed 337dc9c showing a reproduction of your issue.

Basically, it's just a mix of how NumPy can represent scalars (either as np.array().shape == () or a subclass of np.generic, in this case np.void) and how pybind11s py::detail::type_caster<py::array_t<Class>> and py::detail::type_caster<Class> works with those types.

Dunno what the right solution is, but at least this captures some of the weirdness?

@chaitan94
Copy link
Contributor Author

Well, I did try this before raising this issue:

import example
import numpy as np

object_1 = example.PODClass()
object_1.value = 10
object_2 = example.PODClass()
object_2.value = 20

pod_dtype = np.dtype([('value', np.int32)])
vec = np.array([object_1, object_2], dtype=pod_dtype)

But this fails as well, with the following error:

    vec = np.array([object_1, object_2], dtype=pod_dtype)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'example.PODClass'

But from you sample unit test, it seems right now astuple and fromtuple seem to be the key. This works:

import example
import numpy as np

object_1 = (10, )
object_2 = (20, )

vec = np.array([object_1, object_2])
print(example.pod_passthrough(vec))

Constructing with example.PODClass() directly, however, still does not work.

@chaitan94
Copy link
Contributor Author

Also, the create_recarray approach seems to work as well. I think the issue is when the structs are created directly in python.

@chaitan94
Copy link
Contributor Author

chaitan94 commented Jun 25, 2020

Hey @EricCousineau-TRI, I have tried pulling in the changes from #1152 and tried re-running the original code I put in this issue. I am still facing the same error (incompatible function arguments). Is this expected, or did I misunderstand you somewhere?

If the changes in #1152 do add support for dtype=object, could you share an example usage of it for creating a vectorized function for POD types (say SimpleStruct)?

Or maybe perhaps #1152 only adds support for dtype=object itself, while py::vectorize still needs some additional effort to work with dtype=object

@EricCousineau-TRI
Copy link
Collaborator

Hi @chaitan94! Sorry for the delay - can I ask if you're still facing this issue?

Also, for tracking - here's your request for a doc change: #2260 (comment)
I can take a whack at this at some point in the future!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants