Add format_descriptor<> & npy_format_descriptor<> PyObject * specializations. #4674

Merged
merged 24 commits into from
May 23, 2023
Changes from 8 commits
Commits
5168c13
Add `npy_format_descriptor<PyObject *>` to enable `py::array_t<PyObje…
May 17, 2023
d53a796
resolve clang-tidy warning
May 17, 2023
5bea2a8
Use existing constructor instead of adding a static method. Thanks @S…
May 17, 2023
50eaa3a
Add `format_descriptor<PyObject *>`
May 17, 2023
82ce80f
Add test_format_descriptor_format
May 18, 2023
20b9baf
Ensure the Eigen `type_caster`s do not segfault when loading arrays w…
May 18, 2023
0640eb3
Use `static_assert()` `!std::is_pointer<>` to replace runtime guards.
May 18, 2023
ddb625e
Add comments to explain how to check for ref-count bugs. (NO code cha…
May 18, 2023
03dafde
Make the "Pointer types ... are not supported" message Eigen-specific…
May 18, 2023
28492ed
Change "format_descriptor_format" implementation as suggested by @Lal…
May 18, 2023
1593ebc
resolve clang-tidy warning
May 18, 2023
3f04188
Account for np.float128, np.complex256 not being available on Windows…
May 18, 2023
38aa697
Fully address i|q|l ambiguity (hopefully).
May 18, 2023
7f124bb
Remove the new `np.format_parser()`-based test, it's much more distra…
May 19, 2023
d432ce7
Use bi.itemsize to disambiguate "l" or "L"
May 19, 2023
18e1bd2
Use `py::detail::compare_buffer_info<T>::compare()` to validate the `…
May 19, 2023
029b157
Add `buffer_info::compare<T>` to make `detail::compare_buffer_info<T>…
May 19, 2023
d9e3bd3
silence clang-tidy warning
May 19, 2023
e9a289c
pytest-compatible access to np.float128, np.complex256
May 19, 2023
8abe0e9
Revert "pytest-compatible access to np.float128, np.complex256"
May 19, 2023
b09e75b
Use `sizeof(long double) == sizeof(double)` instead of `std::is_same<>`
May 19, 2023
ba7063e
Report skipped `long double` tests.
May 19, 2023
a4d61b4
Change the name of the new `buffer_info` member function to `item_typ…
May 19, 2023
ef34d29
Change `item_type_is_equivalent_to<>()` from `static` function to mem…
May 20, 2023
14 changes: 14 additions & 0 deletions include/pybind11/detail/common.h
@@ -1025,6 +1025,20 @@ PYBIND11_RUNTIME_EXCEPTION(reference_cast_error, PyExc_RuntimeError) /// Used in
template <typename T, typename SFINAE = void>
struct format_descriptor {};

template <typename T>
struct format_descriptor<
T,
detail::enable_if_t<detail::is_same_ignoring_cvref<T, PyObject *>::value>> {
static constexpr const char c = 'O';
static constexpr const char value[2] = {c, '\0'};
static std::string format() { return std::string(1, c); }
};

// Common message for `static_assert()`s, which are useful to easily preempt much less obvious
// errors in code that does not support `format_descriptor<PyObject *>`.
#define PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED \
"Pointer types (in particular `PyObject *`) are not supported."
Collaborator

I would make this message Eigen specific

Collaborator Author

Done.

I introduced pybind11/eigen/common.h to have a central location for the message. (Previously I shied away from doing that, which is why the message wasn't specific. But that was really just taking a shortcut. What we have now is definitely better.)


PYBIND11_NAMESPACE_BEGIN(detail)
// Returns the index of the given type in the type char array below, and in the list in numpy.h
// The order here is: bool; 8 ints ((signed,unsigned)x(8,16,32,64)bits); float,double,long double;
12 changes: 12 additions & 0 deletions include/pybind11/eigen/matrix.h
@@ -287,6 +287,8 @@ handle eigen_encapsulate(Type *src) {
template <typename Type>
struct type_caster<Type, enable_if_t<is_eigen_dense_plain<Type>::value>> {
using Scalar = typename Type::Scalar;
static_assert(!std::is_pointer<Scalar>::value,
Collaborator

I think you might be able to move these asserts to EigenProps, which would reduce the amount of redundant code.

Collaborator Author

I was ambivalent before and still am a little bit, but decided to keep the 5 static_assert():

Cons: 2-3 x 2 more lines of code.

Pros:

The static_assert()s appear at or near the top of each of the type_casters, which makes them essentially a kind of documentation.

In case a static_assert() fails, it will be much more obvious that the message we want to send is: the type_caster does not support pointer types as scalar types.

PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);
using props = EigenProps<Type>;

bool load(handle src, bool convert) {
@@ -405,6 +407,9 @@ struct type_caster<Type, enable_if_t<is_eigen_dense_plain<Type>::value>> {
// Base class for casting reference/map/block/etc. objects back to python.
template <typename MapType>
struct eigen_map_caster {
static_assert(!std::is_pointer<typename MapType::Scalar>::value,
PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);

private:
using props = EigenProps<MapType>;

@@ -457,6 +462,8 @@ struct type_caster<
using Type = Eigen::Ref<PlainObjectType, 0, StrideType>;
using props = EigenProps<Type>;
using Scalar = typename props::Scalar;
static_assert(!std::is_pointer<Scalar>::value,
PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);
using MapType = Eigen::Map<PlainObjectType, 0, StrideType>;
using Array
= array_t<Scalar,
@@ -604,6 +611,9 @@ struct type_caster<
// regular Eigen::Matrix, then casting that.
template <typename Type>
struct type_caster<Type, enable_if_t<is_eigen_other<Type>::value>> {
static_assert(!std::is_pointer<typename Type::Scalar>::value,
PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);

protected:
using Matrix
= Eigen::Matrix<typename Type::Scalar, Type::RowsAtCompileTime, Type::ColsAtCompileTime>;
@@ -632,6 +642,8 @@ struct type_caster<Type, enable_if_t<is_eigen_other<Type>::value>> {
template <typename Type>
struct type_caster<Type, enable_if_t<is_eigen_sparse<Type>::value>> {
using Scalar = typename Type::Scalar;
static_assert(!std::is_pointer<Scalar>::value,
PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);
using StorageIndex = remove_reference_t<decltype(*std::declval<Type>().outerIndexPtr())>;
using Index = typename Type::Index;
static constexpr bool rowMajor = Type::IsRowMajor;
4 changes: 4 additions & 0 deletions include/pybind11/eigen/tensor.h
@@ -164,6 +164,8 @@ PYBIND11_WARNING_POP

template <typename Type>
struct type_caster<Type, typename eigen_tensor_helper<Type>::ValidType> {
static_assert(!std::is_pointer<typename Type::Scalar>::value,
Collaborator

Any reason to not add the assert to eigen_tensor_helper instead to avoid duplicate lines?

Collaborator Author

See above.

PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);
using Helper = eigen_tensor_helper<Type>;
static constexpr auto temp_name = get_tensor_descriptor<Type, false>::value;
PYBIND11_TYPE_CASTER(Type, temp_name);
@@ -359,6 +361,8 @@ struct get_storage_pointer_type<MapType, void_t<typename MapType::PointerArgType
template <typename Type, int Options>
struct type_caster<Eigen::TensorMap<Type, Options>,
typename eigen_tensor_helper<remove_cv_t<Type>>::ValidType> {
static_assert(!std::is_pointer<typename Type::Scalar>::value,
PYBIND11_MESSAGE_POINTER_TYPES_ARE_NOT_SUPPORTED);
using MapType = Eigen::TensorMap<Type, Options>;
using Helper = eigen_tensor_helper<remove_cv_t<Type>>;

18 changes: 12 additions & 6 deletions include/pybind11/numpy.h
@@ -564,6 +564,8 @@ class dtype : public object {
m_ptr = from_args(args).release().ptr();
}

/// Return dtype for the given typenum (one of the NPY_TYPES).
/// https://numpy.org/devdocs/reference/c-api/array.html#c.PyArray_DescrFromType
explicit dtype(int typenum)
: object(detail::npy_api::get().PyArray_DescrFromType_(typenum), stolen_t{}) {
if (m_ptr == nullptr) {
@@ -1283,12 +1285,16 @@ struct npy_format_descriptor<
public:
static constexpr int value = values[detail::is_fmt_numeric<T>::index];

static pybind11::dtype dtype() {
if (auto *ptr = npy_api::get().PyArray_DescrFromType_(value)) {
return reinterpret_steal<pybind11::dtype>(ptr);
}
pybind11_fail("Unsupported buffer format!");
}
static pybind11::dtype dtype() { return pybind11::dtype(/*typenum*/ value); }
};

template <typename T>
struct npy_format_descriptor<T, enable_if_t<is_same_ignoring_cvref<T, PyObject *>::value>> {
static constexpr auto name = const_name("object");

static constexpr int value = npy_api::NPY_OBJECT_;

static pybind11::dtype dtype() { return pybind11::dtype(/*typenum*/ value); }
};

#define PYBIND11_DECL_CHAR_FMT \
28 changes: 28 additions & 0 deletions tests/test_buffers.cpp
@@ -7,12 +7,40 @@
BSD-style license that can be found in the LICENSE file.
*/

#include <pybind11/complex.h>
#include <pybind11/stl.h>

#include "constructor_stats.h"
#include "pybind11_tests.h"

TEST_SUBMODULE(buffers, m) {

#define PYBIND11_LOCAL_DEF(...) \
Collaborator

Not sure if it's worth changing, but I tend to not like macros that return values.

I would have written this as:

std::map<std::string, std::string> values;

#define ASSIGN_HELPER(...) \
    values[#__VA_ARGS__] = py::format_descriptor<__VA_ARGS__>::format();

ASSIGN_HELPER(bool);
ASSIGN_HELPER(PyObject*);

return values[cpp_name];

Collaborator Author

Changed.

if (cpp_name == #__VA_ARGS__) \
return py::format_descriptor<__VA_ARGS__>::format();

m.def("format_descriptor_format", [](const std::string &cpp_name) {
PYBIND11_LOCAL_DEF(PyObject *)
PYBIND11_LOCAL_DEF(bool)
PYBIND11_LOCAL_DEF(std::int8_t)
PYBIND11_LOCAL_DEF(std::uint8_t)
PYBIND11_LOCAL_DEF(std::int16_t)
PYBIND11_LOCAL_DEF(std::uint16_t)
PYBIND11_LOCAL_DEF(std::int32_t)
PYBIND11_LOCAL_DEF(std::uint32_t)
PYBIND11_LOCAL_DEF(std::int64_t)
PYBIND11_LOCAL_DEF(std::uint64_t)
PYBIND11_LOCAL_DEF(float)
PYBIND11_LOCAL_DEF(double)
PYBIND11_LOCAL_DEF(long double)
PYBIND11_LOCAL_DEF(std::complex<float>)
PYBIND11_LOCAL_DEF(std::complex<double>)
PYBIND11_LOCAL_DEF(std::complex<long double>)
return std::string("UNKNOWN");
});

#undef PYBIND11_LOCAL_DEF

// test_from_python / test_to_python:
class Matrix {
public:
26 changes: 26 additions & 0 deletions tests/test_buffers.py
@@ -11,6 +11,32 @@
np = pytest.importorskip("numpy")


@pytest.mark.parametrize(
    ("cpp_name", "expected_codes"),
    [
        ("PyObject *", ["O"]),
        ("bool", ["?"]),
        ("std::int8_t", ["b"]),
        ("std::uint8_t", ["B"]),
        ("std::int16_t", ["h"]),
        ("std::uint16_t", ["H"]),
        ("std::int32_t", ["i"]),
        ("std::uint32_t", ["I"]),
        ("std::int64_t", ["q"]),
        ("std::uint64_t", ["Q"]),
        ("float", ["f"]),
        ("double", ["d"]),
        ("long double", ["g", "d"]),
        ("std::complex<float>", ["Zf"]),
        ("std::complex<double>", ["Zd"]),
        ("std::complex<long double>", ["Zg", "Zd"]),
        ("", ["UNKNOWN"]),
    ],
)
def test_format_descriptor_format(cpp_name, expected_codes):
    assert m.format_descriptor_format(cpp_name) in expected_codes
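
Most of the expected codes in this parametrization are also format characters of Python's standard `struct` module. As a rough cross-check that is not part of the PR, the sketch below compares the fixed-width rows against `struct.calcsize` in standard-size mode, and uses `ctypes` to show the platform property that presumably motivates accepting either `g` or `d` for `long double`. The names `FIXED_WIDTH_CODES`, `check_fixed_width_codes` and `long_double_is_double` are hypothetical and exist only for this illustration.

```python
import ctypes
import struct

# Hypothetical table mirroring the fixed-width rows of the parametrization:
# C++ type name -> (format code, expected size in bytes).
FIXED_WIDTH_CODES = {
    "bool": ("?", 1),
    "std::int8_t": ("b", 1),
    "std::uint8_t": ("B", 1),
    "std::int16_t": ("h", 2),
    "std::uint16_t": ("H", 2),
    "std::int32_t": ("i", 4),
    "std::uint32_t": ("I", 4),
    "std::int64_t": ("q", 8),
    "std::uint64_t": ("Q", 8),
    "float": ("f", 4),
    "double": ("d", 8),
}


def check_fixed_width_codes():
    # The "=" prefix selects struct's standard sizes, which do not depend
    # on the platform's native C types.
    return all(
        struct.calcsize("=" + code) == size
        for code, size in FIXED_WIDTH_CODES.values()
    )


def long_double_is_double():
    # True on platforms where `long double` has the same size as `double`;
    # the test above accepts "g" or "d" to cover both situations.
    return ctypes.sizeof(ctypes.c_longdouble) == ctypes.sizeof(ctypes.c_double)
```

The `struct` module does not know `O`, `g`, or the `Z`-prefixed complex codes, so those rows are left out of the table.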
Collaborator

I would add an assert that the format descriptor is a valid numpy format descriptor

https://numpy.org/doc/stable/reference/generated/numpy.format_parser.html#numpy-format-parser

assert np.format_parser(blah).dtype is not None

Collaborator Author

Done-ish. See my comments in the test code.

What a lucky man I was to never have seen #1908 before.

I see Windows isn't happy about my latest version of the test. Fixing...

Collaborator Author

I would add an assert that the format descriptor is a valid numpy format descriptor

@lalaland Thanks a lot for nudging me in that direction!

Using np.format_parser() didn't work out, but d432ce7 put me on the right track; once that had taught me what to look for, I quickly found it here:

template <typename T, typename SFINAE = void>
struct compare_buffer_info {
static bool compare(const buffer_info &b) {
return b.format == format_descriptor<T>::format() && b.itemsize == (ssize_t) sizeof(T);
}
};
template <typename T>
struct compare_buffer_info<T, detail::enable_if_t<std::is_integral<T>::value>> {
static bool compare(const buffer_info &b) {
return (size_t) b.itemsize == sizeof(T)
&& (b.format == format_descriptor<T>::value
|| ((sizeof(T) == sizeof(long))
&& b.format == (std::is_unsigned<T>::value ? "L" : "l"))
|| ((sizeof(T) == sizeof(size_t))
&& b.format == (std::is_unsigned<T>::value ? "N" : "n")));
}
};
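
To make the integral branch above easier to follow, here is a minimal Python sketch of the same decision logic. It is an illustration only, not pybind11 code: the function name `integral_format_matches` and its parameters are hypothetical, and `struct.calcsize` in native mode stands in for `sizeof(long)` and `sizeof(size_t)`.

```python
import struct


def integral_format_matches(fmt, itemsize, canonical, size, unsigned):
    """Decide whether a buffer's (fmt, itemsize) is compatible with an integer type.

    canonical: the format code the type reports itself (e.g. "q" for std::int64_t)
    size:      sizeof(T) in bytes
    unsigned:  whether T is an unsigned type
    """
    # The item size must always agree.
    if itemsize != size:
        return False
    # Exact match with the type's own format code.
    if fmt == canonical:
        return True
    # "l"/"L" denotes C long, whose width varies by platform, so it is
    # accepted whenever T happens to have the same size as long.
    if size == struct.calcsize("l") and fmt == ("L" if unsigned else "l"):
        return True
    # "n"/"N" denotes ssize_t/size_t and is handled the same way.
    if size == struct.calcsize("N") and fmt == ("N" if unsigned else "n"):
        return True
    return False
```

For example, a buffer reporting `"l"` is accepted for a signed type of the same size as `long` even when that type's own code is `"q"` or `"i"`, which is the ambiguity the commits above set out to address.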

From there it was only a small step to add a public (not in namespace detail) interface for the existing functionality (029b157), and then your validation idea basically fell into place.

The new public interface is also exactly what I need for a clean solution here:

https://github.com/pybind/pybind11_abseil/blob/a8fed7557747fa33e5a04844738baf5f6a1c8d1b/pybind11_abseil/absl_casters.h#L442



def test_from_python():
    with pytest.raises(RuntimeError) as excinfo:
        m.Matrix(np.array([1, 2, 3]))  # trying to assign a 1D array
26 changes: 26 additions & 0 deletions tests/test_numpy_array.cpp
@@ -523,4 +523,30 @@ TEST_SUBMODULE(numpy_array, sm) {
sm.def("test_fmt_desc_const_double", [](const py::array_t<const double> &) {});

sm.def("round_trip_float", [](double d) { return d; });

sm.def("pass_array_pyobject_ptr_return_sum_str_values",
[](const py::array_t<PyObject *> &objs) {
std::string sum_str_values;
for (const auto &obj : objs) {
sum_str_values += py::str(obj.attr("value"));
}
return sum_str_values;
});

sm.def("pass_array_pyobject_ptr_return_as_list",
[](const py::array_t<PyObject *> &objs) -> py::list { return objs; });

sm.def("return_array_pyobject_ptr_cpp_loop", [](const py::list &objs) {
py::size_t arr_size = py::len(objs);
py::array_t<PyObject *> arr_from_list(static_cast<py::ssize_t>(arr_size));
PyObject **data = arr_from_list.mutable_data();
for (py::size_t i = 0; i < arr_size; i++) {
assert(data[i] == nullptr);
data[i] = py::cast<PyObject *>(objs[i].attr("value"));
Collaborator

This might be a silly question, but does this appropriately increase the reference count?

Collaborator Author

Not a silly question, this was something I was struggling with quite a bit:

// Note that `cast<PyObject *>(obj)` increments the reference count of `obj`.
// This is necessary for the case that `obj` is a temporary, and could
// not possibly be different, given
// 1. the established convention that the passed `handle` is borrowed, and
// 2. we don't want to force all generic code using `cast<T>()` to special-case
// handling of `T` = `PyObject *` (to increment the reference count there).
// It is the responsibility of the caller to ensure that the reference count
// is decremented.
template <typename T,
typename Handle,
detail::enable_if_t<detail::is_same_ignoring_cvref<T, PyObject *>::value
&& detail::is_same_ignoring_cvref<Handle, handle>::value,
int>
= 0>
T cast(Handle &&handle) {
return handle.inc_ref().ptr();
}
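
The ownership rule described in that comment (the callee hands back a new reference, and the caller must release it) can be observed from Python on CPython. The sketch below is an illustration under that assumption and does not involve pybind11: it calls the C API functions `Py_IncRef` and `Py_DecRef` through `ctypes.pythonapi`, and the helper name `refcount_delta_after_incref` is hypothetical.

```python
import ctypes
import sys

# Declare the C signatures: both take a PyObject * and return nothing.
ctypes.pythonapi.Py_IncRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_IncRef.restype = None
ctypes.pythonapi.Py_DecRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_DecRef.restype = None


def refcount_delta_after_incref(obj):
    """Return (delta while the extra reference is held, delta after release)."""
    before = sys.getrefcount(obj)
    # Comparable to what `cast<PyObject *>(obj)` does: one more strong reference.
    ctypes.pythonapi.Py_IncRef(obj)
    held = sys.getrefcount(obj) - before
    # The caller is responsible for giving the reference back.
    ctypes.pythonapi.Py_DecRef(obj)
    released = sys.getrefcount(obj) - before
    return held, released
```

On CPython, calling this with a freshly created object is expected to report one extra reference while it is held and none after the release. As noted elsewhere in this review, checks built on `sys.getrefcount()` depend on interpreter details and are best kept as a diagnostic aid.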

}
return arr_from_list;
});

sm.def("return_array_pyobject_ptr_from_list",
[](const py::list &objs) -> py::array_t<PyObject *> { return objs; });
}
71 changes: 71 additions & 0 deletions tests/test_numpy_array.py
@@ -595,3 +595,74 @@ def test_round_trip_float():
    arr = np.zeros((), np.float64)
    arr[()] = 37.2
    assert m.round_trip_float(arr) == 37.2


# HINT: An easy and robust way (although only manual unfortunately) to check for
Collaborator

@EthanSteinberg, May 19, 2023

I think it should be possible to do this automatically using weakrefs: https://docs.python.org/3/library/weakref.html#weakref.ref

I'm not sure it's worth changing for this PR.

Collaborator Author

@rwgk, May 19, 2023

I think it should be possible to do this automatically using weakrefs:

Yes, but: That only works when making strong assumptions about the reference-counting and garbage collection behavior. In practice, when targeting CPython only, that's usually fine. Fundamentally, though, such tests create tech debt that might get in the way of core interpreter developments in the future. Therefore I generally write tests involving weakref and sys.getrefcount() only as a last resort.
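
For reference, a weakref-based check of the kind being discussed might look like the sketch below. It is a minimal illustration, not code from the PR, and it carries exactly the assumption raised in this thread: it relies on CPython releasing an object as soon as its last strong reference disappears. The helper name `is_released_after` is hypothetical, and `PyValueHolder` is redefined here only to keep the block self-contained.

```python
import gc
import weakref


class PyValueHolder:
    def __init__(self, value):
        self.value = value


def is_released_after(use):
    """Return True if `use(holder)` leaves no strong reference to `holder` behind."""
    holder = PyValueHolder(42)
    ref = weakref.ref(holder)  # does not keep the object alive
    use(holder)
    del holder  # drop our own strong reference
    gc.collect()  # also break reference cycles, if any
    return ref() is None
```

A callable that stores its argument somewhere (for example `some_list.append`) makes the check return `False`, which is how a leaked reference would show up.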

What would be great to have: a pytest feature, e.g. @pytest.mark.empirical_leak_check(), with options like --empirical_leak_check=quick and --empirical_leak_check=thorough, that run each marked test in a loop for a few seconds and monitor RES programmatically, with heuristics to decide how long to try (to deal with noise) and when to flag a potential leak or not. (It will probably be a bit of black magic to get to a useful stable implementation.)
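
A very rough approximation of such an empirical check can be sketched with the standard library. Note the substitution: instead of monitoring `RES`, this sketch uses `tracemalloc`, which only sees allocations made through Python's allocators, so it is a stand-in rather than the feature described above. The helper name `grows_without_bound`, the iteration count, and the tolerance are all hypothetical choices.

```python
import gc
import tracemalloc


def grows_without_bound(func, iterations=2000, tolerance_bytes=64 * 1024):
    """Call `func` repeatedly and report whether traced memory keeps growing."""
    tracemalloc.start()
    try:
        func()  # warm-up, so one-time allocations are not counted
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # A fixed tolerance is a crude heuristic for separating noise from a leak.
    return after - before > tolerance_bytes
```

A real implementation would need the noise handling and stopping heuristics mentioned in the comment; the fixed thresholds here are only meant to show the overall shape.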

# ref-count leaks in the test_.*pyobject_ptr.* functions below is to
# * temporarily insert `while True:` (one-by-one),
# * run this test, and
# * run the Linux `top` command in another shell to visually monitor
# `RES` for a minute or two.
# If there is a leak, it is usually evident in seconds because the `RES`
# value increases without bounds. (Don't forget to Ctrl-C the test!)


# For use as a temporary user-defined object, to maximize sensitivity of the tests below:
# * Ref-count leaks will be immediately evident.
# * Sanitizers are much more likely to detect heap-use-after-free due to
# other ref-count bugs.
class PyValueHolder:
    def __init__(self, value):
        self.value = value


def WrapWithPyValueHolder(*values):
    return [PyValueHolder(v) for v in values]


def UnwrapPyValueHolder(vhs):
    return [vh.value for vh in vhs]


def test_pass_array_pyobject_ptr_return_sum_str_values_ndarray():
    # Intentionally all temporaries, do not change.
    assert (
        m.pass_array_pyobject_ptr_return_sum_str_values(
            np.array(WrapWithPyValueHolder(-3, "four", 5.0), dtype=object)
        )
        == "-3four5.0"
    )


def test_pass_array_pyobject_ptr_return_sum_str_values_list():
    # Intentionally all temporaries, do not change.
    assert (
        m.pass_array_pyobject_ptr_return_sum_str_values(
            WrapWithPyValueHolder(2, "three", -4.0)
        )
        == "2three-4.0"
    )


def test_pass_array_pyobject_ptr_return_as_list():
    # Intentionally all temporaries, do not change.
    assert UnwrapPyValueHolder(
        m.pass_array_pyobject_ptr_return_as_list(
            np.array(WrapWithPyValueHolder(-1, "two", 3.0), dtype=object)
        )
    ) == [-1, "two", 3.0]


@pytest.mark.parametrize(
    ("return_array_pyobject_ptr", "unwrap"),
    [
        (m.return_array_pyobject_ptr_cpp_loop, list),
        (m.return_array_pyobject_ptr_from_list, UnwrapPyValueHolder),
    ],
)
def test_return_array_pyobject_ptr_cpp_loop(return_array_pyobject_ptr, unwrap):
    # Intentionally all temporaries, do not change.
    arr_from_list = return_array_pyobject_ptr(WrapWithPyValueHolder(6, "seven", -8.0))
    assert isinstance(arr_from_list, np.ndarray)
    assert arr_from_list.dtype == np.dtype("O")
    assert unwrap(arr_from_list) == [6, "seven", -8.0]