RFC: add `searchsorted` to the specification #688

kgryte · 2023-09-20T01:30:50Z

This RFC proposes adding support to the array API specification for finding the indices at which elements should be inserted into a sorted array so that its order is preserved.

Overview

Based on array comparison data, the API is available across all considered libraries.

Furthermore, all considered libraries support the side keyword argument, and all considered libraries, except for TensorFlow, support the sorter keyword argument.

JAX supports an additional keyword argument, method, which selects the search algorithm based on device and array-size performance considerations. PyTorch and TensorFlow support specifying the output data type, but differ in naming conventions.

All array libraries support one-dimensional arrays. PyTorch and TensorFlow generalize to n-dimensional arrays (stacking).
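As a rough illustration of what stacking means here, the sketch below searches each row of a two-dimensional input independently. It is plain Python on nested lists, not the PyTorch or TensorFlow implementation. The helper name searchsorted_stacked and the requirement that both inputs have the same number of rows are assumptions made for this example.

```python
from bisect import bisect_left

def searchsorted_stacked(x1_rows, x2_rows):
    """Hypothetical helper: search row k of x2_rows within row k of x1_rows.

    Each row of x1_rows is assumed to be sorted in ascending order.
    """
    if len(x1_rows) != len(x2_rows):
        raise ValueError("x1_rows and x2_rows must have the same number of rows")
    # Apply a one-dimensional left-sided search to each pair of rows.
    return [
        [bisect_left(row, v) for v in values]
        for row, values in zip(x1_rows, x2_rows)
    ]

x1_rows = [[1, 3, 5], [2, 4, 6]]
x2_rows = [[3, 6], [3, 6]]
stacked = searchsorted_stacked(x1_rows, x2_rows)
print(stacked)
```

The same values produce different indices in each row because every row is searched against its own sorted sequence.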

Prior Art

Proposal

def searchsorted(x1: array, x2: array, /, *, side="left", sorter=None)

x1: one-dimensional array. If sorter is None, x1 must be sorted in ascending order.
x2: one-dimensional array.
side: if "left", the returned index i satisfies x1[i-1] < x2[j] <= x1[i]. Otherwise, if "right", the returned index i satisfies x1[i-1] <= x2[j] < x1[i]. If no suitable index exists, then i is either 0 or N, where N is the length of x1.
sorter: array of integer indices that sort x1 in ascending order (e.g., as might be produced via argsort).
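The semantics above can be sketched with Python's standard bisect module. This is a minimal reference for one-dimensional inputs held in plain lists, not the specification's implementation. The name searchsorted_ref is hypothetical, and the handling of sorter (reordering x1 before searching) is an assumption based on the parameter descriptions.

```python
from bisect import bisect_left, bisect_right

def searchsorted_ref(x1, x2, side="left", sorter=None):
    """Hypothetical pure-Python reference for the proposed semantics."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    # If sorter is given, it lists the indices that put x1 in ascending order.
    ordered = list(x1) if sorter is None else [x1[k] for k in sorter]
    # bisect_left gives x1[i-1] < v <= x1[i]; bisect_right gives x1[i-1] <= v < x1[i].
    find = bisect_left if side == "left" else bisect_right
    return [find(ordered, v) for v in x2]

x1 = [1, 2, 2, 3, 5]
x2 = [0, 2, 4, 6]
left = searchsorted_ref(x1, x2)                  # [0, 1, 4, 5]
right = searchsorted_ref(x1, x2, side="right")   # [0, 3, 4, 5]

# An unsorted x1 together with argsort-style indices gives the same result.
unsorted = [3, 1, 5, 2, 2]
order = sorted(range(len(unsorted)), key=unsorted.__getitem__)
via_sorter = searchsorted_ref(unsorted, x2, sorter=order)
```

The two sides differ only for values already present in x1: for the value 2, "left" returns the index of the first 2 and "right" returns the index just past the last 2. Values below the minimum map to 0 and values above the maximum map to N for either side.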

Questions

Should the API be extended to support stacking as in PyTorch/TensorFlow?
Should the API support a scalar value for x2? NumPy, PyTorch, JAX, Dask support scalars. CuPy and TensorFlow do not.

rgommers · 2023-09-21T05:02:27Z

Thanks @kgryte, this LGTM. I checked usage of searchsorted across scikit-learn, SciPy, pandas and Matplotlib, and it's used quite a bit in all of them. The side keyword is the more heavily used one; there are only two instances in total of the sorter keyword (but it still seems okay to include).

Should the API be extended to support stacking as in PyTorch?

Not for now at least, that's too much of a burden for implementers and there doesn't seem to be a real need for this.

kgryte added the "API extension" label (Adds new functions or objects to the API.) Sep 20, 2023

kgryte added this to the v2023 milestone Sep 20, 2023

kgryte mentioned this issue Sep 20, 2023: Tracking issue for the 2023 revision of the array API specification #643 (Closed)

ogrisel mentioned this issue Sep 27, 2023: FIX array_api support for non-integer n_components in PCA scikit-learn/scikit-learn#27431 (Merged)

kgryte self-assigned this Nov 2, 2023

kgryte mentioned this issue Nov 6, 2023: Add searchsorted to the specification #699 (Merged)

lucascolley mentioned this issue Nov 25, 2023: follow-up actions for array API support in cluster scipy/scipy#18866 (Open)

rgommers closed this as completed in #699 Jan 11, 2024
