Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: add searchsorted to the specification #688

Closed
kgryte opened this issue Sep 20, 2023 · 1 comment · Fixed by #699
Closed

RFC: add searchsorted to the specification #688

kgryte opened this issue Sep 20, 2023 · 1 comment · Fixed by #699
Assignees
Labels
API extension Adds new functions or objects to the API.
Milestone

Comments

@kgryte
Copy link
Contributor

kgryte commented Sep 20, 2023

This RFC proposes adding support to the array API specification for finding the indices where elements should be inserted in order to maintain order.

Overview

Based on array comparison data, the API is available across all considered libraries.

Furthermore, all considered libraries support the side keyword argument, and all considered libraries, except for TensorFlow, support the sorter keyword argument.

JAX supports an additional kwarg, method, which is used based on device/size performance optimization considerations. PyTorch and TensorFlow support specifying the output data type, but differ in naming conventions.

All array libraries support one-dimensional arrays. PyTorch and TensorFlow generalize to any n-dimensional ndarray (stacking).

Prior Art

Proposal

def searchsorted(x1: array, x2: array, /, *, side="left", sorter=None)
  • x1: one-dimensional array. If sorter is None, x1 must be sorted in ascending order.
  • x2: one-dimensional array.
  • side: if "left", the returned index i satisfies x1[i-1] < x2[j] <= x1[i]. Otherwise, if "right", the returned index i satisfies x1[i-1] <= x2[j] < x1[i]. If no suitable index, then i is either 0 or N, respectively, were N is the length of x1.
  • sorter: array of integer indices that sort x1 in ascending order (e.g., as might be produced via argsort).

Questions

  • Should the API be extended to support stacking as in PyTorch/TensorFlow?
  • Should the API support a scalar value for x2? NumPy, PyTorch, JAX, Dask support scalars. CuPy and TensorFlow do not.
@kgryte kgryte added the API extension Adds new functions or objects to the API. label Sep 20, 2023
@kgryte kgryte added this to the v2023 milestone Sep 20, 2023
@rgommers
Copy link
Member

Thanks @kgryte, this LGTM. I checked usage of searchsorted across scikit-learn, SciPy, pandas and Matplotlib, and it's used quite a bit in all libraries. The side keyword is the more heavily used one; there's only two instances in total of the sorter keyword (but it still seems okay to include).

Should the API be extended to support stacking as in PyTorch?

Not for now at least, that's too much of a burden for implementers and there doesn't seem to be a real need for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API extension Adds new functions or objects to the API.
Projects
Status: Stage 2
Development

Successfully merging a pull request may close this issue.

2 participants