Add `take` specification for returning elements of an array along a specified axis #416

kgryte · 2022-04-07T09:51:27Z

This PR

adds a specification for take, an API for returning elements of an array along a specified axis. This API was initially proposed in Proposal: add APIs for getting and setting elements via a list of indices (i.e., take, put, etc) #177.
supports only a single keyword argument: axis. This is derived from comparison data.
requires that the axis keyword argument always be provided, as suggested in comment, but only when the input array has more than one dimension.
does not currently support providing a multi-dimensional array of indices. Array libraries supporting multi-dimensional arrays for indices treat the input array as a flattened 1-dimensional array. If we were to follow similarly, we'd need to specify the flattened layout.
does not currently support lists/tuples/sequences for the indices argument. NumPy et al allow array-like. Torch requires LongTensor. Was not immediately clear to me how accepting we should be for non-arrays, so went with the most conservative choice.
does not support an integer for the indices argument. Instead, users should use normal slice/indexing operations.

Questions

Should the API be extended to support multi-dimensional indices? No, not yet.
Should the axis argument be required? Yes, but only for multi-dimensional x.
Should the indices argument be more liberal in the types of allowed arguments? No, should be restricted to only array values.

honno · 2022-04-07T11:56:12Z

Should this PR be blocked until an initial spec release?

asmeurer · 2022-04-07T18:01:39Z

Regarding axis and flattening, it's worth noting np.put doesn't have axis, so requiring it here would either break the symmetry with a potential put function, or require put to be modified from NumPy (see #177 (comment)).

shoyer · 2022-04-07T18:56:12Z

Two thoughts:

We could make axis optional only in the case of 1D arrays
I would not necessarily be limited by constraints on np.put, which in my opinion is much more niche functionality than take. I don't even know if we want put in the API standard.

kgryte · 2022-04-18T08:36:05Z

@rgommers Before moving forward with this PR, we need to ensure buy-in from PyTorch. The current proposal deviates from supported PyTorch behavior in the following ways:

PyTorch does not currently support an axis kwarg.
PyTorch always flattens the input array.

We'd need to ensure that PyTorch is (1) okay with adding an axis kwarg to support taking along a specified axis and (2) making an axis kwarg mandatory, thus breaking current behavior (i.e., based on the current proposal, no way to index into a flattened array using take which is the current PyTorch behavior).

For NumPy-inspired libraries, the axis kwarg is supported and, provided the spec-compliant API is in a sub-namespace, xp.take(x, indices, axis=-1) can simply wrap a call to np.take. Dask already effectively requires an axis kwarg, as the default axis value is 0.

Accordingly, PyTorch is the odd one out here, atm, so would be good to get a feel for potential specification constraints in order to ensure broad API compatibility, while also addressing fancy indexing concerns (indexing arrays via arrays) and the proposal to require an axis.

rgommers · 2022-07-06T20:42:20Z

For NumPy-inspired libraries, the axis kwarg is supported and, provided the spec-compliant API is in a sub-namespace, xp.take(x, indices, axis=-1) can simply wrap a call to np.take. Dask already effectively requires an axis kwarg, as the default axis value is 0.

This requires a backwards-incompatible change in NumPy as well, when we want to move to support in the main namespace. It'd be a good change to make though. Dask not following the NumPy default is fairly exceptional for Dask, and was probably done because the flattening behavior is indeed not useful (EDIT: it was added in dask/dask#26, no discussion there on axis default).

I just had a look at SciPy, and it has quite a few np.take usages but none with the default axis=None. Scikit-learn does have some, I need to look in a little more detail to verify whether those are all for 1-D arrays or not.

rgommers · 2022-07-07T13:11:41Z

I looked at all usages in scikit-learn and most inputs to np.take are 1-D, in which case the axis argument is not used/needed, and for the one case with 2-D array input the code uses axis=0:

# `processed` is 1-D
sklearn/cluster/_optics.py:    unproc = np.compress(~np.take(processed, indices), indices)
--
# `X` is 2-D, that's why this uses `axis=0`
sklearn/cluster/_optics.py:        dists = pairwise_distances(P, np.take(X, unproc, axis=0),
--
# `reachability_` is 1-D
sklearn/cluster/_optics.py:    improved = np.where(rdists < np.take(reachability_, unproc))
--
# `compute_class_weight` returns a 1-D array
sklearn/utils/class_weight.py:            weight_k = np.take(compute_class_weight(class_weight_k,
sklearn/utils/class_weight.py-                                                    classes=classes_subsample,
sklearn/utils/class_weight.py-                                                    y=y_subsample),
sklearn/utils/class_weight.py-                               np.searchsorted(classes_subsample,
sklearn/utils/class_weight.py-                                               classes_full),
sklearn/utils/class_weight.py-                               mode='clip')
--
# `labels_multiclass` and `labels_binary` are lists of strings (so 1-D)
sklearn/utils/estimator_checks.py:    y_names_multiclass = np.take(labels_multiclass, y_multiclass)
sklearn/utils/estimator_checks.py:    y_names_binary = np.take(labels_binary, y_binary)
sklearn/utils/estimator_checks.py:    y_names_binary = np.take(labels_binary, y_binary)
--
# `nominal_attributes` is a dict, so first argument to `take` is a 1-D object array
sklearn/datasets/_openml.py:                    np.take(
sklearn/datasets/_openml.py-                        np.asarray(nominal_attributes.pop(col_name),
sklearn/datasets/_openml.py-                                   dtype='O'),
sklearn/datasets/_openml.py-                        y[:, i:i + 1].astype(int, copy=False))
--
# `perm` is 1-D
sklearn/model_selection/tests/test_split.py:        y_perm = np.take(perm, y)
--
# `perm` is 1-D
sklearn/metrics/tests/test_common.py:        y_true_perm = np.take(perm, y_true)
--
# `labels_multiclass` is a list of strings (so 1-D)
sklearn/semi_supervised/tests/test_self_training.py:    y_strings = np.take(labels_multiclass, y)

rgommers · 2022-07-07T13:37:20Z

does not currently support lists/tuples/sequences for the indices argument. NumPy et al allow array-like. Torch requires LongTensor. Was not immediately clear to me how accepting we should be for non-arrays, so went with the most conservative choice.

As a rule we don't allow array-like inputs in the array API standard, because (a) it's ill-defined, and (b) it turns out not to be the best idea to allow it anyway. So it's fine to be consistent with that here.

We could make axis optional only in the case of 1D arrays

This makes sense to me. And seems useful, based on my analysis of how scikit-learn uses np.take.

The current proposal deviates from supported PyTorch behavior in the following ways:
1. PyTorch does not currently support an `axis` kwarg.

2. PyTorch always flattens the input array.
We'd need to ensure that PyTorch is (1) okay with adding an axis kwarg to support taking along a specified axis and (2) making an axis kwarg mandatory, thus breaking current behavior (i.e., based on the current proposal, no way to index into a flattened array using take which is the current PyTorch behavior).

I don't think there's an issue with progressing this PR:

torch.take with two arguments is actually compatible with numpy.take with two arguments,
adding an axis keyword is not controversial for PyTorch, and would be fully backwards compatible if it were to default to None,
the introduction path would add axis=None (or better, dim=None) first, and then have the appropriate warnings raised before switching the default to 0,
the bc-breaking change would be to add axis=0; if as discussed above the axis keyword is mandatory for >=2-D array input, then there's still no risk of silent breakage, just some code churn. And this would be the same for PyTorch as for NumPy,
I'll note that this will be low-prio for PyTorch to work on though, because there's already take_along_dim that does what we propose here (again similar to np.take_along_axis), so it's just some churn but doesn't add functionality.

rgommers

Thanks @kgryte, overall looks good - a few comments.

One formatting comment: please wrap lines at 80 characters, that's standard in Python and far easier to read/review.

spec/API_specification/signatures/indexing_functions.py

seberg · 2022-07-07T16:30:47Z

We could make axis optional only in the case of 1D arrays

Seems OK, since it is compatible with None as a default. But, is there much reason for it as in that case arr[index] is practically identical (unless you need to ensure a copy and index is a scalar)?

(Plausibly, in a few cases sklearn uses it because many years ago take was faster and either the myth or the code stuck around. One of the example uses mode="clip" though.)

I am not immediately sure whether 0 or -1 would make the better default. None is useful for reduce-likes, but take is maybe not quite reduce-like enough (but probably inherits it from there).

take_along_dim that does what we propose here (again similar to np.take_along_axis), so it's just some churn but doesn't add functionality.

take_along_axis is a very different functionality, it is only identical for 1-D inputs. take_along_axis specificalaly takes a arr.ndim dimensional index array as input.

rgommers · 2022-07-07T17:13:43Z

Seems OK, since it is compatible with None as a default. But, is there much reason for it as in that case arr[index] is practically identical (unless you need to ensure a copy and index is a scalar)?

Except integer array indexing is not in the standard (yet at least), which is why there was so much interest in adding take.

take_along_axis is a very different functionality,

Good point, I overlooked that.

…add-take

kgryte · 2022-11-03T05:47:08Z

As discussed in the Consortium meeting on 22 September 2022, consensus was to limit the indices argument to one-dimensional arrays.

For now, the API will not support multi-dimensional arrays for the indices kwarg. In the interim, array API consumers should be able to use reshape to achieve something similar. If, based on feedback and usage, multi-dimensional indices support is desired, we can revisit the specification and update accordingly.

This PR has been updated to reflect current consensus.

kgryte · 2022-11-03T17:50:37Z

As discussed in the 3 November 2022 meeting, no objections were raised. This PR should be ready for merge.

@rgommers would you mind taking a last look and then approving?

rgommers · 2022-11-16T20:36:52Z

I went through the whole discussion here again, and it looks like we're all happy and there aren't any remaining blockers. This wasn't an easy one! In it goes, thanks all!

kgryte added 3 commits April 7, 2022 02:36

Add take specification

250d38f

Update copy

2b9484b

Update index

5f09307

kgryte added API extension Adds new functions or objects to the API. topic: Indexing Array indexing. labels Apr 7, 2022

kgryte requested a review from asmeurer April 7, 2022 09:53

rgommers mentioned this pull request Apr 7, 2022

Proposal: add APIs for getting and setting elements via a list of indices (i.e., take, put, etc) #177

Open

Clarify behavior when provided a zero-dimensional array

370675b

kgryte added this to the v2021 milestone Apr 21, 2022

kgryte modified the milestones: v2021, v2022 May 2, 2022

rgommers requested changes Jul 7, 2022

View reviewed changes

spec/API_specification/signatures/indexing_functions.py Outdated Show resolved Hide resolved

spec/API_specification/signatures/indexing_functions.py Outdated Show resolved Hide resolved

spec/API_specification/signatures/indexing_functions.py Outdated Show resolved Hide resolved

kgryte added 6 commits September 18, 2022 22:55

Merge branch 'main' of https://github.com/pydata-apis/array-api into …

187af16

…add-take

Rephrase description and add guidance concerning when axis is required

915c026

Fix module

ff155cc

Fix missing import

0024b6a

Remove support for providing an integer for the indices kwarg

b14151a

Merge branch 'main' of https://github.com/pydata-apis/array-api into …

0c45cfd

…add-take

rgommers approved these changes Nov 16, 2022

View reviewed changes

rgommers merged commit d5d7c6c into main Nov 16, 2022

rgommers deleted the add-take branch November 16, 2022 20:37

rgommers mentioned this pull request Jul 5, 2023

Implement torch.take with axis argument data-apis/array-api-compat#34

Closed

kgryte mentioned this pull request Jul 5, 2023

Fix optional keyword argument in take signature #644

Merged

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `take` specification for returning elements of an array along a specified axis #416

Add `take` specification for returning elements of an array along a specified axis #416

kgryte commented Apr 7, 2022 •

edited

Loading

honno commented Apr 7, 2022

asmeurer commented Apr 7, 2022

shoyer commented Apr 7, 2022

kgryte commented Apr 18, 2022 •

edited

Loading

rgommers commented Jul 6, 2022 •

edited

Loading

rgommers commented Jul 7, 2022

rgommers commented Jul 7, 2022

rgommers left a comment

seberg commented Jul 7, 2022

rgommers commented Jul 7, 2022

kgryte commented Nov 3, 2022

kgryte commented Nov 3, 2022

rgommers commented Nov 16, 2022

Add take specification for returning elements of an array along a specified axis #416

Add take specification for returning elements of an array along a specified axis #416

Conversation

kgryte commented Apr 7, 2022 • edited Loading

Questions

honno commented Apr 7, 2022

asmeurer commented Apr 7, 2022

shoyer commented Apr 7, 2022

kgryte commented Apr 18, 2022 • edited Loading

rgommers commented Jul 6, 2022 • edited Loading

rgommers commented Jul 7, 2022

rgommers commented Jul 7, 2022

rgommers left a comment

Choose a reason for hiding this comment

seberg commented Jul 7, 2022

rgommers commented Jul 7, 2022

kgryte commented Nov 3, 2022

kgryte commented Nov 3, 2022

rgommers commented Nov 16, 2022

Add `take` specification for returning elements of an array along a specified axis #416

Add `take` specification for returning elements of an array along a specified axis #416

kgryte commented Apr 7, 2022 •

edited

Loading

kgryte commented Apr 18, 2022 •

edited

Loading

rgommers commented Jul 6, 2022 •

edited

Loading