Please expose __cuda_array_interface__ via the xarray.__array__() function if present #6847

Open
MurrayData opened this issue Jul 29, 2022 · 5 comments

MurrayData commented Jul 29, 2022

Is your feature request related to a problem?

When using an array type with GPU support, such as CuPy arrays, Numba device arrays or Numba mapped (shared) arrays, __cuda_array_interface__ is not exposed by the xarray.__array__() function.

I'm working with large NetCDF files that I want to process against reference dataframes, using GPU acceleration to do so.

For example, Numba mapped array:

>>> points = np.random.randn(2, 3)
>>> map_points = nb.cuda.mapped_array_like(points)
>>> map_points.__array_interface__
{'data': (140399865758208, False),
 'strides': None,
 'descr': [('', '<f8')],
 'typestr': '<f8',
 'shape': (2, 3),
 'version': 3}

>>> map_points.__cuda_array_interface__
{'shape': (2, 3),
 'strides': None,
 'data': (140399865758208, False),
 'typestr': '<f8',
 'stream': None,
 'version': 3}

When copied to xarray:

>>> data = xr.DataArray(map_points, dims=("x", "y"), coords={"x": [10, 20]})
>>> data
<xarray.DataArray (x: 2, y: 3)>
array([[0., 0., 0.],
       [0., 0., 0.]])
Coordinates:
  * x        (x) int64 10 20
Attributes: (0)

The array interface confirms the same address for the base (CPU) array as above, i.e. zero copy:

>>> data.__array__().__array_interface__
{'data': (140399865758208, False),
 'strides': None,
 'descr': [('', '<f8')],
 'typestr': '<f8',
 'shape': (2, 3),
 'version': 3}

However, __cuda_array_interface__ is not exposed:

>>> data.__array__().__cuda_array_interface__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [23], in <cell line: 1>()
----> 1 data.__array__().__cuda_array_interface__

AttributeError: 'numpy.ndarray' object has no attribute '__cuda_array_interface__'

Describe the solution you'd like

Expose __cuda_array_interface__ via the xarray.__array__() function so it is available to CuPy and Numba CUDA functions.
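
To make the request concrete, here is a minimal sketch of the kind of delegation I have in mind. It does not reflect xarray's actual internals: `Wrapper` and `FakeDeviceArray` are hypothetical stand-ins (the latter only mimics a GPU array so the sketch runs without CUDA), and the interface values are copied from the output above.

```python
class FakeDeviceArray:
    """Hypothetical stand-in for a CuPy/Numba array (no GPU needed)."""

    def __init__(self, shape, ptr):
        self.shape = shape
        self._ptr = ptr

    @property
    def __cuda_array_interface__(self):
        # Same layout as the version 3 dictionary shown earlier.
        return {
            "shape": self.shape,
            "strides": None,
            "data": (self._ptr, False),
            "typestr": "<f8",
            "stream": None,
            "version": 3,
        }


class Wrapper:
    """Hypothetical container that forwards the CUDA interface if present."""

    def __init__(self, data):
        self.data = data

    @property
    def __cuda_array_interface__(self):
        # If the wrapped array is host-only, this lookup raises
        # AttributeError, so hasattr() on the wrapper returns False.
        return self.data.__cuda_array_interface__


gpu_backed = Wrapper(FakeDeviceArray((2, 3), 140399865758208))
cpu_backed = Wrapper([1.0, 2.0])
```

With this approach the wrapper only advertises the interface when the wrapped array does, so consumers that probe with `hasattr` would still treat CPU-backed containers as plain host arrays.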

Describe alternatives you've considered

As a workaround, I'm not using xarray for NetCDF files. Instead, I'm converting them into a dictionary of arrays, which provides me with the GPU interfaces.

Additional context

No response

Contributor

dcherian commented Jul 29, 2022

Do you have to go through __array__ (see #6845) or would accessing the underlying array using DataArray.data work for you?

We could also add some properties under the DataArray.cupy namespace for convenience (See https://github.com/xarray-contrib/cupy-xarray)

It'd be good to see a minimal example showcasing the operations you'd like to work. This would also make a great contribution to https://cupy-xarray.readthedocs.io/
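
As a rough illustration of the `DataArray.data` route, a small helper could unwrap the container before handing the array to GPU code. This is only a sketch: `as_device_array` is a hypothetical name, not part of xarray or cupy-xarray, and the two stand-in classes merely mimic a DataArray and a device array so the example runs without a GPU.

```python
class FakeDeviceArray:
    """Hypothetical stand-in for a CuPy or Numba device array."""

    # Values copied from the example earlier in this issue.
    __cuda_array_interface__ = {
        "shape": (2, 3),
        "strides": None,
        "data": (140399865758208, False),
        "typestr": "<f8",
        "stream": None,
        "version": 3,
    }


class FakeDataArray:
    """Hypothetical stand-in for xarray.DataArray; models only .data."""

    def __init__(self, data):
        self.data = data


def as_device_array(obj):
    """Return obj, or obj.data, whichever exposes the CUDA interface."""
    # Check the object itself first: some device arrays have their own
    # .data attribute that is a raw memory handle, not an array.
    if hasattr(obj, "__cuda_array_interface__"):
        return obj
    inner = getattr(obj, "data", None)
    if hasattr(inner, "__cuda_array_interface__"):
        return inner
    raise TypeError(f"{type(obj).__name__} does not wrap a CUDA array")


device = FakeDeviceArray()
unwrapped = as_device_array(FakeDataArray(device))
```

The unwrapped object is the original device array itself, so no copy is involved; a container backed by host memory raises `TypeError` instead of silently falling back to the CPU.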

@jacobtomlinson
Contributor

I think ideally you could pass a DataArray to something that takes GPU arrays (like Numba kernels). If that doesn't make sense, then perhaps passing DataArray.data would be simpler. @rabernat made some interesting points on Twitter about not doing this, though.

@MurrayData
Author

> I think ideally you could pass a DataArray to something that takes GPU arrays (like Numba kernels). If that doesn't make sense then perhaps passing the DataArray.data would be simpler. @rabernat made some interesting points on Twitter around not doing this though.

My thoughts were similar until I read @rabernat's comments as well, and I see his point.

@MurrayData
Author

> Do you have to go through __array__ (see #6845) or would accessing the underlying array using DataArray.data work for you?
>
> We could also add some properties under the DataArray.cupy namespace for convenience (See https://github.com/xarray-contrib/cupy-xarray)
>
> It'd be good to see a minimal example showcasing the operations you'd like to work. This would also make a great contribution to https://cupy-xarray.readthedocs.io/

Yes, I'll share a workflow example shortly. Ideally I'd like it to be agnostic rather than CuPy-specific, for example using Numba mapped arrays for arrays which are larger than GPU RAM. I have several which are a lot larger than the 48GB on the RTX8000 GPUs I'm using for this. I have a mix of a dataframe with points of interest and spatial reference tables for coordinate transformation (similar to NTv2 grids), and I then use interpolation to estimate characteristics from data in a NetCDF file around the local points of interest. At present I have a workaround where I convert the NetCDF file into a dictionary of arrays, which is pickled. The image below shows the mapped output of this process on UK rainfall in 2019 (Data source: UK Met Office)
[image: mapped output of this process on UK rainfall in 2019]
