
feature[cartesian]: Add support for HIP/ROCm #1278

Merged: 9 commits, Aug 3, 2023
Conversation

@stubbiali (Contributor) commented Jun 28, 2023

Adaptations to generate GridTools/DaCe code targeting AMD GPUs. Complemented by GridTools/gridtools#1759.

@egparedes (Contributor) left a comment

Not a review, just a couple of suggestions.

@@ -243,6 +245,18 @@ def allocate_gpu(
    if device_field.ndim > 0:
        device_field = device_field[tuple(slice(0, s, None) for s in shape)]

    if gt_config.GT4PY_USE_HIP:
        # HIP/ROCm lack support for __cuda_array_interface__
        device_field.__hip_array_interface__ = {
@egparedes (Contributor) commented Jun 29, 2023

Shouldn't this be called __cuda_array_interface__ to avoid any changes in the gridtools bindings?

(Contributor)

I agree; unless things have changed, it seems the community just always uses the __cuda_array_interface__ name even for AMD device buffers (most importantly, CuPy).

@stubbiali (Contributor, author) commented Jun 30, 2023

I fully agree that __cuda_array_interface__ should be the way to go, but that's not feasible at the moment: on HIP/ROCm builds, CuPy does not allow __cuda_array_interface__ to be either read or overloaded. See this issue and the links therein: cupy/cupy#7431. That said, my solution is clearly a hack. DLPack seems to be the solution the community is converging on, but I must say I'm not very familiar with it, so I need to dig into it first.
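
For reference, a hedged sketch of what such an interface mapping typically carries, modeled on the __cuda_array_interface__ specification; the variable names are illustrative and this is not the (truncated) code from the diff above.

import cupy as cp

device_field = cp.zeros((4, 4), dtype=cp.float64)

# Illustrative only: the mapping mirrors the keys of the __cuda_array_interface__
# protocol so a consumer can locate and interpret the device buffer.
hip_array_interface = {
    "shape": device_field.shape,
    "typestr": device_field.dtype.str,       # e.g. "<f8"
    "data": (device_field.data.ptr, False),  # (device pointer, read-only flag)
    "strides": device_field.strides,
    "version": 3,                            # version of the mirrored protocol
}
print(hip_array_interface)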

(Contributor)

My understanding of cupy/cupy#7431 is that it concerns a particular case where you want to wrap data allocated on the C++ side in a CuPy buffer. Have you tried it and run into the error? The last time we ran on AMD, a while back, we happily used __cuda_array_interface__ to pass CuPy arrays to stencils.

@stubbiali (Contributor, author) commented Jun 30, 2023

Yes, I tried, and the GridTools bindings threw the exception AttributeError: HIP/ROCm does not support cuda array interface. The same happens if you try to read __cuda_array_interface__ from within Python. But I cannot exclude that I'm missing something important.
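
A minimal Python-side reproduction sketch of the behavior described above, assuming a ROCm/HIP build of CuPy (on CUDA builds the attribute read succeeds):

import cupy as cp

x = cp.zeros((4, 4), dtype=cp.float64)
try:
    iface = x.__cuda_array_interface__  # raises AttributeError on ROCm/HIP builds of CuPy
    print("CUDA array interface:", iface["typestr"], iface["shape"])
except AttributeError as err:
    # Expected on ROCm/HIP builds, matching the error reported above.
    print("No __cuda_array_interface__:", err)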

(Contributor)

+1 for DLPack. I've seen accelerated adoption lately. Is it on the radar to add it, @egparedes? It feels not urgent, but it's something we could contribute later down the road.

(Contributor)

If this error is there, is it safe to just work around it in this way? There must be a reason why it is not supported.

(Contributor, author)

Again, I'm not an expert here, but it's very likely that this workaround only works in simple scenarios such as ours (e.g. using a single stream).

@egparedes (Contributor) commented Jul 4, 2023

Replying to "+1 for DLPack. I've seen accelerated adoption lately. Is it on the radar @egparedes to add it? Feels not urgent, but it's something we could contribute later down the road":

@FlorianDeconinck Yes, we're looking into supporting DLPack. There is no timeline yet, but hopefully not far in the future.
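
For context, a minimal sketch of the DLPack handshake with CuPy (illustrative only, not part of this PR; the producer/consumer split shown is the generic Array API pattern):

import cupy as cp

src = cp.arange(16, dtype=cp.float64).reshape(4, 4)

# The producer reports its device as a (device_type, device_id) pair;
# per the Array API, CUDA is device type 2 and ROCm is device type 10.
device_type, device_id = src.__dlpack_device__()

# The consumer rebuilds a zero-copy view of the same device buffer.
dst = cp.from_dlpack(src)
assert dst.data.ptr == src.data.ptr

print(device_type, device_id, dst.shape)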

@@ -32,6 +32,8 @@

CUDA_HOST_CXX: Optional[str] = os.environ.get("CUDA_HOST_CXX", None)

GT4PY_USE_HIP: int = int(os.environ.get("GT4PY_USE_HIP", 0))
@egparedes (Contributor) commented Jun 29, 2023

There should be a way to autodetect whether CuPy has been compiled with HIP. The following is a naive suggestion to do it using DLPack, but there is probably a simpler way using some esoteric CuPy API endpoint (maybe cp.show_config()?).

Suggested change
GT4PY_USE_HIP: int = int(os.environ.get("GT4PY_USE_HIP", 0))
if "GT4PY_USE_HIP" in os.environ:
    GT4PY_USE_HIP: bool = bool(int(os.environ["GT4PY_USE_HIP"]))
else:
    # Autodetect cupy with ROCm/HIP using the DLPack device type
    # Reference: https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack_device__.html#array_api.array.__dlpack_device__
    try:
        import cupy as _cp

        GT4PY_USE_HIP = _cp.empty((1,)).__dlpack_device__()[0] == 10
        del _cp
    except Exception:
        GT4PY_USE_HIP = False

@stubbiali (Contributor, author)

I partially followed your suggestion, using cupy.cuda.get_hipcc_path().
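
A hedged sketch of that detection variant, assuming cupy.cuda.get_hipcc_path() behaves as referenced above; this is an illustration, not the exact code merged into gt4py's config module.

import os

try:
    import cupy as cp
except ImportError:
    cp = None

if "GT4PY_USE_HIP" in os.environ:
    GT4PY_USE_HIP: bool = bool(int(os.environ["GT4PY_USE_HIP"]))
elif cp is not None:
    try:
        # Assumption: on ROCm builds of CuPy, cupy.cuda.get_hipcc_path() returns
        # the hipcc path; on CUDA builds it is unavailable or returns None.
        GT4PY_USE_HIP = cp.cuda.get_hipcc_path() is not None
    except Exception:
        GT4PY_USE_HIP = False
else:
    GT4PY_USE_HIP = False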

@gronerl (Contributor) left a comment

Only a small comment. I wonder if there is a way at CSCS to run something on AMD as part of CI.

src/gt4py/cartesian/backend/pyext_builder.py: outdated review thread (resolved)
@FlorianDeconinck (Contributor) commented

If this needs an extra set of tests, NASA/NOAA can wiggle some AMD systems; hit me up.

(Great to see this!)

havogt added a commit to GridTools/gridtools that referenced this pull request Jul 18, 2023
Light-weight PR to make the Python bindings work with AMD GPU buffers. Complementary to GridTools/gt4py#1278.

---------

Co-authored-by: Hannes Vogt <hannes@havogt.de>
@havogt (Contributor) commented Jul 18, 2023

cscs-ci run

@havogt (Contributor) left a comment

nvcc needs -isystem=

src/gt4py/cartesian/backend/pyext_builder.py: two outdated review threads (resolved)
@havogt (Contributor) commented Jul 19, 2023

cscs-ci run

Comment on lines +100 to +101:

"-isystem{}".format(gt_include_path),
"-isystem{}".format(gt_config.build_settings["boost_include_path"]),

@stubbiali (Contributor, author)

hipcc (clang) requires -isystem [dir].
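
To make the distinction concrete, a hypothetical helper (not the code in pyext_builder.py): per the two comments above, nvcc wants the -isystem=<dir> spelling, while hipcc/clang accepts the directory as a separate or fused argument, as in the merged lines.

def include_flags(dirs: list[str], compiler: str) -> list[str]:
    # Hypothetical helper (not part of the PR): emit system-include flags in the
    # spelling each compiler expects, following the two comments above.
    if compiler == "nvcc":
        return [f"-isystem={d}" for d in dirs]
    # hipcc is clang-based and accepts the fused spelling "-isystem<dir>".
    return [f"-isystem{d}" for d in dirs]


print(include_flags(["/opt/gridtools/include", "/usr/include/boost"], "hipcc"))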

@havogt (Contributor) commented Jul 26, 2023

cscs-ci run

@havogt merged commit e7e403f into GridTools:main on Aug 3, 2023
26 checks passed
havogt added a commit to GridTools/gridtools that referenced this pull request Aug 16, 2023
Light-weight PR to make the Python bindings work with AMD GPU buffers. Complementary to GridTools/gt4py#1278.

---------

Co-authored-by: Hannes Vogt <hannes@havogt.de>