
ROCm gfx10 and gfx11 not supported #1429

Open
Madouura opened this issue Oct 15, 2023 · 30 comments

@Madouura

Looks like only gfx9 and below are supported here ATM.
gfx1010, gfx1012, gfx1030, gfx1100, gfx1101, and gfx1102 are affected.
Here's the error log. I'll link you to the build environment once I submit a PR to nixpkgs.

ginkgo-hpc-hip> In file included from /build/source/build/hip/preconditioner/jacobi_advanced_apply_instantiate.64.hip.cpp:66:
ginkgo-hpc-hip> /build/source/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernel.hpp.inc:84:9: error: no matching function for call to 'tiled_partition'
ginkgo-hpc-hip>         group::tiled_partition<subwarp_size>(group::this_thread_block());
ginkgo-hpc-hip>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ginkgo-hpc-hip> /build/source/build/hip/preconditioner/jacobi_advanced_apply_instantiate.64.hip.cpp:98:21: note: in instantiation of function template specialization 'gko::kernels::hip::jacobi::kernel::advanced_adaptive_apply<64, 64, 4, float, int>' requested here
ginkgo-hpc-hip>             kernel::advanced_adaptive_apply<max_block_size, subwarp_size,
ginkgo-hpc-hip>                     ^
ginkgo-hpc-hip> /build/source/hip/components/cooperative_groups.hip.hpp:392:5: note: candidate template ignored: requirement '64UL <= kernels::hip::config::warp_size' was not satisfied [with Size = 64]
ginkgo-hpc-hip>     tiled_partition(const Group&)
ginkgo-hpc-hip>     ^
ginkgo-hpc-hip> In file included from /build/source/build/hip/preconditioner/jacobi_advanced_apply_instantiate.64.hip.cpp:66:
ginkgo-hpc-hip> /build/source/common/cuda_hip/preconditioner/jacobi_advanced_apply_kernel.hpp.inc:50:9: error: no matching function for call to 'tiled_partition'
ginkgo-hpc-hip>         group::tiled_partition<subwarp_size>(group::this_thread_block());
ginkgo-hpc-hip>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ginkgo-hpc-hip> /build/source/build/hip/preconditioner/jacobi_advanced_apply_instantiate.64.hip.cpp:105:21: note: in instantiation of function template specialization 'gko::kernels::hip::jacobi::kernel::advanced_apply<64, 64, 4, float, int>' requested here
ginkgo-hpc-hip>             kernel::advanced_apply<max_block_size, subwarp_size,
ginkgo-hpc-hip>
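For context, here is a minimal sketch (simplified names, not Ginkgo's actual implementation) of the constraint the compiler reports above: tiled_partition<Size> only participates in overload resolution when Size does not exceed the compile-time warp size, so instantiating it with Size = 64 fails once warp_size is 32 on gfx10/gfx11.

#include <type_traits>

namespace config {
// On gfx10/gfx11 (wavefront size 32) this ends up as 32, so 64-wide
// instantiations are rejected by the enable_if below.
constexpr unsigned warp_size = 32;
}  // namespace config

struct thread_block {};  // stand-in for group::this_thread_block()

template <unsigned Size>
struct tiled_partition_t {};

// Simplified analogue of the candidate at cooperative_groups.hip.hpp:392:
// enabled only when Size <= warp_size.
template <unsigned Size, typename Group,
          typename = std::enable_if_t<(Size <= config::warp_size)>>
tiled_partition_t<Size> tiled_partition(const Group&)
{
    return {};
}

int main()
{
    auto ok = tiled_partition<32>(thread_block{});   // fine: 32 <= 32
    // auto bad = tiled_partition<64>(thread_block{});  // error: no matching function,
    //                                                  // requirement 64 <= warp_size not satisfied
    (void)ok;
}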
@upsj
Member

upsj commented Oct 15, 2023

How did you determine the affected versions? It looks like they still use wavefronts of 64 threads, so there might be something going wrong with the macro we use to determine whether we are compiling for CUDA or ROCm.

@Madouura
Author

Madouura commented Oct 15, 2023

Simple. I ran the build about six times and added whichever targets I saw failing in the log to the GPU targets exclude list in my package.

@Madouura
Author

PR: #261155
Relevant files: NixOS/nixpkgs@93464fe

@upsj
Member

upsj commented Oct 15, 2023

Can reproduce this outside NixOS.

@upsj
Member

upsj commented Oct 15, 2023

First observations: this is the failing command:

/opt/rocm-5.6.0/bin/hipcc
-c /home/tribizel/ginkgo/build-tmp/hip/preconditioner/jacobi_advanced_apply_instantiate.64.hip.cpp
-o /home/tribizel/ginkgo/build-tmp/hip/CMakeFiles/ginkgo_hip.dir/preconditioner/./ginkgo_hip_generated_jacobi_advanced_apply_instantiate.64.hip.cpp.o
-std=c++14
-DGKO_COMPILING_HIP
--amdgpu-target=gfx1010
-fPIC
-fPIC
-O3
-DNDEBUG
-I/opt/rocm-5.6.0/include -I/home/tribizel/ginkgo/build-tmp/hip/.. -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/home/tribizel/ginkgo/build-tmp/include -I/home/tribizel/ginkgo/include -I/home/tribizel/ginkgo -I/usr/local/include -I/home/tribizel/ginkgo/build-tmp/include -I/home/tribizel/ginkgo/include -I/home/tribizel/ginkgo -I/usr/local/include -I/usr/local/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include/hiprand -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include/rocrand -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include -I/opt/rocm-5.6.0/include

Removing --amdgpu-target (which is a deprecated flag, BTW; we should change it) fixes it. hipcc runs two compilation passes (one host, one device), where the second one includes the generated GPU binary blob. With --amdgpu-target, the first compilation already fails. This might have to do with /opt/rocm-5.6.0/amdgcn/bitcode/oclc_wavefrontsize64_off.bc, which is being added for gfx1010.

@Madouura
Author

Madouura commented Oct 15, 2023

You mean here?

list(APPEND GINKGO_AMD_ARCH_FLAGS --amdgpu-target=${target})

I already replace it with --offload-arch:

# `--amdgpu-target` is deprecated
substituteInPlace cmake/hip.cmake \
  --replace "--amdgpu-target" "--offload-arch"

@Madouura
Author

Madouura commented Oct 15, 2023

Removing list(APPEND GINKGO_AMD_ARCH_FLAGS --amdgpu-target=${target}) works, but I'm not sure if that should be done.
We specify those flags for a reason, right?

@upsj
Member

upsj commented Oct 15, 2023

Probable fix: per https://llvm.org/docs/AMDGPUUsage.html#target-features, use -mwavefrontsize64 for those targets. Not sure if it has an effect on targets that support size-64 wavefronts natively.

@Madouura
Author

Madouura commented Oct 15, 2023

I'm getting this error when prepending -mwavefrontsize64 to --amdgpu-target.

In file included from /build/source/include/ginkgo/core/base/exception.hpp:41:
ginkgo-hpc-hip> In file included from /build/source/include/ginkgo/core/base/types.hpp:50:
ginkgo-hpc-hip> /nix/store/7y1f77gd62zy52b52ivapa4inkjcb9mq-clr-5.7.0/include/hip/hip_runtime.h:41:2: error: HIP is not supported on the specified GPU ARCH with wavefront size 64
ginkgo-hpc-hip> #error HIP is not supported on the specified GPU ARCH with wavefront size 64
ginkgo-hpc-hip>  ^
ginkgo-hpc-hip> [ 20%] Building CXX object reference/CMakeFiles/ginkgo_reference.dir/solver/cb_gmres_kernels.cpp.o
ginkgo-hpc-hip> 1 warning and 1 error generated when compiling for gfx1010.
ginkgo-hpc-hip> CMake Error at ginkgo_hip_generated_residual_norm_kernels.hip.cpp.o.cmake:200 (message):
ginkgo-hpc-hip>   Error generating file
ginkgo-hpc-hip>   /build/source/build/hip/CMakeFiles/ginkgo_hip.dir/stop/./ginkgo_hip_generated_residual_norm_kernels.hip.cpp.o
ginkgo-hpc-hip> 
ginkgo-hpc-hip> make[2]: *** [hip/CMakeFiles/ginkgo_hip.dir/build.make:399: hip/CMakeFiles/ginkgo_hip.dir/stop/ginkgo_hip_generated_residual_norm_kernels.hip.cpp.o] Error 1
ginkgo-hpc-hip> make[2]: *** Waiting for unfinished jobs....
ginkgo-hpc-hip> [ 21%] Building CXX object reference/CMakeFiles/ginkgo_reference.dir/solver/common_gmres_kernels.cpp.o
#if __HIP_DEVICE_COMPILE__ && !__GFX8__ && !__GFX9__ && __AMDGCN_WAVEFRONT_SIZE == 64
#error HIP is not supported on the specified GPU ARCH with wavefront size 64
#endif
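As an aside, a minimal sketch of the direction a portable fix could take, assuming __AMDGCN_WAVEFRONT_SIZE is available during HIP device compilation (as the guard above implies): derive the wavefront size from the compiler instead of forcing 64.

// Sketch only: pick up the wavefront size from the predefined macro so that
// gfx10/11 (wave32) and gfx9 (wave64) both compile.
#if defined(__AMDGCN_WAVEFRONT_SIZE)
constexpr int device_wavefront_size = __AMDGCN_WAVEFRONT_SIZE;  // 32 on gfx10/11, 64 on gfx9
#else
constexpr int device_wavefront_size = 64;  // conservative fallback for older toolchains
#endif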

@upsj
Member

upsj commented Oct 15, 2023

Thanks for the report, that's disappointing. It should be possible to compile Ginkgo with wavefront size 32, but unfortunately we don't have any gfx10/11 GPUs available right now to test this, and I'm not comfortable claiming support without checking that the tests run correctly. Some dependencies (rocBLAS, rocSPARSE, rocFFT) might also not work on them, though I haven't checked yet.

@Madouura
Author

If you can give me a patch set, I can give you at least a preliminary indication of whether it works.
I'm on gfx1030; theoretically, if it works for that, it should also work for 1010, 1012, 1100, 1101, and 1102.

@upsj
Member

upsj commented Oct 15, 2023

I just noticed we are doing this almost correctly already. As a first check, you could try

diff --git a/hip/CMakeLists.txt b/hip/CMakeLists.txt
index 21b573b6cd..6740a83d99 100644
--- a/hip/CMakeLists.txt
+++ b/hip/CMakeLists.txt
@@ -69,7 +69,3 @@ endif()
 
-if (GINKGO_HIP_PLATFORM MATCHES "${HIP_PLATFORM_NVIDIA_REGEX}")
-    set(GKO_HIP_JACOBI_MAX_BLOCK_SIZE 32)
-else()
-    set(GKO_HIP_JACOBI_MAX_BLOCK_SIZE 64)
-endif()
+set(GKO_HIP_JACOBI_MAX_BLOCK_SIZE 32)
 if(GINKGO_JACOBI_FULL_OPTIMIZATIONS)

and check which tests run correctly. I only see a handful that will definitely fail.

@Madouura
Author

Madouura commented Oct 15, 2023

Looks like that fixed pretty much everything.
Going to try that test with only one GPU target and see if everything passes.
https://github.com/NixOS/nixpkgs/blob/59dabc593819109a57df9bf03731ecf217beb9dc/pkgs/development/libraries/ginkgo-hpc/default.nix#L175-L188
ginkgo-test.log

@upsj
Member

upsj commented Oct 15, 2023

Note that this is not a complete fix; it's intended only as a workaround for consumer-grade GPUs. Though I'm not sure we actually need Jacobi blocks larger than 32.

@Madouura
Author

ginkgo-test.log
Effectively the same test output using only gfx1030.

@upsj
Member

upsj commented Oct 15, 2023

That's how it looks when no kernels are being run in CUDA, or when we compiled for the wrong architecture. We don't seem to be catching errors of that kind explicitly. I guess it's safe to say that for now, we can only support server-grade GPUs (Radeon VII and the Instinct series).

@yhmtsai
Member

yhmtsai commented Oct 15, 2023

Could you also try applying the following change to hip/base/config.hip.hpp?

diff --git a/hip/base/config.hip.hpp b/hip/base/config.hip.hpp
index 8a53037ffa..ea33ce5f09 100644
--- a/hip/base/config.hip.hpp
+++ b/hip/base/config.hip.hpp
@@ -56,7 +56,7 @@ struct config {
      * The type containing a bitmask over all lanes of a warp.
      */
 #if GINKGO_HIP_PLATFORM_HCC
-    using lane_mask_type = uint64;
+    using lane_mask_type = uint32;
 #else  // GINKGO_HIP_PLATFORM_NVCC
     using lane_mask_type = uint32;
 #endif
@@ -66,7 +66,7 @@ struct config {
      * `device_functions.h`.
      */
 #if GINKGO_HIP_PLATFORM_HCC
-    static constexpr uint32 warp_size = warpSize;
+    static constexpr uint32 warp_size = 32;
 #else  // GINKGO_HIP_PLATFORM_NVCC
     static constexpr uint32 warp_size = 32;
 #endif

Setting warpSize to 32 might not be necessary, because AMD should already use 32 for those archs.

@upsj
Member

upsj commented Oct 15, 2023

@yhmtsai that likely won't make a difference, since uint32 and uint64 are inter-convertible. The use of warpSize is why this issue came up in the first place; otherwise we wouldn't be trying to instantiate the 64-block-size Jacobi kernels.
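For completeness, a small host-side probe (hypothetical helper, not part of Ginkgo) that reports the warp size the HIP runtime advertises for the installed GPU; on gfx10/11 it should print 32, on gfx9 it prints 64.

#include <hip/hip_runtime.h>
#include <cstdio>

int main()
{
    hipDeviceProp_t prop;
    if (hipGetDeviceProperties(&prop, 0) != hipSuccess) {
        std::printf("no HIP device found\n");
        return 1;
    }
    // prop.warpSize is 64 on gfx9 (GCN/CDNA) and 32 on gfx10/11 (RDNA)
    std::printf("device 0: %s, warpSize = %d\n", prop.name, prop.warpSize);
    return 0;
}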

@yhmtsai
Member

yhmtsai commented Oct 15, 2023

The voting functions will consider all bits of the mask type.

@upsj
Member

upsj commented Oct 15, 2023

The result of __ballot gets cast from uint32 to uint64, which means the upper 32 bits are zero.

@yhmtsai
Member

yhmtsai commented Oct 15, 2023

Yes, but that only works for checking any; it does not work for all, because some of the bits will be zero, so all will always be false. We do safer operations in the SYCL backend, but in HIP we assume the mask type width equals the wavefront size.
I think ctz or related functions also rely on the length of the mask type.
Of course, there may be things I'm missing, because I haven't checked in detail.
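To make the concern concrete, a small host-side illustration with made-up values (not Ginkgo code): if a 32-lane ballot result is widened to a 64-bit mask and compared against an all-ones 64-bit full mask, the all check can never succeed, while any still works.

#include <cstdint>
#include <cstdio>

int main()
{
    // 32 active lanes, every predicate true: the ballot fills only the low 32 bits.
    const std::uint64_t ballot = std::uint64_t{0xFFFFFFFFu};
    const std::uint64_t full_mask = ~std::uint64_t{0};  // assumes 64 lanes

    const bool any = (ballot & full_mask) != 0;          // true, as expected
    const bool all = (ballot & full_mask) == full_mask;  // always false: upper 32 bits are zero

    std::printf("any = %d, all = %d\n", any, all);
}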

@upsj
Member

upsj commented Oct 15, 2023

I think we're safe there:

        if (Size == config::warp_size) {
            return __all(predicate);
        } else {
            return (__ballot(predicate) & data_.mask) == data_.mask;
        }

but you are right, many of the failing tests use either cooperative groups or rocBLAS.

@Madouura
Author

Could you also try applying the following change to hip/base/config.hip.hpp?

Sorry for the late reply.
With these changes more tests are passing.
ginkgo-test.log

@upsj
Member

upsj commented Oct 16, 2023

That is encouraging. Based on the test output, I would be so bold as to blame all non-MPI failures on rocBLAS, which we use for our GEMM operations in all tests involving the system matrix. Can you try running the rocBLAS test suite?
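If running the full test suite is too heavy, a quicker sanity check is a standalone GEMM smoke test along these lines (a sketch, not an official rocBLAS example; the rocblas/rocblas.h include path and error handling are simplified and may differ between ROCm versions):

#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>
#include <cstdio>
#include <vector>

int main()
{
    const int n = 4;  // small n x n GEMM: A is all ones, B is all twos
    std::vector<float> a(n * n, 1.0f), b(n * n, 2.0f), c(n * n, 0.0f);

    float *da, *db, *dc;
    hipMalloc(&da, n * n * sizeof(float));
    hipMalloc(&db, n * n * sizeof(float));
    hipMalloc(&dc, n * n * sizeof(float));
    hipMemcpy(da, a.data(), n * n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, b.data(), n * n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dc, c.data(), n * n * sizeof(float), hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C, column-major n x n matrices
    const rocblas_status status =
        rocblas_sgemm(handle, rocblas_operation_none, rocblas_operation_none,
                      n, n, n, &alpha, da, n, db, n, &beta, dc, n);

    hipMemcpy(c.data(), dc, n * n * sizeof(float), hipMemcpyDeviceToHost);
    // every entry of C should equal n * 1 * 2
    bool ok = (status == rocblas_status_success);
    for (float v : c) ok = ok && (v == 2.0f * n);
    std::printf("rocBLAS sgemm smoke test: %s\n", ok ? "PASS" : "FAIL");

    rocblas_destroy_handle(handle);
    hipFree(da);
    hipFree(db);
    hipFree(dc);
    return ok ? 0 : 1;
}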

@Madouura
Author

I assume you're talking about gtest in https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/develop/clients.
It will take me a bit of time since I'll need to rearrange the derivation, but sure.

@upsj
Member

upsj commented Oct 16, 2023

I'll also have a gfx1102 available soon, so if this is too much effort, I can also take over ;)

@Madouura
Author

I need to rearrange a lot of the rocm derivations to use a layout more like ginkgo in the linked PR.
Basically, I'll get to it. Just might take a while since rocBLAS is one of the later things in the toolchain.
By all means though, if you can get to it faster, please do.

@upsj
Member

upsj commented Oct 16, 2023

I would just build rocBLAS separately outside of nix, since I would assume if it's an issue, it's less likely to be a configuration issue, and more likely AMD not testing rocBLAS extensively on all of their GPUs.

@Madouura
Author

I would just build rocBLAS separately outside of nix, since I would assume if it's an issue, it's less likely to be a configuration issue

While it's likely not a nix issue, I'm not going to discount the possibility. I can probably spin up a VM or something and do some A/B testing, with each setup having a rocBLAS installation, to make sure.

and more likely AMD not testing rocBLAS extensively on all of their GPUs

I've packaged most of their software, and I can tell you that's very likely.
