7900 XTX Refuses to Run tensorflow-rocm Toy Example #1880
Comments
It's probably a packaging issue for Arch, try with … P.s. damn, that GPU must be a beast 💯
Unfortunately that doesn't seem to work. First it tries to remove the conflicting packages:
However, answering Y to both questions still results in a failure to install:
Are you sure these packages are even required, though? From what I understand, tensorflow-rocm does NOT use OpenCL at all. As a matter of fact, I upgraded from a 6900 XT, which was able to run tensorflow-rocm just fine with the exact same packages I currently have installed.
The package name is just that for historical reasons; it has nothing to do with OpenCL. The reason you get these conflict errors is that the packaging isn't handling conflicts properly. It's something I will try to fix soon, but it's not there yet, so you have to manually remove any rocm-arch package yourself if you want to try. P.s. I don't want to spam the ROCm issue tracker with Arch packaging comments, so if you are still interested in trying it, feel free to comment on the AUR page and we can continue the discussion there.
I just uninstalled all previous
I guess it's because your GPU is not yet supported in ROCm. I ran your example with my 5700 XT and it's working fine (although it didn't complete within 10 minutes and I had to cancel it). Maybe you can try to
That just makes it crash with an out-of-memory error, which is bogus for such a small example on a card with 24 GB of memory:
A 7900 XTX with ROCm would be awesome!!!! @Mushoz did you get it working now? I have the same use case.
The problem also occurs with the 7900 XT, and also with the Arch Linux ROCm packages from the AUR. Is there anything that can be done to make it run? Edit: I reproduced the same output with samples/0_Intro/bit_extract from https://github.com/ROCm-Developer-Tools/HIP.git as an easier minimal example.
So this means this problem only exists on Arch Linux? And not on Ubuntu or Debian?
Installing opencl-amd and opencl-amd-dev seems to work for me. @Mushoz did you install LLVM with version >= 15? (Arch still has 14.) You can also have a look at: … There it states what is needed:
@jannesklee I am running llvm-minimal-git. Everything is working as it should game-wise; it's just that ROCm is broken. Are you able to run the example in my first post just fine? And are you certain it's running on the GPU and not the CPU? Could you run the following Python script and show the output?
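The script itself wasn't preserved in this thread. A minimal device-visibility check along these lines would show whether TensorFlow sees the GPU (a sketch only: the `list_gpus` helper name is mine, and tensorflow-rocm is assumed to be installed):

```python
# Minimal GPU-visibility check for TensorFlow / tensorflow-rocm.
# The list_gpus() helper is illustrative, not quoted from the original thread.

def list_gpus():
    """Return TensorFlow's view of available GPUs, or [] if TF is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")

if __name__ == "__main__":
    gpus = list_gpus()
    print(f"Num GPUs available: {len(gpus)}")
    for gpu in gpus:
        print(gpu)
```

On a working ROCm setup this should list at least one `PhysicalDevice` with `device_type='GPU'`; an empty list means TensorFlow fell back to the CPU.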
I got the same error when testing the minimal example shown above, as well as other samples, and it vanished when I used the other packages. When I check the usage with nvtop, it shows me that the dedicated graphics card is in use. Maybe the llvm-minimal-git version is not enough: at https://aur.archlinux.org/pkgbase/llvm-git, Lone_Wolf states that llvm-minimal-git focuses on providing what is needed for AUR mesa-git, and doesn't support cross-compiling or any bindings for external stuff like ocaml & python. Unfortunately I am currently unable to install tensorflow because I get compilation errors, but this is something else, I guess. I have tried to make it run, but without success.
@jannesklee no need to compile tensorflow. You can install |
This is my output. I do not completely understand it, to be honest:
Support for this GPU is not enabled on ROCm 5.4.1. Please await the 5.5.0 release announcement to check for support.
When can we expect a release of 5.5.0? Is there any date scheduled?
@jannesklee I have the same output. Unfortunately it specifically states that it is ignoring the GPU because it is unsupported. @saadrahim when can we expect 5.5.0 to release? CUDA is so much easier in this regard; it just works. For ROCm to be able to compete with CUDA, it really has to step up in terms of communication, so that users can rely on ROCm as they can on CUDA.
I'm a bit surprised that you're having trouble with ROCm 5.4.1 on the 7900 XTX, as that architecture is gfx1100 and most of the AMD-provided binaries for ROCm 5.4.1 contain gfx1100 code objects. It's not listed as officially supported in the GPU support table for ROCm 5.4, but I would have expected it would mostly work anyway. Is this problem specific to Tensorflow? e.g., do other libraries packaged by Arch work? A quick check might be to build and run Arch's
When you set … The RDNA 1 instruction sets are similar enough to the RDNA 2 instruction set that you can sometimes successfully run code that was compiled for RDNA 2 on an RDNA 1 GPU (as you are doing with your 5700 XT). However, this is not guaranteed to work: the instruction sets are not identical, and if the code you're running happens to use an RDNA 2 instruction that worked differently in RDNA 1 (or doesn't exist at all in RDNA 1), then your program may not function correctly. Similarly, the RDNA 3 instruction sets are different from the RDNA 2 instruction set. If you try to run code compiled for RDNA 2 on an RDNA 3 GPU using
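For context, the override being discussed is applied per process via the HSA_OVERRIDE_GFX_VERSION environment variable, which is a standard ROCm runtime knob. A minimal sketch (everything besides the variable name is illustrative):

```shell
# Spoof the RDNA 2 ISA (gfx1030) for a single process. Typical usage would be:
#   HSA_OVERRIDE_GFX_VERSION=10.3.0 python train.py   # train.py is a placeholder
# This is a compatibility hack, not officially supported, and may crash or
# miscompute if the real ISA differs from the one being spoofed.
# Here we just demonstrate that the override is scoped to one invocation:
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 -c 'import os; print(os.environ.get("HSA_OVERRIDE_GFX_VERSION"))'
```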
My assumption is also that it is a problem on the TensorFlow side. I tested the samples from https://github.com/ROCm-Developer-Tools/HIP mentioned above. Example bit_extract:
gives me
I can also see some activity with nvtop, but unfortunately I do not know exactly how to give more details here. Regarding your example, I unfortunately get a core dump when running ./test.sh:
@jannesklee I am not so sure. @saadrahim specifically stated that ROCm 5.5.0 is required for these cards to run TensorFlow. I am also not surprised you are able to run that HIP example; there is some preliminary support for the 7900 series, given that Blender can also use the HIP backend just fine: https://www.phoronix.com/review/rx7900-blender-opencl That has me thinking, though. It would be interesting to see if pytorch-rocm is able to run. I can see that there are Docker images available, and some tags are using ROCm 5.4.1. That would take packaging issues AND TensorFlow out of the equation, and would allow us to see if these cards are able to do any machine learning with the current ROCm stack. I might try this out tonight. Docker images in case you want to give it a shot: https://hub.docker.com/r/rocm/pytorch/tags
@jannesklee did it work?
@Mushoz pytorch-rocm doesn't appear to work, either. It can't find the GPU at all by default, and it segfaults with HSA_OVERRIDE_GFX_VERSION set.
@wsippel Ah, I just replied to you on the AUR, but only now realized you are active here as well. A week ago, changes for RDNA3 were merged for MIOpen: https://github.com/ROCmSoftwarePlatform/MIOpen/commits/develop See the 11th of January. Do you reckon we could get it to work by compiling MIOpen from source?
@wsippel @Mushoz I can confirm that, with some effort, a build of pytorch 1.13.1 against an AMD RX 7900 XTX with ROCm 5.4.2 works and is functional for my use case of running models. Rough outline for the build: use an Ubuntu (20.04/22.04) Docker image, as AMD provides ROCm repos for it, and install all required deps without the kernel module. See https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/Dockerfile#L67; basically edit 5.3 to 5.4.2 and run all commands up to line 67. I also adapted the amdgpu install command to … Maybe you can build tensorflow via the instructions from https://www.tensorflow.org/install/source by adapting the build command to (in venv):
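The "edit 5.3 to 5.4.2" step described above can be sketched as a one-line sed over the Dockerfile. The stand-in file content below is illustrative only; the real Dockerfile lives in the MIOpen repository:

```shell
# Create a minimal stand-in for the MIOpen Dockerfile line being edited
# (illustrative only; in practice you would edit the repo's own Dockerfile).
printf 'ARG ROCM_VERSION=5.3\n' > Dockerfile
# Bump every 5.3 reference to 5.4.2, as described in the comment above.
sed -i 's/5\.3/5.4.2/g' Dockerfile
cat Dockerfile   # prints: ARG ROCM_VERSION=5.4.2
```

Note that `sed -i` with no backup suffix is GNU sed syntax, which matches the Ubuntu build environment described above.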
@Kardi5 Would you mind sharing the final Dockerfile that you used? I would love to try to replicate it for TensorFlow. Please leave in all the pytorch-specific things as well; I will try to do something similar for TensorFlow.
@Mushoz Sure, but I don't have a complete one myself right now. It was more interactive trial and error until all builds worked out. I hope to create a complete Dockerfile tonight/tomorrow based on the notes I took.
This issue also affects Gentoo when installing ROCm via Portage. Installing clinfo:
/var/tmp/portage/dev-libs/rocr-runtime-5.3.3/work/ROCR-Runtime-rocm-5.3.3/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char *, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != NULL && "Code buffer allocation failed"' failed.
Aborted (core dumped)
I'm rather certain that this particular error is not related to TensorFlow or MIOpen, as I was able to repro the error above with only a basic installation of the ROCm OpenCL runtime and friends. The changes from ROCR 5.4.1 to 5.4.2 have not been downstreamed to GitHub yet, making it tricky to reproduce the workaround @Kardi5 proposed on other distros. I guess I'll try with 5.4.1 for now.
@Mushoz So far I could only create a rough draft of a complete Dockerfile; maybe you will find it useful nonetheless. Over at https://github.com/pytorch/pytorch/blob/master/.circleci/docker/ubuntu-rocm/Dockerfile there is a more complete example, even though it is much more complex. Their Magma build script (https://github.com/pytorch/pytorch/blob/master/.circleci/docker/common/install_rocm_magma.sh) might be the solution to my troubles, but I did not have time to look through it in more detail. There might still be errors besides Magma building after line …
Draft Torch + Torchvision Dockerfile
Build with … Run interactively with:
Can confirm that with
rocr-runtime-5.4.1 # 5.4.2 not yet available
roct-thunk-interface-5.4.2
rocm-opencl-runtime-5.4.2
rocm-comgr-5.4.2
rocm-device-libs-5.4.2
So this issue should originate from one of these libraries. The downside is that the Gentoo Clang 16 toolchain is not able to build Mesa due to an RTTI flag mismatch, so current usability may be limited. That's either a Gentoo or Mesa bug, though.
I was experimenting with various things recently, and it seems like Navi 3x performance still has a lot of room for improvement. You might see some improvements on Navi 3x, but most of them are for MI GPUs.
I'm looking forward to this: ROCm/flash-attention#1
Yeah. It seems to work on MI GPUs and the numbers look promising. I merged two branches in Composable Kernel yesterday for it to support Navi 31, but haven't got it to work so far. If you are interested and want to mess with it, are-we-gfx1100-yet/composable_kernel might be a
For anyone interested, I am posting a slightly updated version of this: EDIT: Oops! Wrong window!!! But I am leaving this here in case anyone wants it.
Are there still people who are waiting for 7900 XTX support? Though the performance is still a bit poor, tensorflow-upstream now runs when built on the latest ROCm release. I was looking into the status of ROCm support for the 7900 XTX, found a few issues opened by different people, and wanted to link them all to the issue I opened in the MIOpen repo. Though there has not been any confirmation from the developers, I think the performance issues are due to insufficient optimization of MIOpen.
Use Ubuntu 22.04 and ROCm 5.7.1.
@johnnynunez
It's running on a 7900 XT, I've checked it.
Did you compile tensorflow-upstream master or r2.14-enhanced-rocm?
I think, at the time I ran the benchmark, master was at 2.14. Now when I want to run the benchmark, I build r2.14, as I noticed some incompatibilities when running it from master. I haven't worked with my 7900 XTX for a while since I bought an MI100, so I may not remember the version number correctly. But the gist is that the master branch used to work, no longer does, and I had to pick a version.
I've updated the scripts to build with the latest master commit and ROCm 5.7.1, if you want. Secondly, modify this line; in my case, 32 GB, 16 cores, and 32 threads.
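The exact line isn't quoted above. As a hedged sketch, resource limits of that kind (32 GB of RAM, 16 parallel jobs) are typically passed to a TensorFlow Bazel build with flags like the following; the flag names are standard Bazel options, but the values and target here are assumptions, not quoted from the scripts:

```shell
# Hypothetical sketch: cap Bazel's resource usage for a TensorFlow build.
# --local_ram_resources takes MB; --jobs caps parallel build actions.
bazel build --jobs=16 --local_ram_resources=32768 \
  //tensorflow/tools/pip_package:build_pip_package
```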
BTW, r2.14-enhanced-rocm has a typo that prevents it from detecting the 7900 XTX properly. You need to fix tensorflow/compiler/xla/stream_executor/device_description.h line 184; it's missing a comma. I'm not sure what is going on, since it was fixed multiple times in the past, but it keeps coming back... I think the master branch is OK.
Yes, I knew it and fixed it.
This is not fixed in the recent 2.14 Dockerfile push. How can I manually compile this one file and correct it?
The code on the main development branch looks correct, and you can give the CI link in this comment a try, which contains the nightly
OK, that worked, but this is a long way from usable for newbies such as myself. I will keep redoing the steps I found in the ROCm docs instead of the AMD driver website, which is giving me DKMS errors.
Yes, unfortunately, we still do not have
In my case, I still get freezes with memory transfers, etc.
So, apart from the .whl nightly build recommendation, what can I, as an owner of a 6700 XT, do to get the faulty rocm/tensorflow:latest Docker image running? Is there a possibility to recompile TensorFlow within the Docker image after fixing the comma in the .h file?
I am working on a tutorial for my 7900 XTX and 6600 XT: https://github.com/vampireLibrarianMonk/amd-gpu-hello I do not yet cover the download and manual compilation/installation of tensorflow-upstream 2.15 and above, but it will borrow a lot from this post.
These two comments should help:
These steps might work (I don't have access to a machine for testing at the moment):
You may need to be
Unfortunately, the build fails on FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/llvm-17/bin/clang'
Review the ROCm enhanced branches. The latest usually isn't the best place to start.
I updated your repository, and I can compile pytorch and tensorflow with the latest versions.
Is anyone going to update these docs? https://github.com/ROCm/tensorflow-upstream/tree/develop-upstream/rocm_docs They seem pretty dated, and if not for the last comment I would be lost.
@Mushoz Has your issue been resolved? If so, please close the ticket. Thanks!
Hi @Mushoz, with ROCm 6.2.0 and TensorFlow 2.16.1, I was able to run the example on a 7900XTX without encountering any issues. Successful runs were done after installing TensorFlow using the prebuilt Docker image
and using the ROCm 6.2.0 + TF 2.16.1 wheels package
There have been many updates/fixes since the issues in this thread were posted. If anyone encounters further issues using TensorFlow with ROCm on the 7900 XTX, please open a new issue so we can investigate further. Thanks!
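The exact image tag isn't preserved above. For reference, a typical way to run AMD's prebuilt images looks like this; the rocm/tensorflow:latest tag is an assumption, while the device and group flags are the standard ROCm container requirements:

```shell
# Expose the ROCm kernel interfaces to the container and join the video group
# so the GPU is visible inside it; then ask TensorFlow what it can see.
docker run -it --device=/dev/kfd --device=/dev/dri \
  --security-opt seccomp=unconfined --group-add video \
  rocm/tensorflow:latest \
  python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

A non-empty device list from that final command confirms the container can reach the GPU.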
Issue Type
Bug
Tensorflow Version
Tensorflow-rocm v2.11.0-3797-gfe65ef3bbcf 2.11.0
rocm Version
5.4.1
Custom Code
Yes
OS Platform and Distribution
Archlinux: Kernel 6.1.1
Python version
3.10
GPU model and memory
7900 XTX 24GB
Current Behaviour?
I am not entirely sure whether this is an upstream (ROCm) issue or one with tensorflow-rocm specifically, so I am reporting it to both repos. A toy example refuses to run and dumps core. I would have expected it to train successfully.
Standalone code to reproduce the issue
Relevant log output