7900 XTX Refuses to Run tensorflow-rocm Toy Example #1880

Open

Mushoz opened this issue Dec 24, 2022 · 273 comments

@Mushoz

Mushoz commented Dec 24, 2022

Issue Type

Bug

Tensorflow Version

Tensorflow-rocm v2.11.0-3797-gfe65ef3bbcf 2.11.0

rocm Version

5.4.1

Custom Code

Yes

OS Platform and Distribution

Archlinux: Kernel 6.1.1

Python version

3.10

GPU model and memory

7900 XTX 24GB

Current Behaviour?

I am not entirely sure whether this is an upstream (ROCm) issue or one with tensorflow-rocm specifically, so I am reporting it to both repos. A toy example refuses to run and dumps core. I would have expected it to train successfully.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

features = np.random.randn(10000,25)
targets = np.random.randn(10000)

model = tf.keras.Sequential([
     tf.keras.layers.Dense(1)
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.MeanSquaredError())

model.fit(x=features, y=targets)

Relevant log output

[jaap@Jaap-Desktop code]$ pipenv run python testNN.py
2022-12-24 11:18:37.178811: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
python: /build/hsa-rocr/src/ROCR-Runtime-rocm-5.4.1/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char*, AssembleTarget, void*&, size_t&) const: Assertion `code_buf != NULL && "Code buffer allocation failed"' failed.
@sofiageo

It's probably an Arch packaging issue. Try opencl-amd and opencl-amd-dev from the AUR and see if it makes a difference.

P.S. Damn, that GPU must be a beast 💯

@Mushoz
Author

Mushoz commented Dec 24, 2022

Unfortunately that doesn't seem to work. First it tries to remove the conflicting packages:

:: opencl-amd and rocm-opencl-runtime are in conflict. Remove rocm-opencl-runtime? [y/N] y
:: opencl-amd and hip-runtime-amd are in conflict (hip). Remove hip-runtime-amd? [y/N] y

However, answering Y to both questions still results in a failure to install:

error: failed to commit transaction (conflicting files)
opencl-amd: /opt/rocm exists in filesystem
Errors occurred, no packages were upgraded.
 -> exit status 1

Are you sure these packages are even required, though? From what I understand, tensorflow-rocm does NOT use OpenCL at all. As a matter of fact, I upgraded from a 6900 XT, which ran tensorflow-rocm just fine with the exact same packages I currently have installed.

@sofiageo

The package name is just that for historical reasons; it has nothing to do with OpenCL. You get these conflict errors because the package doesn't handle the conflicts properly yet. It's something I will try to fix soon, but for now you have to manually remove any rocm-arch packages yourself if you want to try opencl-amd.

P.S. I don't want to spam the ROCm issue tracker with Arch packaging comments, so if you are still interested in trying it, feel free to comment on the AUR page and we can continue the discussion there.

@Mushoz
Author

Mushoz commented Dec 24, 2022

I just uninstalled all the previous ROCm packages and went with opencl-amd + opencl-amd-dev, but that just makes the example run on the CPU rather than the GPU. So unfortunately it does not fix the issue at hand. Any ideas? :)

@sofiageo

I guess it's because your GPU is not yet supported in ROCm. I ran your example with my 5700 XT and it works fine (although it didn't complete within 10 minutes and I had to cancel it). Maybe you can try HSA_OVERRIDE_GFX_VERSION=10.3.0 python sample.py or something similar.

@Mushoz
Author

Mushoz commented Dec 25, 2022

That just makes it crash with an out-of-memory error, which is bogus for such a small example on a card with 24 GB of memory:

[jaap@Jaap-Desktop code]$ HSA_OVERRIDE_GFX_VERSION=10.3.0 pipenv run python testNN.py
2022-12-25 20:49:47.446031: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-25 20:49:48.428818: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.466946: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.466999: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.467209: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-25 20:49:48.468937: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.469011: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.469044: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.469138: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.469176: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.469209: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-25 20:49:48.469229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 24060 MB memory:  -> device: 0, name: AMD Radeon Graphics, pci bus id: 0000:2d:00.0
2022-12-25 20:49:48.492206: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:573] could not allocate ROCM stream for device 0: HIP_ERROR_OutOfMemory
2022-12-25 20:49:48.492218: I tensorflow/compiler/xla/stream_executor/stream_executor_pimpl.cc:791] failed to allocate stream; live stream count: 1
2022-12-25 20:49:48.492221: E tensorflow/compiler/xla/stream_executor/stream.cc:297] failed to allocate stream during initialization
2022-12-25 20:49:48.512792: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:573] could not allocate ROCM stream for device 0: HIP_ERROR_OutOfMemory
2022-12-25 20:49:48.512801: I tensorflow/compiler/xla/stream_executor/stream_executor_pimpl.cc:791] failed to allocate stream; live stream count: 1
2022-12-25 20:49:48.512804: E tensorflow/compiler/xla/stream_executor/stream.cc:297] failed to allocate stream during initialization
2022-12-25 20:49:48.512811: I tensorflow/compiler/xla/stream_executor/stream.cc:1038] [stream=0x55edc5775fb0,impl=0x55edc5775770] did not wait for [stream=0x55edc5794720,impl=0x55edc5775e20]
2022-12-25 20:49:48.512815: I tensorflow/compiler/xla/stream_executor/stream.cc:1038] [stream=0x55edc5794720,impl=0x55edc5775e20] did not wait for [stream=0x55edc5775fb0,impl=0x55edc5775770]
2022-12-25 20:49:48.533248: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:573] could not allocate ROCM stream for device 0: HIP_ERROR_OutOfMemory
2022-12-25 20:49:48.533265: I tensorflow/compiler/xla/stream_executor/stream_executor_pimpl.cc:791] failed to allocate stream; live stream count: 1
2022-12-25 20:49:48.533270: E tensorflow/compiler/xla/stream_executor/stream.cc:297] failed to allocate stream during initialization
2022-12-25 20:49:48.553530: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:573] could not allocate ROCM stream for device 0: HIP_ERROR_OutOfMemory
2022-12-25 20:49:48.553539: I tensorflow/compiler/xla/stream_executor/stream_executor_pimpl.cc:791] failed to allocate stream; live stream count: 1
2022-12-25 20:49:48.553543: E tensorflow/compiler/xla/stream_executor/stream.cc:297] failed to allocate stream during initialization
2022-12-25 20:49:48.573939: E tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc:573] could not allocate ROCM stream for device 0: HIP_ERROR_OutOfMemory
2022-12-25 20:49:48.573949: I tensorflow/compiler/xla/stream_executor/stream_executor_pimpl.cc:791] failed to allocate stream; live stream count: 1
2022-12-25 20:49:48.573953: E tensorflow/compiler/xla/stream_executor/stream.cc:297] failed to allocate stream during initialization
2022-12-25 20:49:48.582885: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-25 20:49:48.668485: I tensorflow/compiler/xla/stream_executor/stream.cc:1038] [stream=0x55edc57947d0,impl=0x55edc5794920] did not wait for [stream=0x55edc5775fb0,impl=0x55edc5775770]

@Syntax3rror404

7900 XTX with ROCm would be awesome!!!! @Mushoz, did you get it working? I have the same use case.

@jannesklee

jannesklee commented Dec 29, 2022

The problem also occurs with the 7900 XT, also with the Arch Linux ROCm packages from the AUR. Is there anything that can be done to make it run?

Edit: I reproduced the same output with samples/0_Intro/bit_extract from https://github.com/ROCm-Developer-Tools/HIP.git as an easier minimal example.

@Syntax3rror404

So this means this problem only exists on Arch Linux, and not on Ubuntu or Debian?

@jannesklee

jannesklee commented Dec 29, 2022

Installing opencl-amd and opencl-amd-dev seems to work for me.

@Mushoz Did you install llvm version >= 15? (Arch still has 14.)

You can also have a look at:
https://www.phoronix.com/review/rx7900xt-rx7900xtx-linux
https://www.reddit.com/r/linux_gaming/comments/zk0462/amd_radeon_rx_7900_xtx_rx_7900_xt_linux_support/

There it states what is needed:

  • llvm >= 15
  • new mesa version compiled against llvm >= 15
  • the firmware needed to be added manually, but I think it is now already included (at least on Arch)

@Mushoz
Author

Mushoz commented Dec 29, 2022

@jannesklee I am running llvm-minimal-git. Everything works as it should game-wise; it's just ROCm that is broken. Are you able to run the example in my first post just fine? And are you certain it's running on the GPU and not the CPU? Could you run the following Python script and show the output?

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
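On a working setup, that should print a non-empty list, something like [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]; an empty list means TensorFlow only sees the CPU.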

@jannesklee

jannesklee commented Dec 29, 2022

I got the same error when testing the minimal example shown above (and other samples), and it vanished when I used the other packages. When I check usage with nvtop, it shows that the dedicated graphics card is in use.

Maybe the llvm-minimal-git version is not enough. At https://aur.archlinux.org/pkgbase/llvm-git, Lone_Wolf states that llvm-minimal-git focuses on providing what is needed for AUR mesa-git; it doesn't support cross-compiling or any bindings for external things like OCaml & Python.

Unfortunately I am currently unable to install TensorFlow because I get compilation errors, but I guess that is a separate problem. I have tried to get it running, without success.

@Mushoz
Author

Mushoz commented Dec 29, 2022

@jannesklee There is no need to compile TensorFlow. You can install tensorflow-rocm via pip, or via pipenv if you want to keep it contained within its own virtual environment. Would you mind running my previously mentioned script?

@jannesklee

jannesklee commented Dec 29, 2022

My output is below. I do not completely understand it, to be honest.

python samply.py 
2022-12-29 19:22:12.450100: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-29 19:22:12.510354: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-12-29 19:22:13.488612: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-29 19:22:13.488649: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-29 19:22:13.521375: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-29 19:22:13.521408: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-29 19:22:13.521421: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-29 19:22:13.521438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1990] Ignoring visible gpu device (device: 0, name: AMD Radeon Graphics, pci bus id: 0000:16:00.0) with AMDGPU version : gfx1100. The supported AMDGPU versions are gfx1030, gfx900, gfx906, gfx908, gfx90a.
2022-12-29 19:22:13.521448: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:843] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-29 19:22:13.521454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1990] Ignoring visible gpu device (device: 1, name: AMD Radeon Graphics, pci bus id: 0000:38:00.0) with AMDGPU version : gfx1036. The supported AMDGPU versions are gfx1030, gfx900, gfx906, gfx908, gfx90a.
2022-12-29 19:22:13.521638: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-29 19:22:13.531205: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.531933: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.532580: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.534466: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.534728: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.535002: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.537283: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.538129: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.540706: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.541347: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.546382: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.546865: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.551241: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.551819: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.552166: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.552624: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.555412: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.555920: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.556342: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.556773: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.557366: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.558349: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.558775: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.559037: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.562307: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.565317: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.567121: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.579977: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.580617: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.581283: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.581875: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.583109: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.583446: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.583800: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.584342: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.584655: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.683227: E tensorflow/core/framework/node_def_util.cc:675] NodeDef mentions attribute grad_a which is not in the op definition: Op<name=_MklMatMul; signature=a:T, b:T -> product:T; attr=transpose_a:bool,default=false; attr=transpose_b:bool,default=false; attr=T:type,allowed=[DT_BFLOAT16, DT_FLOAT]> This may be expected if your graph generating binary is newer  than this binary. Unknown attributes will be ignored. NodeDef: {{node gradient_tape/sequential/dense/MatMul/MatMul}}
2022-12-29 19:22:13.684957: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
190/313 [=================>............] - ETA: 0s - loss: 2.4990 2022-12-29 19:22:13.768593: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.768931: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
2022-12-29 19:22:13.769222: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:507] ROCm Fusion is enabled.
313/313 [==============================] - 0s 257us/step - loss: 2.1766

@saadrahim
Member

Support for this GPU is not enabled on ROCm 5.4.1. Please await the 5.5.0 release announcement to check for support.

@Syntax3rror404

When can we expect a release of 5.5.0? Is there any date scheduled?

@Mushoz
Author

Mushoz commented Dec 29, 2022

@jannesklee I have the same output. Unfortunately it specifically states that it is ignoring the GPU because it is unsupported.

@saadrahim When can we expect 5.5.0 to release? CUDA is so much easier in this regard: it just works. For ROCm to be able to compete with CUDA, it really has to step up in terms of communication, so that users can rely on ROCm the way they can on CUDA.

@cgmb
Collaborator

cgmb commented Jan 3, 2023

I'm a bit surprised that you're having trouble with ROCm 5.4.1 on the 7900 XTX, as that architecture is gfx1100 and most of the AMD-provided binaries for ROCm 5.4.1 contain gfx1100 code objects. It's not listed as officially supported in the GPU support table for ROCm 5.4, but I would have expected it would mostly work anyway. Is this problem specific to Tensorflow? e.g., do other libraries packaged by Arch work? A quick check might be to build and run Arch's test.cpp for rocrand.

I guess it's because your GPU is not supported yet in ROCm. I ran your example with my 5700 XT and it's working fine (although it didn't complete in 10 minutes and I had to cancel it). Maybe you can try to HSA_OVERRIDE_GFX_VERSION=10.3.0 python sample.py or something similar.

When you set HSA_OVERRIDE_GFX_VERSION=10.3.0, you're telling libhsakmt to pretend that your GPU is Navi 21 (gfx1030). To my knowledge, that works just fine for all the RDNA 2 GPUs, since they all use the same instruction set.

The RDNA 1 instruction sets are similar enough to the RDNA 2 instruction set that sometimes you can successfully run code that was compiled for RDNA 2 on an RDNA 1 GPU (as you are doing with your 5700 XT), however, this is not guaranteed to work. The instruction sets are not identical and if the code you're running happens to use an RDNA 2 instruction that worked differently in RDNA 1 (or doesn't exist at all in RDNA 1), then your program may not function correctly.

Similarly, the RDNA 3 instruction sets are different from the RDNA 2 instruction set. If you try to run code compiled for RDNA 2 on an RDNA 3 GPU using HSA_OVERRIDE_GFX_VERSION, the result may not work correctly.
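If you do want to experiment with the override anyway, note that it has to be in the process environment before the ROCm runtime initializes. A minimal sketch from Python, with the caveat above that spoofing gfx1030 on an RDNA 3 part may not behave correctly:

import os

# Must be set before the ROCm runtime loads, i.e. before importing tensorflow.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"  # pretend to be gfx1030 (Navi 21)

import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))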

@jannesklee

jannesklee commented Jan 4, 2023

My assumption is also that it is a problem on the TensorFlow side. Above, I tested the samples from https://github.com/ROCm-Developer-Tools/HIP

Example bit_extract:

    make
    ./bit_extract

gives me

    info: running on device #0 AMD Radeon Graphics
    info: allocate host mem (  7.63 MB)
    info: allocate device mem (  7.63 MB)
    info: copy Host2Device
    info: launch 'bit_extract_kernel' 
    info: copy Device2Host
    info: check result
    PASSED!

I can also see some activity with nvtop, but unfortunately I do not know exactly how to give more details here.

Regarding your example, I unfortunately get a core dump when running ./test.sh:

In file included from test.cpp:1:
In file included from /opt/rocm-5.4.1/include/hiprand/hiprand.hpp:35:
In file included from /opt/rocm-5.4.1/include/hiprand/hiprand_kernel.h:54:
In file included from /opt/rocm-5.4.1/include/hiprand/hiprand_kernel_hcc.h:37:
In file included from /opt/rocm-5.4.1/include/rocrand/rocrand_kernel.h:28:
/opt/rocm-5.4.1/include/rocrand/rocrand_common.h:74:6: warning: "Disabled inline asm, because the build target does not support it." [-W#warnings]
    #warning "Disabled inline asm, because the build target does not support it."
     ^
1 warning generated when compiling for gfx1036.
In file included from test.cpp:1:
In file included from /opt/rocm-5.4.1/include/hiprand/hiprand.hpp:35:
In file included from /opt/rocm-5.4.1/include/hiprand/hiprand_kernel.h:54:
In file included from /opt/rocm-5.4.1/include/hiprand/hiprand_kernel_hcc.h:37:
In file included from /opt/rocm-5.4.1/include/rocrand/rocrand_kernel.h:28:
/opt/rocm-5.4.1/include/rocrand/rocrand_common.h:74:6: warning: "Disabled inline asm, because the build target does not support it." [-W#warnings]
    #warning "Disabled inline asm, because the build target does not support it."
     ^
1 warning generated when compiling for gfx1100.
./test.sh: line 5:  7225 Segmentation fault      (core dumped) "$OUT"/test
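For what it's worth, the HIP runtime can also be probed directly from Python via ctypes, which takes TensorFlow and rocrand out of the picture entirely. A rough sketch, assuming libamdhip64.so from /opt/rocm is on the loader path:

import ctypes

# Load the HIP runtime directly; hipGetDeviceCount has the signature
# hipError_t hipGetDeviceCount(int*), and hipSuccess is 0.
hip = ctypes.CDLL("libamdhip64.so")

count = ctypes.c_int(0)
status = hip.hipGetDeviceCount(ctypes.byref(count))
print(f"hipGetDeviceCount: status={status}, devices={count.value}")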

@Mushoz
Author

Mushoz commented Jan 4, 2023

@jannesklee I am not so sure. @saadrahim specifically stated that ROCm 5.5.0 is required for these cards to run TensorFlow. I am also not surprised you are able to run that HIP example. There is some preliminary support for the 7900 series, given that Blender can also use the HIP backend just fine: https://www.phoronix.com/review/rx7900-blender-opencl

That has me thinking, though: it would be interesting to see if pytorch-rocm is able to run. I can see that there are Docker images available, and some tags use ROCm 5.4.1. That would take packaging issues AND TensorFlow out of the equation, and would let us see whether these cards can do any machine learning with the current ROCm stack. I might try this out tonight.

Docker images in case you want to give it a shot: https://hub.docker.com/r/rocm/pytorch/tags
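If someone gets to it before I do, a quick smoke test inside the container could look something like this (a sketch; the ROCm builds of PyTorch expose the GPU through the torch.cuda API):

import torch

print(torch.cuda.is_available())      # should print True if the GPU is visible
print(torch.cuda.get_device_name(0))  # should name the Radeon card

# Do a bit of real work on the device so a runtime crash surfaces immediately.
x = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()
print((x @ x).sum().item())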

@AndersStendevad

@jannesklee did it work ?

@wsippel

wsippel commented Jan 11, 2023

@Mushoz pytorch-rocm doesn't appear to work either. It can't find the GPU at all by default, and it segfaults with HSA_OVERRIDE_GFX_VERSION set.

@Mushoz
Author

Mushoz commented Jan 19, 2023

@wsippel Ah, I just replied to you on the AUR but only now realized you are active here as well. A week ago, changes for RDNA3 were merged into MIOpen: https://github.com/ROCmSoftwarePlatform/MIOpen/commits/develop

See the 11th of January. Do you reckon we could get it to work by compiling MIOpen from source?

@Kardi5

Kardi5 commented Jan 23, 2023

@wsippel @Mushoz I can confirm that, with some effort, a build of PyTorch 1.13.1 against the AMD RX 7900 XTX with ROCm 5.4.2 works and is functional for my use case of running models.

The rough outline of the build: use an Ubuntu (20.04/22.04) Docker image, as AMD provides ROCm repos for it, and install all required deps without the kernel module. See https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/Dockerfile#L67. Basically, edit 5.3 to 5.4.2 and run all commands up to line 67. I also adapted the amdgpu install command to amdgpu-install -y --usecase=graphics,rocm,lrt,hip,hiplibsdk --no-dkms, as some libs were missing for the torch build.

Maybe you can build TensorFlow via the instructions from https://www.tensorflow.org/install/source, adapting the configure command to (in a venv):
TF_NEED_ROCM=1 python configure.py

@Mushoz
Author

Mushoz commented Jan 23, 2023

@Kardi5 Would you mind sharing the final Dockerfile that you used? I would love to try to replicate it for TensorFlow. Please leave in all the PyTorch-specific things as well; I will try to do something similar for TensorFlow.

@Kardi5

Kardi5 commented Jan 23, 2023

@Mushoz Sure, but I don't have a complete one myself right now. It was more of an interactive trial-and-error process until all the builds worked out. I hope to create a complete Dockerfile tonight/tomorrow based on the notes I took.

@aaronmondal

This issue also affects Gentoo when installing ROCm via Portage. Installing dev-libs/rocm-opencl-runtime, which currently defaults to the older 5.3.3, causes clinfo to raise the OP's error:

clinfo: /var/tmp/portage/dev-libs/rocr-runtime-5.3.3/work/ROCR-Runtime-rocm-5.3.3/src/core/runtime/amd_gpu_agent.cpp:339: void rocr::AMD::GpuAgent::AssembleShader(const char 
*, rocr::AMD::GpuAgent::AssembleTarget, void *&, size_t &) const: Assertion `code_buf != NULL && "Code buffer allocation failed"' failed.
Aborted (core dumped)

I'm rather certain that this particular error is not related to TensorFlow or MIOpen, as I was able to repro the error above with only a basic installation of the ROCm OpenCL runtime and friends.

The changes from ROCR 5.4.1 to 5.4.2 have not been downstreamed to GitHub yet, making it tricky to reproduce the workaround @Kardi5 proposed for other distros. I guess I'll try with 5.4.1 for now.

@Kardi5

Kardi5 commented Jan 25, 2023

@Mushoz So far I could only create a rough draft of a complete Dockerfile. Maybe you will find it useful nonetheless.
The current main problem is that my compilation of Magma shows a lot of errored calls to ROCm, since during docker build I cannot attach any device the way I can during docker run.

Over at https://github.com/pytorch/pytorch/blob/master/.circleci/docker/ubuntu-rocm/Dockerfile there is a more complete, though much more complex, example. Their Magma build script (https://github.com/pytorch/pytorch/blob/master/.circleci/docker/common/install_rocm_magma.sh) might be the solution to my troubles, but I did not have time to look through it in more detail.

There might still be errors besides the Magma build after the line WORKDIR /build/magma/build.

Draft Torch + Torchvision Dockerfile

FROM ubuntu:22.04

### START SECTION AMD ROCm install
# based on https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/Dockerfile
ARG DEBIAN_FRONTEND=noninteractive
ARG USE_MLIR="OFF"

# Support multiarch
RUN dpkg --add-architecture i386

# Install preliminary dependencies
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
    apt-utils \
    ca-certificates \
    curl \
    libnuma-dev \
    gnupg2 \
    wget

#Add gpg keys
ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 9386B48A1A693C5C && \
    wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | apt-key add -

# Check the AMD repo for exact package name
RUN wget https://repo.radeon.com/amdgpu-install/5.4.2/ubuntu/jammy/amdgpu-install_5.4.50402-1_all.deb --no-check-certificate
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
    ./amdgpu-install_5.4.50402-1_all.deb

# Add rocm repository
# Note: The ROCm version with $USE_MLIR should keep in sync with default ROCm version
# unless MLIR library is incompatible with current ROCm.
RUN export ROCM_APT_VER=5.4.2;\
echo $ROCM_APT_VER &&\
sh -c 'echo deb [arch=amd64 trusted=yes] http://repo.radeon.com/rocm/apt/$ROCM_APT_VER/ ubuntu main > /etc/apt/sources.list.d/rocm.list'
RUN sh -c "echo deb http://mirrors.kernel.org/ubuntu jammy main universe | tee -a /etc/apt/sources.list"

RUN amdgpu-install -y --usecase=rocm,graphics,rocmdev,rocmdevtools,lrt,hip,hiplibsdk,mllib,mlsdk --no-dkms

# Install dependencies
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
    build-essential \
    cmake \
    clang-format-12 \
    doxygen \
    gdb \
    git \
    lcov \
    libncurses5-dev \
    llvm-amdgpu \
    miopengemm \
    pkg-config \
    python3-dev \
    python3-pip \
    python3-venv \
    rocblas \
    rpm \
    software-properties-common

# Setup ubsan environment to printstacktrace
ENV UBSAN_OPTIONS=print_stacktrace=1
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

### START SECTION install Magma (torch dep) and PyTorch deps
# For Magma
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
    libmkl-core libmkl-def libmkl-dev libmkl-full-dev libmkl-intel-thread libmkl-gnu-thread gfortran

# For PyTorch
RUN apt-get update && \ 
DEBIAN_FRONTEND=noninteractive apt install -y --no-install-recommends --allow-unauthenticated \
    build-essential ca-certificates ccache cmake curl git libjpeg-dev libpng-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

### START SECTION Magma and Torch build
RUN useradd -m -G video -U --shell /bin/bash roc && \
    mkdir /build && \
    chown roc:roc /build
USER roc
WORKDIR /build

# Download latest Magma version: http://icl.utk.edu/projectsfiles/magma/downloads/
# Install steps found here: https://salsa.debian.org/science-team/magma/-/tree/master/
RUN wget -qnc "https://icl.utk.edu/projectsfiles/magma/downloads/magma-2.7.0.tar.gz" -O "magma.tar.gz" && \
    tar -xzf magma.tar.gz && \
    rm magma.tar.gz && \
    mv magma* magma && \
    mkdir magma/build

WORKDIR /build/magma/build

# ERRORS START HERE, RUN THE REST OF THIS INTERACTIVELY

# You may want to adapt gfx1100 to something else: https://llvm.org/docs/AMDGPUUsage.html#processors (search for gfx11)
RUN cmake -DMAGMA_ENABLE_HIP=ON -DCMAKE_CXX_COMPILER=hipcc -DGPU_TARGET='gfx1100' .. && \
    make -j $(nproc)

USER root
RUN make install
USER roc
WORKDIR /build
# RUN rather than CMD so the clone happens at build time
RUN git clone -j 4 --recursive https://github.com/pytorch/pytorch --depth 1 --branch v1.13.1

# Build of Torch based on: https://github.com/pytorch/pytorch/blob/master/Dockerfile
# Miniconda is experimental here, maybe use Anaconda if run interactively
RUN curl -fsSL -v -o ~/miniconda.sh -O  "https://repo.anaconda.com/miniconda/Miniconda3-py39_22.11.1-1-Linux-x86_64.sh" && \
    chmod +x ~/miniconda.sh && \
    ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh && \
    /opt/conda/bin/conda install -y python=3.9 cmake conda-build pyyaml numpy ipython && \
    /opt/conda/bin/python -mpip install -r /build/pytorch/requirements.txt && \
    /opt/conda/bin/conda install -y ninja cffi dataclasses && \
    /opt/conda/bin/conda install -y mkl mkl-include && \
    /opt/conda/bin/conda clean -ya
# Make conda and its python the default for the build steps below
ENV PATH=/opt/conda/bin:$PATH

WORKDIR /build/pytorch
RUN python tools/amd_build/build_amd.py

# Note the '&&': without it, 'export' would swallow the rest of the command line.
RUN --mount=type=cache,target=/opt/ccache \
    export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../",/usr/local/magma/ && \
    PYTORCH_ROCM_ARCH=gfx1100 USE_MAGMA=1 USE_ROCM=1 USE_NVCC=0 USE_CUDA=0 python setup.py install

# Test build of Torch
# Should print: True Radeon RX 7900 XTX
RUN python3 -c 'import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(torch.cuda.current_device()))'

# Torchvision build
WORKDIR /build
RUN git clone --recursive https://github.com/pytorch/vision --depth 1 --branch v0.14.1
WORKDIR /build/vision
RUN python setup.py install
WORKDIR /build
RUN rm -rf pytorch && rm -rf vision

Build with docker build . -t rocmbuild:1

Run interactively with:
docker run -d --network=host --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 8G rocmbuild:1 sleep 400000
(hacky, but works; some volumes might be wanted)

@aaronmondal

aaronmondal commented Jan 25, 2023

I can confirm that with HSA_OVERRIDE_GFX_VERSION=10.3.0 the issue seems to go away on Gentoo when unmasking the (currently still pre-experimental) Clang/LLVM 16 toolchain and adjusting the 5.3.3 ebuilds to the following package versions:

rocr-runtime-5.4.1  # 5.4.2 not yet available.
roct-thunk-interface-5.4.2
rocm-opencl-runtime-5.4.2
rocm-comgr-5.4.2
rocm-device-libs-5.4.2

So this issue most likely originates in one of these libraries.

The downside is that the Gentoo Clang 16 toolchain is not able to build Mesa due to an RTTI flag mismatch, so current usability may be limited. That's either a Gentoo or Mesa bug, though.

@cromefire

https://github.com/RadeonOpenCompute/ROCm/releases/tag/rocm-5.6.0
It looks as though ROCm 5.6.0 might pack some performance improvements. I haven't tested it yet, but it seems they are starting to focus on AI.

@evshiron

I was experimenting with various things recently, and it seems like Navi 3x performance still has a lot of room for improvement.

You might see some improvements on Navi 3x, but most of them are for MI GPUs.

@onitake

onitake commented Jun 29, 2023

I'm looking forward to this: ROCm/flash-attention#1
The original FlashAttention implementation was, sadly, optimized entirely for Nvidia and not usable on ROCm, but AMD is changing that right now. This will be very useful for ML workloads, such as https://pytorch.org/docs/main/generated/torch.nn.functional.scaled_dot_product_attention.html
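For context, that PyTorch function is the entry point a FlashAttention backend would accelerate. A minimal sketch, assuming a PyTorch >= 2.0 ROCm build with a visible GPU:

import torch
import torch.nn.functional as F

# Shapes are (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Dispatches to a fused kernel when one is available for the backend.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 1024, 64])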

@evshiron

evshiron commented Jun 29, 2023

@onitake

Yeah. It seems to work on MI GPUs and the numbers look promising. I merged two branches in Composable Kernel yesterday so it can support Navi 31, but I haven't got it to work yet. If you are interested and want to mess with it, are-we-gfx1100-yet/composable_kernel might be a good starting point.

@BloodBlight

BloodBlight commented Jul 1, 2023

For anyone interested, I am posting a slightly updated version of this:
https://gist.github.com/BloodBlight/0d36b33d215056395f34db26fb419a63

EDIT: Oops! Wrong window!!! But I am leaving this here in case anyone wants it.

@briansp2020

Are there still people waiting for 7900 XTX support? Though the performance is still a bit poor, tensorflow-upstream now runs when built on the latest ROCm release. I was looking into the status of ROCm support for the 7900 XTX, found a few issues opened by different people, and wanted to link them all to the issue I opened in the MIOpen repo. Though there has not been any confirmation from the developers, I think the performance issues are due to insufficient optimization in MIOpen.
ROCm/MIOpen#2342

@johnnynunez

Are there still people who are waiting for 7900XTX support? Though the performance is still a bit poor, TensorFlow-upstream now runs when built on the latest ROCm release. I was looking into the status of ROCm support for 7900XTX and found a few issues opened by different people and wanted to link all to the issue I opened in MIOpen repo. Though there has not been any confirmation from the developer, I think the performance issues are due to insufficient optimization of MIOpen. ROCmSoftwarePlatform/MIOpen#2342

Use Ubuntu 22.04 and ROCm 5.7.1:
evshiron/rocm_lab#16

@johnnynunez

johnnynunez commented Nov 2, 2023

7900 XT running TensorFlow 2.14 on ROCm 5.7.0, but with very low performance. PyTorch is currently working very well.
A 4090 scores around 90k.
[Screenshot: benchmark output, 2023-11-02]

@briansp2020

@johnnynunez
Is the picture showing 7.7K with the 7900 XT? I don't think it's running on your GPU.
My 7900 XTX scores about 41K: https://gist.github.com/briansp2020/3e176c7a933cf23531642e326a2f91c5

@johnnynunez

@johnnynunez Is the picture showing 7.7K with 7900XT? I don't think it's running on your GPU. My 7900XTX scores about 41K https://gist.github.com/briansp2020/3e176c7a933cf23531642e326a2f91c5

It's running on the 7900 XT; I've checked it.

@johnnynunez

johnnynunez commented Nov 3, 2023

@johnnynunez Is the picture showing 7.7K with 7900XT? I don't think it's running on your GPU. My 7900XTX scores about 41K https://gist.github.com/briansp2020/3e176c7a933cf23531642e326a2f91c5

did you compile tensorflow-upstream master or r2.14-enhanced-rocm?

@briansp2020

I think, at the time I ran the benchmark, master was 2.14. Now, when I want to run the benchmark, I build r2.14, as I noticed some incompatibility when running the benchmark using master. I haven't worked with my 7900XTX for a while since I bought an MI100, so I may not remember the version number correctly. But the gist is that the master branch used to work and no longer does, so I had to pick a version.

@johnnynunez

johnnynunez commented Nov 3, 2023

I think, at the time I ran the benchmark, the master was 2.14. Now when I want to run the benchmark, I build r2.14 as I noticed some incompatibility when running the benchmark using the master. I haven't worked with my 7900XTX for a while since I bought MI100. So, I may not remember the version number correctly. But the gist is that master branch used to work but not anymore and I had to pick a version.

I've updated the scripts to build with the latest master commit and ROCm 5.7.1, if you want:
evshiron/rocm_lab#16

Second, modify this line. In my case (32 GB RAM, 16 cores / 32 threads):

 RESOURCE_OPTION="--local_ram_resources=60000 --local_cpu_resources=35 --jobs=70"
RESOURCE_OPTION="--local_ram_resources=28000 --local_cpu_resources=16 --jobs=32"


@briansp2020

BTW, r2.14-enhanced-rocm has a typo that prevents it from detecting the 7900 XTX properly. You need to fix tensorflow/compiler/xla/stream_executor/device_description.h line 184; it's missing a comma. I'm not sure what is going on, since it was fixed multiple times in the past, but it keeps coming back... I think the master branch is OK.

@johnnynunez

BTW, r2.14-enhanced-rocm has typo that prevents it from detecting 7900XTX properly. You need to fix tensorflow/compiler/xla/stream_executor/device_description.h line 184. It's missing a comma. I'm not sure what is going on since it was fixed multiple times in the past. But it keeps coming back... I think the master branch is OK.

Yes, I knew about it and fixed it.

@vampireLibrarianMonk

This is not fixed in the recent 2.14 Dockerfile push.

How can I manually compile this one file and correct it?

@evshiron

evshiron commented Jan 13, 2024

@vampireLibrarianMonk

The code on the main development branch looks correct, and you can give the CI link in this comment a try, which contains the nightly .whl files:

@vampireLibrarianMonk

OK, that worked, but this is a long way from usable for newbies such as myself. I will keep redoing the steps I found in the ROCm docs instead of the AMD driver website, which is giving me DKMS errors.

@evshiron

Yes, unfortunately, we still do not have tensorflow-rocm or related Docker images that directly support Navi 31 GPUs after half a year.

@johnnynunez

Yes, unfortunately, we still do not have tensorflow-rocm or related Docker images that directly support Navi 31 GPUs after half a year.

In my case, I still get freezes with memory transfers, etc.

@alatecj

alatecj commented Jan 15, 2024

So, apart from the .whl nightly build recommendation, what can I, as an owner of a 6700 XT, do to get the faulty rocm/tensorflow:latest Docker image running? Is there a possibility of recompiling TensorFlow within the Docker image after fixing the comma in the .h file?

@vampireLibrarianMonk

I am working on a tutorial that works for my 7900 XTX and 6600 XT.

https://github.com/vampireLibrarianMonk/amd-gpu-hello

I do not yet cover the download and manual compilation/installation of tensorflow-upstream 2.15 and above, but it will borrow a lot from this post:

#1880 (comment)

@evshiron

evshiron commented Jan 15, 2024

These two comments should help:

These steps might work (I don't have access to a machine for testing at the moment):

You may need to be root or in video and render groups in the container to access your GPUs (try rocminfo and rocm-smi commands), and do check the environment variables if something doesn't work properly.

@alatecj

alatecj commented Jan 15, 2024

These two comments should help:

* [7900 XTX Refuses to Run tensorflow-rocm Toy Example #1880 (comment)](https://github.com/ROCm/ROCm/issues/1880#issuecomment-1548685112)

* [7900 XTX Refuses to Run tensorflow-rocm Toy Example #1880 (comment)](https://github.com/ROCm/ROCm/issues/1880#issuecomment-1548686209)

These steps might work (I don't have access to a machine for testing at the moment):

* `docker pull rocm/tensorflow:rocm6.0-tf2.14-dev`
  
  * https://hub.docker.com/r/rocm/tensorflow

* `docker run --name <any-name-you-want> -it --privileged --net host --group-add sudo rocm/tensorflow:rocm6.0-tf2.14-dev /bin/bash`
  
  * [Stable diffusion with RX7900XTX on ROCm5.7 composable_kernel#1032 (reply in thread)](https://github.com/ROCm/composable_kernel/discussions/1032#discussioncomment-7651690)

* In bash of the container
  
  * `git clone https://github.com/ROCmSoftwarePlatform/tensorflow-upstream.git`
  * `cd tensorflow-upstream`
  * `./build_rocm_python3`

You may need to be root or in video and render groups in the container to access your GPUs (try rocminfo and rocm-smi commands), and do check the environment variables if something doesn't work properly.

Unfortunately, the build fails on FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/llvm-17/bin/clang'

ERROR: /root/.cache/bazel/_bazel_root/be761df731c6b0cca47819a0a9713b70/external/com_google_protobuf/BUILD.bazel:364:11: Compiling src/google/protobuf/compiler/objectivec/objectivec_field.cc [for tool] failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @com_google_protobuf//:protoc_lib) 
  (cd /root/.cache/bazel/_bazel_root/be761df731c6b0cca47819a0a9713b70/execroot/org_tensorflow && \
  exec env - \
    DOCKER_HOST_CACHEBUSTER=1702938961712741763 \
    PATH=/root/.cache/bazelisk/downloads/bazelbuild/bazel-6.1.0-linux-x86_64/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PWD=/proc/self/cwd \
  external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++14' -MD -MF bazel-out/k8-opt-exec-50AE0418/bin/external/com_google_protobuf/_objs/protoc_lib/objectivec_field.d '-frandom-seed=bazel-out/k8-opt-exec-50AE0418/bin/external/com_google_protobuf/_objs/protoc_lib/objectivec_field.o' '-DBAZEL_CURRENT_REPOSITORY="com_google_protobuf"' -iquote external/com_google_protobuf -iquote bazel-out/k8-opt-exec-50AE0418/bin/external/com_google_protobuf -iquote external/zlib -iquote bazel-out/k8-opt-exec-50AE0418/bin/external/zlib -isystem external/com_google_protobuf/src -isystem bazel-out/k8-opt-exec-50AE0418/bin/external/com_google_protobuf/src -isystem external/zlib -isystem bazel-out/k8-opt-exec-50AE0418/bin/external/zlib -g0 -w -Wno-sign-compare -g0 '-std=c++17' -DHAVE_ZLIB -Woverloaded-virtual -Wno-sign-compare -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' '-DTENSORFLOW_USE_ROCM=1' -D__HIP_PLATFORM_AMD__ -DEIGEN_USE_HIP -no-canonical-prefixes -fno-canonical-system-headers -c external/com_google_protobuf/src/google/protobuf/compiler/objectivec/objectivec_field.cc -o bazel-out/k8-opt-exec-50AE0418/bin/external/com_google_protobuf/_objs/protoc_lib/objectivec_field.o)
# Configuration: 908a43cbc08d862315c42f531704f207dd474f3f91dc667c1ba8b0ac2bb0e9e1
# Execution platform: @local_execution_config_platform//:platform
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/be761df731c6b0cca47819a0a9713b70/execroot/org_tensorflow/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 279, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/be761df731c6b0cca47819a0a9713b70/execroot/org_tensorflow/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc", line 276, in main
    return subprocess.call([CPU_COMPILER] + cpu_compiler_flags)
  File "/usr/lib/python3.9/subprocess.py", line 349, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.9/subprocess.py", line 1837, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/lib/llvm-17/bin/clang'

@vampireLibrarianMonk

Review the ROCm-enhanced branches; the latest usually isn't the best place to start.

@johnnynunez

These two comments should help:

These steps might work (I don't have access to a machine for testing at the moment):

You may need to be root or in video and render groups in the container to access your GPUs (try rocminfo and rocm-smi commands), and do check the environment variables if something doesn't work properly.

I updated your repository, and I can compile PyTorch and TensorFlow with the latest versions.
My GPU is a 7900 XT: https://github.com/evshiron/rocm_lab/pull/16/files

@vampireLibrarianMonk

Is anyone gonna update these docs?

https://github.com/ROCm/tensorflow-upstream/tree/develop-upstream/rocm_docs

They seem pretty dated, and if not for the last comment I would be lost.

@ppanchad-amd

ppanchad-amd commented May 9, 2024

@Mushoz Has your issue been resolved? If so, please close the ticket. Thanks!
