
AMD GPU support and optimisation #2995

Closed
ferdinandl007 opened this issue Apr 30, 2021 · 23 comments
Labels
enhancement New feature or request Stale

Comments

@ferdinandl007
Contributor

🚀 Feature

@glenn-jocher I was wondering whether there was ever any intent to optimise this to run on AMD server GPUs as well?
They are significantly cheaper (roughly 10x) for people to train on, and with ROCm and HIP getting pretty mature it might be something to consider.
If not, is it something you would consider? If yes, I might be able to contribute to it, and maybe get the AMD ROCm team on it as well if we hit any significant technical hurdles.

@ferdinandl007 ferdinandl007 added the enhancement New feature or request label Apr 30, 2021
@github-actions
Contributor

github-actions bot commented Apr 30, 2021

👋 Hello @ferdinandl007, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/cuDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@ferdinandl007 it seems there is some recent movement on AMD GPU support here:
https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/

And the PyTorch 'Getting Started' configurator now shows a new option for it as well:
https://pytorch.org/get-started/locally/

We haven't experimented with it ourselves though; all our machines are either CPU or CUDA currently.

@ferdinandl007
Contributor Author

ferdinandl007 commented May 2, 2021

@glenn-jocher Yes, exactly; I'm aware of that. I had a couple of meetings with the AMD ROCm team about getting the official support sorted out!
It had to be built from source before, which was slightly inconvenient and made it difficult for people to start using AMD GPUs.
However, it has to be noted that it's also only compatible with server GPUs such as the MI100, MI60 and MI50.
That means still no training on your MacBook GPUs :/ It's still worthwhile porting the project over, though; I can see these GPUs becoming quite popular in the future, especially given the prices at which they will be available to people, far below Nvidia's. In terms of performance they're pretty decent, with the MI100 almost identical to an Nvidia A100 based on my testing.

The changes that need to be made are quite minor, so nothing drastic, except if you have some custom CUDA kernels, which would need to be translated with hipify.
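As a rough illustration of why the changes are minor (a sketch, assuming a ROCm build of PyTorch): ROCm PyTorch exposes AMD GPUs through the regular torch.cuda API, so CUDA-style device selection typically runs unchanged:

import torch

# On a ROCm build of PyTorch, AMD GPUs are surfaced through the regular
# torch.cuda API, so existing CUDA-style device selection needs no changes.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'device: {device}')
print(f'HIP version: {torch.version.hip}')  # a version string on ROCm builds, None on CUDA builds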

At some point I might put a PR in for AMD support, but it will probably be a Docker image/file, as it is still a bit fiddly getting the environment configured correctly 😂

@github-actions
Contributor

github-actions bot commented Jun 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@MeatyAri

Apparently you can run YOLO on AMD GPUs using onnxruntime_directml; here's how.
I did this on YOLOv8 ON WINDOWS, and I believe it should work with any other YOLO version out there:

1- Install the DirectML build of ONNX Runtime. It's crucial to choose the DirectML variant over any other. The Python package you need is aptly named "onnxruntime_directml". Feel free to use:

pip install onnxruntime_directml

2- Export your YOLO model to the ONNX format.

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='onnx')

3- Add the 'DmlExecutionProvider' string to the providers list: this is lines 133 to 140 in "venv\Lib\site-packages\ultralytics\nn\autobackend.py":

133        elif onnx:  # ONNX Runtime
134            LOGGER.info(f'Loading {w} for ONNX Runtime inference...')
135            # check_requirements(('onnx', 'onnxruntime-gpu' if cuda else 'onnxruntime'))
136            import onnxruntime
137            providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['DmlExecutionProvider', 'CPUExecutionProvider']
138            session = onnxruntime.InferenceSession(w, providers=providers)
139            output_names = [x.name for x in session.get_outputs()]
140            metadata = session.get_modelmeta().custom_metadata_map  # metadata

✨ Comment out the check_requirements call on line 135 ✨ and add the 'DmlExecutionProvider' string to the providers list (line 137).

4- Enjoy the 100% boost in model performance.
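To confirm the setup actually worked, a quick sanity check (a sketch, assuming the onnxruntime_directml install above and the yolov8n.onnx file from step 2) is to list the available and active providers:

import onnxruntime

print(onnxruntime.get_available_providers())  # should include 'DmlExecutionProvider'
session = onnxruntime.InferenceSession('yolov8n.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
print(session.get_providers())  # shows which providers the session actually uses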

@glenn-jocher
Member

@MeatyAri thank you for sharing your experience with running YOLO on AMD GPUs using ONNX Runtime with DirectML. This is indeed a workaround that can be used to leverage AMD GPUs for inference.

By installing the "onnxruntime_directml" package and exporting the YOLO model to the ONNX format, you can then modify the providers list in the "venv\Lib\site-packages\ultralytics\nn\autobackend.py" file to include 'DmlExecutionProvider'. This modification allows ONNX Runtime to utilize DirectML for inference on AMD GPUs.

This is a helpful contribution for users who are looking to utilize their AMD GPUs for YOLO inference. We appreciate you sharing this information with the community.

Please note that this workaround is specific to inference and does not address AMD GPU support for training in YOLOv5.

Thank you once again for your contribution!

@zumaster20

@glenn-jocher so it is not possible to train a model using this workaround?

@glenn-jocher
Member

@zumaster20 That's correct, the provided workaround is specific to enabling inference using AMD GPUs through ONNX Runtime with DirectML. Training a YOLO model on AMD GPUs would require support and optimization at the framework level, which is not covered by this workaround. If you have any further questions or insights, feel free to share them!

@TimZhangTHS

TimZhangTHS commented Jan 2, 2024

@glenn-jocher Hi, when I try this, Ultralytics keeps trying to download onnx 1.15 instead of using onnxruntime_directml, and it won't utilize my AMD GPU. Do you know what a possible workaround would be?

@MeatyAri

MeatyAri commented Jan 5, 2024

@TimZhangTHS I think you didn't comment out the check_requirements line as indicated in my previous comment, so YOLO tries to install packages that you don't need to get this running.
I'm not sure what the exact environment setup was when I ran this on my GPU, but I can send you the pip freeze results if your problem still exists.

@daniellizarazoo

Which AMD graphics do you have? I'm actually using a Lenovo IdeaPad with a Ryzen 5 with AMD Radeon Graphics and followed the steps you described, but as you can see in the screenshot, it's still running on the CPU. Please share your pip freeze.
[screenshot] I guess my integrated GPU does not have this support :(

@glenn-jocher
Member

@daniellizarazoo It looks like you're encountering an issue where the environment is not correctly recognizing the onnxruntime_directml package. Make sure to comment out the check_requirements line as @MeatyAri suggested, to prevent the automatic download of the standard ONNX package.

@daniellizarazoo Integrated GPUs, like the one in your Ryzen 5, may not be supported by DirectML for this type of workload. DirectML is primarily aimed at discrete GPUs, and performance on integrated GPUs might not be optimal or supported. The pip freeze output from @MeatyAri could potentially help, but it's also important to verify that your GPU is supported by DirectML.

If you continue to face issues, please ensure that your system meets all the requirements for running onnxruntime_directml and that your GPU is compatible with DirectML. If the compatibility and requirements are met but the problem persists, it might be necessary to look into more detailed logs or error messages to diagnose the issue further.

@MeatyAri

@daniellizarazoo here's the pip freeze:

certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
coloredlogs==15.0.1
comtypes==1.2.0
contourpy==1.1.1
cycler==0.11.0
dxcam==0.0.5
filelock==3.12.4
flatbuffers==23.5.26
fonttools==4.42.1
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
kiwisolver==1.4.5
MarkupSafe==2.1.3
matplotlib==3.8.0
mpmath==1.3.0
mss==9.0.1
networkx==3.1
numpy==1.26.0
onnxruntime-directml==1.16.0
opencv-python==4.8.0.76
packaging==23.1
pandas==2.1.1
Pillow==10.0.1
protobuf==4.24.3
psutil==5.9.5
py-cpuinfo==9.0.0
PyDirectInput==1.0.4
pynput==1.7.6
pyparsing==3.1.1
pyreadline3==3.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
pywin32==306
PyYAML==6.0.1
requests==2.31.0
scipy==1.11.2
seaborn==0.12.2
six==1.16.0
sympy==1.12
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tqdm==4.66.1
typing_extensions==4.8.0
tzdata==2023.3
ultralytics==8.0.184
urllib3==2.0.5

Hope it helps.

@daniellizarazoo

To optimize performance with an integrated GPU, I converted the model to an OpenVINO model with half (FP16) quantization, and it worked really well for me.
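For reference, a minimal sketch of that export (assuming a recent ultralytics version where export supports the half argument):

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='openvino', half=True)  # FP16 OpenVINO export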

@glenn-jocher
Member

@daniellizarazoo That's a great approach! Converting the model to the OpenVINO format and using half-precision (FP16) quantization can indeed provide a significant performance boost, especially on integrated GPUs that may not be as powerful as discrete GPUs. It's good to hear that this method worked well for you. Your experience could be valuable for others in the community with similar hardware looking to optimize YOLO model performance. Keep experimenting and sharing your findings! 🚀

@dclemon

dclemon commented Apr 10, 2024

(Quoting @MeatyAri's onnxruntime_directml walkthrough from above.)

But how do you select the AMD device in YOLO? I used device='0' and it does not work.

@glenn-jocher
Member

@dclemon it sounds like you're making good progress on running YOLO on AMD GPUs using onnxruntime_directml! To select an AMD device, you won't specify the device in the same manner as CUDA with 'device=0'. When using onnxruntime_directml, device selection is handled internally by the DirectML execution provider based on the available hardware and drivers installed on your system.

So, when you add 'DmlExecutionProvider' to the providers list, it automatically selects the AMD GPU if it's available and supported by DirectML. There's no need for explicit device selection like you would with CUDA devices.

Make sure the rest of your setup is correctly configured, and ensure your AMD GPU drivers are up to date to support DirectML. If you've correctly modified autobackend.py and installed onnxruntime_directml, your model should run on the AMD GPU. Keep an eye on your GPU utilization during inference to confirm it's being used. Happy coding! 🚀
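If you ever do need to pin a specific adapter, the DirectML execution provider accepts provider options; here's a minimal sketch, assuming a recent onnxruntime-directml build where the device_id option is supported (check the ONNX Runtime DirectML docs for your version):

import onnxruntime as ort

# Provider options are passed as a (name, options) tuple; device_id follows
# the system's adapter enumeration, with 0 typically the primary GPU.
# 'yolov8n.onnx' is assumed from the export step above.
providers = [('DmlExecutionProvider', {'device_id': 0}), 'CPUExecutionProvider']
session = ort.InferenceSession('yolov8n.onnx', providers=providers)
print(session.get_providers())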

@dclemon

dclemon commented Apr 12, 2024

(Quoting @glenn-jocher's reply from above.)

Thank you for your reply. I used torch.from_numpy(im).to(device) in my code and it could not detect the AMD GPU. But I have fixed the problem now: I installed torch_directml and changed the select_device function at line 133:

if dml and torch_directml.is_available():  # requires: import torch_directml
    print('use dml')
    devices = torch_directml.device(0)  # first DirectML device (the AMD GPU)
    n = 0
    s += r"dml:" + str(torch_directml.device_name(0))
    arg = torch_directml.device(0)

With this you can load a .pt model on an AMD device.
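For anyone trying this, here is a self-contained sanity check (a sketch, assuming pip install torch-directml) that a tensor op actually lands on the DirectML device:

import torch
import torch_directml

dml = torch_directml.device()   # default DirectML device (the AMD GPU)
x = torch.randn(2, 3).to(dml)   # move a tensor onto the GPU
y = (x * 2).sum()               # run a small op there
print(torch_directml.device_name(0), y.item())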

@glenn-jocher
Member

Great to hear you've resolved the issue, @dclemon! 🌟 Utilizing torch_directml is indeed a clever way to explicitly manage device selection for AMD GPUs in conjunction with YOLO and DirectML. This approach allows for more control and ensures that your models run on the desired hardware. Thank you for sharing your solution with the community; your findings are valuable and may help others in similar situations. Happy coding, and keep up the fantastic work! 😊

@MeatyAri

Thanks for sharing your work @dclemon; your approach in the second comment is slightly different from mine. I've used the DirectML support built into onnxruntime, whereas you're using torch_directml, which was recently added for PyTorch. Your approach has some great advantages, like being able to use PyTorch functions to modify tensors on the AMD GPU. However, there are some major disadvantages, such as being slower; ONNX inference seems to run much faster than raw PyTorch models most of the time.
So in conclusion, by combining these two you get the benefits of PyTorch running on the GPU and fast ONNX inference, to run your YOLO model even faster!
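Roughly, the combination could look like this (a sketch, assuming pip install torch-directml and a yolov8n.onnx exported as in my earlier comment):

import numpy as np
import torch
import torch_directml
import onnxruntime as ort

# Preprocess with PyTorch on the DirectML device, then run inference
# through the ONNX Runtime DirectML execution provider.
dml = torch_directml.device()
frame = torch.randint(0, 255, (640, 640, 3), dtype=torch.uint8)  # stand-in for a captured frame
x = frame.to(dml).permute(2, 0, 1).float().div(255.0).unsqueeze(0)  # HWC uint8 -> 1x3xHxW float
x = x.cpu().numpy().astype(np.float32)  # ONNX Runtime consumes host numpy arrays

session = ort.InferenceSession('yolov8n.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
outputs = session.run(None, {session.get_inputs()[0].name: x})
print([o.shape for o in outputs])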

Make sure that your environment is configured properly, because if you have previously installed the CPU version of onnxruntime, you cannot simply uninstall it and install other variants like the DirectML one. You should either delete your venv completely and install the packages again, or use pip-autoremove to remove both the previous onnxruntime install and its dependencies. Here is how to do it:

# install pip-autoremove
pip install pip-autoremove
# remove "somepackage" plus its dependencies:
pip-autoremove somepackage -y

@zzh123dfds

@MeatyAri Hi, I followed your steps and my AMD GPU is now being used, but only at about 50%, and my CPU usage is still high (nearly 70%).

@zzh123dfds

@dclemon Did you use this method on YOLOv8? When I follow these steps I get the error: RuntimeError: Cannot set version_counter for inference tensor

@MeatyAri

Hi @zzh123dfds
It's likely that the issue isn't with the inference itself, but rather with your code and resource management. If you're performing any preprocessing before feeding the data into your model, please double-check those steps. Preprocessing is often where the CPU experiences the most strain during execution.
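One simple way to check this (a sketch, with a hypothetical preprocess function standing in for your real preprocessing) is to time the preprocessing stage separately from inference:

import time
import numpy as np

def preprocess(frame):  # hypothetical stand-in for your real preprocessing
    x = frame.astype(np.float32) / 255.0
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])

frame = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
t0 = time.perf_counter()
x = preprocess(frame)
t1 = time.perf_counter()
# ... your session.run(...) call would go here; compare its time against t1 - t0
print(f'preprocess: {(t1 - t0) * 1000:.1f} ms')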
