AMD GPU support and optimisation #2995
👋 Hello @ferdinandl007, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution. If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you. If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available. For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements: Python 3.8 or later with all requirements.txt dependencies installed, including $ pip install -r requirements.txt

Environments: YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled).

Status: If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.
@ferdinandl007 it seems there is some recent movement on AMD GPU support, and the PyTorch 'Getting Started' configurator now shows a new option for it as well. We haven't experimented with it ourselves though; all our machines are either CPU or CUDA currently.
@glenn-jocher Yes, exactly; I'm aware of that. I had a couple of meetings with the AMD ROCm team about getting official support sorted out. The changes that would need to be made are quite minor, so nothing drastic, unless you have custom CUDA kernels, which would need to be translated with hipify (see the sketch below). If I get some time I might put in a PR for AMD support, but it will probably be a Docker image/file, as it is still a bit fiddly getting the environment configured correctly 😂
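For context, a minimal sketch of how a custom CUDA kernel could be translated with ROCm's HIPIFY tooling; the file names below are placeholders and this assumes the hipify utilities that ship with ROCm are installed:

```bash
# Translate a CUDA source file to HIP using ROCm's hipify-perl
# (hypothetical file names; the translated file can then be built with hipcc)
hipify-perl custom_kernel.cu > custom_kernel.hip.cpp
```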
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Apparently you can run YOLO on AMD GPUs using onnxruntime_directml. Here's how: 1- Install the DirectML build of ONNX Runtime. It's crucial to choose the DirectML variant over any other version. The Python package you need is aptly named "onnxruntime_directml". Feel free to use the install command below:
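A minimal install command, assuming the PyPI distribution name onnxruntime-directml:

```bash
# Install the DirectML build of ONNX Runtime (in place of onnxruntime / onnxruntime-gpu)
pip install onnxruntime-directml
```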
2- Export your YOLO model to the ONNX format (see the sketch below).
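A minimal sketch of the export step using the ultralytics Python API; the weight file name is just an example:

```python
from ultralytics import YOLO

# Load a trained model (example weights) and export it to ONNX
model = YOLO("yolov8n.pt")
model.export(format="onnx")  # writes yolov8n.onnx next to the weights
```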
3- Add the 'DmlExecutionProvider' string to the providers list. This is lines 133 to 140 in "venv\Lib\site-packages\ultralytics\nn\autobackend.py": comment out the check_requirements call on line 135, then add 'DmlExecutionProvider' to the providers list (see the sketch below). 4- Enjoy the 100% boost in model performance.
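For orientation, a rough sketch of what the edited ONNX Runtime section of autobackend.py ends up doing (paraphrased, not verbatim; exact code and line numbers vary by ultralytics version, and the model path is a placeholder):

```python
import onnxruntime

# check_requirements(("onnx", "onnxruntime"))  # line ~135: comment out so pip doesn't pull the plain CPU onnxruntime
providers = ["DmlExecutionProvider", "CPUExecutionProvider"]  # add DirectML ahead of the CPU fallback
session = onnxruntime.InferenceSession("path/to/model.onnx", providers=providers)  # placeholder path
```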
@MeatyAri thank you for sharing your experience with running YOLO on AMD GPUs using ONNX Runtime with DirectML. This is indeed a workaround that can be used to leverage AMD GPUs for inference. By installing the "onnxruntime_directml" package and exporting the YOLO model to the ONNX format, you can then modify the providers list in the "venv\Lib\site-packages\ultralytics\nn\autobackend.py" file to include 'DmlExecutionProvider'. This modification allows ONNX Runtime to utilize DirectML for inference on AMD GPUs. This is a helpful contribution for users who are looking to utilize their AMD GPUs for YOLO inference. We appreciate you sharing this information with the community. Please note that this workaround is specific to inference and does not address AMD GPU support for training in YOLOv5. Thank you once again for your contribution!
@glenn-jocher so it is not possible to train a model using this workaround?
@zumaster20 That's correct, the provided workaround is specific to enabling inference using AMD GPUs through ONNX Runtime with DirectML. Training a YOLO model on AMD GPUs would require support and optimization at the framework level, which is not covered by this workaround. If you have any further questions or insights, feel free to share them!
@glenn-jocher Hi, when I try this, ultralytics keeps trying to download onnx 1.15 instead of using onnxruntime_directml, and it will not utilize my AMD GPU. Do you know what a possible workaround would be?
@TimZhangTHS I think you didn't comment out the check_requirements line as indicated in my previous comment, so YOLO tries to install packages that you don't need in order to get this running.
@daniellizarazoo It looks like you're encountering an issue where the environment is not correctly recognizing the DirectML-enabled ONNX Runtime.

@zumaster20 Integrated GPUs, like the one in your Ryzen 5, may not be supported by DirectML for this type of workload. DirectML is primarily aimed at discrete GPUs, and performance on integrated GPUs might not be optimal or supported. The pip freeze output from @MeatyAri could potentially help, but it's also important to verify that your GPU is supported by DirectML. If you continue to face issues, please ensure that your system meets all the requirements for running DirectML-based inference.
@daniellizarazoo here's the pip freeze:
Hope it helps.
To optimize performance with an integrated GPU, I converted the model to an OpenVINO model with half (FP16) quantization, and it worked really well for me.
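A minimal sketch of that conversion using the ultralytics export API, with example weights; the half flag requests FP16 precision:

```python
from ultralytics import YOLO

# Export a trained model (example weights) to OpenVINO with FP16 ("half") precision
model = YOLO("yolov8n.pt")
model.export(format="openvino", half=True)  # creates a yolov8n_openvino_model/ directory
```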
@daniellizarazoo That's a great approach! Converting the model to an OpenVino format and using half-precision (FP16) quantization can indeed provide a significant performance boost, especially on integrated GPUs that may not be as powerful as discrete GPUs. It's good to hear that this method worked well for you. Your experience could be valuable for others in the community with similar hardware looking to optimize YOLO model performance. Keep experimenting and sharing your findings! 🚀
But how do you select the AMD device in YOLO? I used device='0' and it does not work.
@dclemon it sounds like you're making good progress on running YOLO on AMD GPUs using onnxruntime_directml. When you add 'DmlExecutionProvider' to the providers list, ONNX Runtime handles device selection itself, so the device='0' argument is not needed (see the sketch below). Make sure the rest of your setup is correctly configured, and ensure your AMD GPU drivers are up to date to support DirectML. If you've correctly modified autobackend.py, inference should run on your AMD GPU.
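As an illustration of that point, a minimal, self-contained inference sketch with ONNX Runtime and the DirectML provider; the model path and input shape are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Open the exported model with DirectML first and CPU as fallback;
# no explicit "device=0" argument is involved on the YOLO side.
session = ort.InferenceSession(
    "yolov8n.onnx",  # placeholder path to the exported model
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Dummy 640x640 RGB input in NCHW layout, just to exercise the session
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: dummy})
print(session.get_providers())  # shows which providers are actually active
```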
Thank you for your reply. I used torch.from_numpy(im).to(device) in my code and it could not detect the AMD GPU. But I have fixed this problem now: I installed torch_directml and changed the select_device function at line 133 to check if dml and torch_directml.is_available(): and return the DirectML device. With this you can load a .pt model on an AMD device (see the sketch below).
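A rough, hypothetical sketch of what such a select_device modification might look like; the dml flag, function shape, and fallback logic here are illustrative and not the exact patch described above:

```python
import torch
import torch_directml  # pip install torch-directml


def select_device(device="", dml=True):
    """Pick a torch device, preferring DirectML when requested and available."""
    if dml and torch_directml.is_available():
        return torch_directml.device()  # AMD (or other DirectML-capable) GPU
    if torch.cuda.is_available() and device != "cpu":
        return torch.device(f"cuda:{device or 0}")
    return torch.device("cpu")


# Usage: move the model and input tensors to the selected device
device = select_device()
# model = torch.load("yolov8n.pt", map_location=device)  # placeholder weights path
```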
Great to hear you've resolved the issue, @dclemon! 🌟 Utilizing torch_directml for device selection is a great solution; thanks for sharing it with the community.
Thanks for sharing your work @dclemon. Your approach in the second comment is slightly different from mine: I've used the DirectML platform built into onnxruntime, while you're using torch_directml, which was recently added for PyTorch. Your approach has some great advantages, like being able to use PyTorch functions to modify tensors on the AMD GPU, but there are some notable disadvantages, such as being slower; ONNX inference seems to run much faster than the raw PyTorch models most of the time. Also, make sure that your environment is configured properly, because if you have previously installed the CPU version of onnxruntime you cannot simply uninstall it and install another variant of onnxruntime, like the DirectML one. You should either delete your venv completely and reinstall the packages, or use pip-autoremove to remove both the previous install of onnxruntime and its dependencies. Here is how to do it:
# install pip-autoremove
pip install pip-autoremove
# remove "somepackage" plus its dependencies:
pip-autoremove somepackage -y
@MeatyAri Hi, I followed your steps and my AMD GPU is now being used, but its utilization is only about 50%, and my CPU usage is still high (nearly 70%).
@dclemon May I ask whether you used this method with YOLOv8? When I follow these steps I get the error: RuntimeError: Cannot set version_counter for inference tensor
Hi @zzh123dfds
🚀 Feature
@glenn-jocher I was wondering whether there was ever any intent of optimising this to run on AMD server GPUs as well?
They are significantly cheaper (10x) for people to train on, and with ROCm and HIP getting pretty mature it might be something to consider.
If not, is it something you would consider? If yes, I might be able to contribute to it, and maybe also get the AMD ROCm team on it if we run into any significant technical hurdles.