
AMD GPU support and optimisation #2995

Closed
ferdinandl007 opened this issue Apr 30, 2021 · 23 comments
Labels
enhancement New feature or request Stale

Comments

@ferdinandl007
Contributor

🚀 Feature

@glenn-jocher I was wondering whether there was ever any intent to optimise this to run on AMD server GPUs as well?
They are significantly cheaper (roughly 10x) for people to train on, and with ROCm and HIP getting pretty mature it might be something to consider.
If not, is it something you would consider? If yes, I might be able to contribute to it, and maybe get the AMD ROCm team on it as well if we hit any significant technical hurdles.

@ferdinandl007 ferdinandl007 added the enhancement New feature or request label Apr 30, 2021
@github-actions
Contributor

github-actions bot commented Apr 30, 2021

👋 Hello @ferdinandl007, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/cuDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@ferdinandl007 it seems there is some recent movement on AMD GPU support here:
https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/

And the PyTorch 'Getting Started' configurator now shows a new option for it as well:
https://pytorch.org/get-started/locally/

We haven't experimented with it ourselves though; all our machines are either CPU or CUDA currently.

@ferdinandl007
Contributor Author

ferdinandl007 commented May 2, 2021

@glenn-jocher Yes, exactly; I'm aware of that. I had a couple of meetings with the AMD ROCm team about getting the official support sorted out!
It had to be built from source before, which was slightly inconvenient and made it difficult for people to start using AMD GPUs.
However, it has to be noted that it's also only compatible with server GPUs such as the MI100, MI60 and MI50.
That means still no training on your MacBook GPUs :/ It's still worthwhile porting the project over, though; I can see these GPUs becoming quite popular in the future, especially given the prices at which they will be available to people, far below Nvidia's. In terms of performance they're pretty decent, with the MI100 almost identical to an Nvidia A100 based on my testing.

The changes that need to be made are quite minor, so nothing drastic, except if you have some custom CUDA kernels, which would need to be translated with hipify.
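As a rough illustration of why the changes are minor (a sketch, assuming a ROCm build of PyTorch): ROCm PyTorch exposes AMD GPUs through the regular torch.cuda API, so CUDA-style device selection typically runs unchanged:

import torch

# On a ROCm build of PyTorch, AMD GPUs are surfaced through the regular
# torch.cuda API, so existing CUDA-style device selection needs no changes.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'device: {device}')
print(f'HIP version: {torch.version.hip}')  # a version string on ROCm builds, None on CUDA builds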

At some point I might put a PR in for AMD support, but it will probably be a Docker image/file, as it is still a bit fiddly getting the environment configured correctly 😂

@github-actions
Contributor

github-actions bot commented Jun 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@MeatyAri

Apparently you can run YOLO on AMD GPUs using onnxruntime_directml; here's how.
I did this on YOLOv8 ON WINDOWS, and I believe it should work with any other YOLO version out there:

1- Install the DirectML build of ONNX Runtime. It's crucial to choose the DirectML variant over any other. The Python package you need is aptly named "onnxruntime_directml". Feel free to use:

pip install onnxruntime_directml

2- Export your YOLO model to the ONNX format.

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='onnx')

3- Add the 'DmlExecutionProvider' string to the providers list: this is lines 133 to 140 in "venv\Lib\site-packages\ultralytics\nn\autobackend.py":

133        elif onnx:  # ONNX Runtime
134            LOGGER.info(f'Loading {w} for ONNX Runtime inference...')
135            # check_requirements(('onnx', 'onnxruntime-gpu' if cuda else 'onnxruntime'))
136            import onnxruntime
137            providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else ['DmlExecutionProvider', 'CPUExecutionProvider']
138            session = onnxruntime.InferenceSession(w, providers=providers)
139            output_names = [x.name for x in session.get_outputs()]
140            metadata = session.get_modelmeta().custom_metadata_map  # metadata

✨ Comment out the check_requirements call on line 135 ✨ and add the 'DmlExecutionProvider' string to the providers list (line 137).

4- Enjoy the 100% boost in model performance.
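To confirm the setup actually worked, a quick sanity check (a sketch, assuming the onnxruntime_directml install above and the yolov8n.onnx file from step 2) is to list the available and active providers:

import onnxruntime

print(onnxruntime.get_available_providers())  # should include 'DmlExecutionProvider'
session = onnxruntime.InferenceSession('yolov8n.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
print(session.get_providers())  # shows which providers the session actually uses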

@glenn-jocher
Member

@MeatyAri thank you for sharing your experience with running YOLO on AMD GPUs using ONNX Runtime with DirectML. This is indeed a workaround that can be used to leverage AMD GPUs for inference.

By installing the "onnxruntime_directml" package and exporting the YOLO model to the ONNX format, you can then modify the providers list in the "venv\Lib\site-packages\ultralytics\nn\autobackend.py" file to include 'DmlExecutionProvider'. This modification allows ONNX Runtime to utilize DirectML for inference on AMD GPUs.

This is a helpful contribution for users who are looking to utilize their AMD GPUs for YOLO inference. We appreciate you sharing this information with the community.

Please note that this workaround is specific to inference and does not address AMD GPU support for training in YOLOv5.

Thank you once again for your contribution!

@zumaster20

@glenn-jocher so it is not possible to train a model using this workaround?

@glenn-jocher
Member

@zumaster20 That's correct, the provided workaround is specific to enabling inference using AMD GPUs through ONNX Runtime with DirectML. Training a YOLO model on AMD GPUs would require support and optimization at the framework level, which is not covered by this workaround. If you have any further questions or insights, feel free to share them!

@TimZhangTHS

TimZhangTHS commented Jan 2, 2024

@glenn-jocher Hi, when I try this, Ultralytics keeps trying to download onnx 1.15 instead of using onnxruntime_directml, and it won't utilize my AMD GPU. Do you know what a possible workaround would be?

@MeatyAri

MeatyAri commented Jan 5, 2024

@TimZhangTHS I think you didn't comment out the check_requirements line as indicated in my previous comment, so YOLO tries to install packages that you don't need to get this running.
I'm not sure what the exact environment setup was when I ran this on my GPU, but I can send you the pip freeze results if your problem still exists.

@daniellizarazoo

Which AMD graphics do you have? I'm actually using a Lenovo IdeaPad with a Ryzen 5 with AMD Radeon Graphics and followed the steps you described, but as you can see in the screenshot, it's still running on the CPU. Please share your pip freeze.
[screenshot] I guess my integrated GPU does not have this support :(

@glenn-jocher
Member

@daniellizarazoo It looks like you're encountering an issue where the environment is not correctly recognizing the onnxruntime_directml package. Make sure to comment out the check_requirements line as @MeatyAri suggested, to prevent the automatic download of the standard ONNX package.

@daniellizarazoo Integrated GPUs, like the one in your Ryzen 5, may not be supported by DirectML for this type of workload. DirectML is primarily aimed at discrete GPUs, and performance on integrated GPUs might not be optimal or supported. The pip freeze output from @MeatyAri could potentially help, but it's also important to verify that your GPU is supported by DirectML.

If you continue to face issues, please ensure that your system meets all the requirements for running onnxruntime_directml and that your GPU is compatible with DirectML. If the compatibility and requirements are met but the problem persists, it might be necessary to look into more detailed logs or error messages to diagnose the issue further.

@MeatyAri

@daniellizarazoo here's the pip freeze:

certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
coloredlogs==15.0.1
comtypes==1.2.0
contourpy==1.1.1
cycler==0.11.0
dxcam==0.0.5
filelock==3.12.4
flatbuffers==23.5.26
fonttools==4.42.1
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
kiwisolver==1.4.5
MarkupSafe==2.1.3
matplotlib==3.8.0
mpmath==1.3.0
mss==9.0.1
networkx==3.1
numpy==1.26.0
onnxruntime-directml==1.16.0
opencv-python==4.8.0.76
packaging==23.1
pandas==2.1.1
Pillow==10.0.1
protobuf==4.24.3
psutil==5.9.5
py-cpuinfo==9.0.0
PyDirectInput==1.0.4
pynput==1.7.6
pyparsing==3.1.1
pyreadline3==3.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
pywin32==306
PyYAML==6.0.1
requests==2.31.0
scipy==1.11.2
seaborn==0.12.2
six==1.16.0
sympy==1.12
torch==2.0.1
torchaudio==2.0.2
torchvision==0.15.2
tqdm==4.66.1
typing_extensions==4.8.0
tzdata==2023.3
ultralytics==8.0.184
urllib3==2.0.5

Hope it helps.

@daniellizarazoo

To optimize performance with an integrated GPU, I converted the model to an OpenVINO model with half (FP16) quantization, and it worked really well for me.
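For reference, a minimal sketch of that export (assuming a recent ultralytics version where export supports the half argument):

from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.export(format='openvino', half=True)  # FP16 OpenVINO export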

@glenn-jocher
Member

@daniellizarazoo That's a great approach! Converting the model to the OpenVINO format and using half-precision (FP16) quantization can indeed provide a significant performance boost, especially on integrated GPUs that may not be as powerful as discrete GPUs. It's good to hear that this method worked well for you. Your experience could be valuable for others in the community with similar hardware looking to optimize YOLO model performance. Keep experimenting and sharing your findings! 🚀

@dclemon

dclemon commented Apr 10, 2024

(Quoting @MeatyAri's onnxruntime_directml walkthrough from above.)

But how do you select the AMD device in YOLO? I used device='0' and it does not work.

@glenn-jocher
Member

@dclemon it sounds like you're making good progress on running YOLO on AMD GPUs using onnxruntime_directml! To select an AMD device, you won't specify the device in the same manner as CUDA with 'device=0'. When using onnxruntime_directml, device selection is handled internally by the DirectML execution provider based on the available hardware and drivers installed on your system.

So, when you add 'DmlExecutionProvider' to the providers list, it automatically selects the AMD GPU if it's available and supported by DirectML. There's no need for explicit device selection like you would with CUDA devices.

Make sure the rest of your setup is correctly configured, and ensure your AMD GPU drivers are up to date to support DirectML. If you've correctly modified autobackend.py and installed onnxruntime_directml, your model should run on the AMD GPU. Keep an eye on your GPU utilization during inference to confirm it's being used. Happy coding! 🚀
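If you ever do need to pin a specific adapter, the DirectML execution provider accepts provider options; here's a minimal sketch, assuming a recent onnxruntime-directml build where the device_id option is supported (check the ONNX Runtime DirectML docs for your version):

import onnxruntime as ort

# Provider options are passed as a (name, options) tuple; device_id follows
# the system's adapter enumeration, with 0 typically the primary GPU.
# 'yolov8n.onnx' is assumed from the export step above.
providers = [('DmlExecutionProvider', {'device_id': 0}), 'CPUExecutionProvider']
session = ort.InferenceSession('yolov8n.onnx', providers=providers)
print(session.get_providers())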

@dclemon

dclemon commented Apr 12, 2024

(Quoting @glenn-jocher's reply from above.)

Thank you for your reply. I used torch.from_numpy(im).to(device) in my code and it could not detect the AMD GPU. But I have fixed the problem now: I installed torch_directml and changed the select_device function at line 133:

if dml and torch_directml.is_available():  # requires: import torch_directml
    print('use dml')
    devices = torch_directml.device(0)  # first DirectML device (the AMD GPU)
    n = 0
    s += r"dml:" + str(torch_directml.device_name(0))
    arg = torch_directml.device(0)

With this you can load a .pt model on an AMD device.
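For anyone trying this, here is a self-contained sanity check (a sketch, assuming pip install torch-directml) that a tensor op actually lands on the DirectML device:

import torch
import torch_directml

dml = torch_directml.device()   # default DirectML device (the AMD GPU)
x = torch.randn(2, 3).to(dml)   # move a tensor onto the GPU
y = (x * 2).sum()               # run a small op there
print(torch_directml.device_name(0), y.item())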

@glenn-jocher
Member

Great to hear you've resolved the issue, @dclemon! 🌟 Utilizing torch_directml is indeed a clever way to explicitly manage device selection for AMD GPUs in conjunction with YOLO and DirectML. This approach allows for more control and ensures that your models run on the desired hardware. Thank you for sharing your solution with the community; your findings are valuable and may help others in similar situations. Happy coding, and keep up the fantastic work! 😊

@MeatyAri

Thanks for sharing your work @dclemon; your approach in the second comment is slightly different from mine. I've used the DirectML support built into onnxruntime, whereas you're using torch_directml, which was recently added for PyTorch. Your approach has some great advantages, like being able to use PyTorch functions to modify tensors on the AMD GPU. However, there are some major disadvantages, such as being slower; ONNX inference seems to run much faster than raw PyTorch models most of the time.
So in conclusion, by combining these two you get the benefits of PyTorch running on the GPU and fast ONNX inference, to run your YOLO model even faster!
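Roughly, the combination could look like this (a sketch, assuming pip install torch-directml and a yolov8n.onnx exported as in my earlier comment):

import numpy as np
import torch
import torch_directml
import onnxruntime as ort

# Preprocess with PyTorch on the DirectML device, then run inference
# through the ONNX Runtime DirectML execution provider.
dml = torch_directml.device()
frame = torch.randint(0, 255, (640, 640, 3), dtype=torch.uint8)  # stand-in for a captured frame
x = frame.to(dml).permute(2, 0, 1).float().div(255.0).unsqueeze(0)  # HWC uint8 -> 1x3xHxW float
x = x.cpu().numpy().astype(np.float32)  # ONNX Runtime consumes host numpy arrays

session = ort.InferenceSession('yolov8n.onnx', providers=['DmlExecutionProvider', 'CPUExecutionProvider'])
outputs = session.run(None, {session.get_inputs()[0].name: x})
print([o.shape for o in outputs])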

Make sure that your environment is configured properly, because if you have previously installed the CPU version of onnxruntime, you cannot simply uninstall it and install other variants like the DirectML one. You should either delete your venv completely and install the packages again, or use pip-autoremove to remove both the previous onnxruntime install and its dependencies. Here is how to do it:

# install pip-autoremove
pip install pip-autoremove
# remove "somepackage" plus its dependencies:
pip-autoremove somepackage -y

@zzh123dfds

@MeatyAri Hi, I followed your steps and my AMD GPU is now being used, but only at about 50%, and my CPU usage is still high (nearly 70%).

@zzh123dfds

@dclemon Did you use this method on YOLOv8? When I follow these steps I get the error: RuntimeError: Cannot set version_counter for inference tensor

@MeatyAri

Hi @zzh123dfds
It's likely that the issue isn't with the inference itself, but rather with your code and resource management. If you're performing any preprocessing before feeding the data into your model, please double-check those steps. Preprocessing is often where the CPU experiences the most strain during execution.
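One simple way to check this (a sketch, with a hypothetical preprocess function standing in for your real preprocessing) is to time the preprocessing stage separately from inference:

import time
import numpy as np

def preprocess(frame):  # hypothetical stand-in for your real preprocessing
    x = frame.astype(np.float32) / 255.0
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])

frame = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
t0 = time.perf_counter()
x = preprocess(frame)
t1 = time.perf_counter()
# ... your session.run(...) call would go here; compare its time against t1 - t0
print(f'preprocess: {(t1 - t0) * 1000:.1f} ms')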
