
Disable algo caching in ROCM EP #19567

Open · wants to merge 3 commits into main

Conversation

@dmnieto commented Feb 19, 2024

Description
Similar to the work done by Liangxijun-1001 in apache/tvm#16178, the MIOpen porting guide mandates calling miopenFindConvolution*Algorithm() before using any convolution API.

This is the link to the porting guide describing this requirement: https://rocmdocs.amd.com/projects/MIOpen/en/latest/MIOpen_Porting_Guide.html

Thus, this change disables the algo cache and enforces the official API semantics.
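
For reference, a minimal sketch of the call ordering the porting guide describes, assuming the handle, tensor and convolution descriptors, device buffers, and workspace are already set up (the function and variable names below are illustrative, not the onnxruntime code):

#include <miopen/miopen.h>

// Illustrative sketch: find an algorithm immediately before running the convolution,
// as the porting guide requires, instead of reusing a previously cached result.
miopenStatus_t RunConvWithFind(miopenHandle_t handle,
                               miopenTensorDescriptor_t x_desc, const void* x_data,
                               miopenTensorDescriptor_t w_desc, const void* w_data,
                               miopenConvolutionDescriptor_t conv_desc,
                               miopenTensorDescriptor_t y_desc, void* y_data,
                               void* workspace, size_t workspace_size) {
  int returned_algo_count = 0;
  miopenConvAlgoPerf_t perf_results[1];

  // MIOpen mandates this Find call before the corresponding convolution call.
  miopenStatus_t status = miopenFindConvolutionForwardAlgorithm(
      handle, x_desc, x_data, w_desc, w_data, conv_desc, y_desc, y_data,
      /*requestAlgoCount=*/1, &returned_algo_count, perf_results,
      workspace, workspace_size, /*exhaustiveSearch=*/false);
  if (status != miopenStatusSuccess) return status;

  const float alpha = 1.0f, beta = 0.0f;
  return miopenConvolutionForward(
      handle, &alpha, x_desc, x_data, w_desc, w_data, conv_desc,
      perf_results[0].fwd_algo,  // use the algorithm returned by the Find call above
      &beta, y_desc, y_data, workspace, workspace_size);
}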

Motivation and Context
Without this change, onnxruntime fails subsequent calls to convolution APIs with errors like
'MIOpen Error: No invoker was registered for convolution forward.' This was reproduced, for example,
while porting Immich to use ROCm acceleration.

Fixes #19566

@tianleiwu (Contributor)

My intuition is that algo caching can be used in ROCm.

For example, PyTorch also caches algos:
https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L579-L585

@dmnieto (Author) commented Feb 29, 2024

My intuition is that algo caching can be used in ROCm.

For example, PyTorch also caches algos: https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L579-L585

It does look like the cache in PyTorch checks quite a few parameters (https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L172), among them the MIOpen handle. I am going to instrument a bit to see if there is a corner case I have not considered.
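
For illustration, a rough sketch of a cache keyed on the full set of find parameters, including the MIOpen handle, so that a cached algorithm is only reused when every input that influenced the Find call matches. This is an assumption-laden sketch of the idea, not PyTorch's or onnxruntime's actual code:

#include <cstdint>
#include <map>
#include <tuple>
#include <vector>
#include <miopen/miopen.h>

// Illustrative only: the fields roughly mirror the kind of parameters PyTorch
// folds into its ConvolutionParams hash key.
struct ConvAlgoKey {
  miopenHandle_t handle;                       // handle can differ per stream/thread
  miopenDataType_t data_type;
  std::vector<int64_t> input_dims;
  std::vector<int64_t> weight_dims;
  std::vector<int64_t> pads, strides, dilations;
  int64_t groups;

  bool operator<(const ConvAlgoKey& o) const {
    return std::tie(handle, data_type, input_dims, weight_dims, pads, strides, dilations, groups) <
           std::tie(o.handle, o.data_type, o.input_dims, o.weight_dims, o.pads, o.strides,
                    o.dilations, o.groups);
  }
};

// With a key this complete, stale entries are simply never hit, so the cache never needs clearing.
std::map<ConvAlgoKey, miopenConvFwdAlgorithm_t> fwd_algo_cache;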

@Zelnes commented Jul 10, 2024

I stumbled onto the same issue while trying to use onnxruntime with Immich too.
See my comment here, with some error logs.

I had this issue only while running ML jobs in parallel (I describe everything in the link above), but it worked fine in single-threaded execution.

However, I've managed to get something to work by applying the content of this PR, with a small patch needed after a rebase on the main branch.

diff --git a/onnxruntime/core/providers/rocm/nn/conv_transpose.cc b/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
index 45ed4c8ac..79a11c0b1 100644
--- a/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
+++ b/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
@@ -76,7 +76,6 @@ Status ConvTranspose<T, NHWC>::DoConvTranspose(OpKernelContext* context, bool dy

       if (w_dims_changed) {
         s_.last_w_dims = gsl::make_span(w_dims);
-        s_.cached_benchmark_bwd_results.clear();
       }

       ConvTransposeAttributes::Prepare p;

I don't have any expertise in all this; I just wanted to make use of my AMD RX 6400 for ML.

@tianleiwu (Contributor) commented Jul 10, 2024

@Zelnes, I can understand that the cache is not thread safe. Basically, when the weights change, one thread clears the cache, but another thread might still be using it.

A quick fix is to add a mutex to guard the cache, like this.

A more complete solution is to add more parameters (like the weight shape, conv_desc, etc., as PyTorch does) to the key of the hash table, so that the cache never needs to be cleared.

The current implementation assumes that those non-shape parameters do not change. That is not always true, which leads to 'MIOpen Error: No invoker was registered for convolution forward.'

A cache is needed for good performance (finding an algo takes time, so it does not make sense to do it every time).

@PeixuanZuo for awareness of the issue.
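
For illustration, a minimal sketch of the mutex idea suggested above. The struct and member names are made up for this sketch and are not the actual onnxruntime state fields; the point is that the lock makes the clear-on-shape-change and the lookup/insert atomic with respect to each other:

#include <mutex>
#include <unordered_map>
#include <miopen/miopen.h>

// Illustrative sketch of a per-kernel algo cache guarded by a mutex.
struct ConvState {
  std::mutex mutex;
  std::unordered_map<size_t, miopenConvFwdAlgorithm_t> cached_benchmark_results;  // keyed by a shape hash
};

miopenConvFwdAlgorithm_t FindOrUseCachedAlgo(ConvState& s, size_t shape_hash, bool dims_changed) {
  std::lock_guard<std::mutex> guard(s.mutex);  // serialize clear, lookup, and insert across threads
  if (dims_changed) {
    s.cached_benchmark_results.clear();
  }
  if (auto it = s.cached_benchmark_results.find(shape_hash); it != s.cached_benchmark_results.end()) {
    return it->second;
  }
  // ... call miopenFindConvolutionForwardAlgorithm here and take perf_results[0].fwd_algo ...
  miopenConvFwdAlgorithm_t algo = miopenConvolutionFwdAlgoGEMM;  // placeholder result
  s.cached_benchmark_results.emplace(shape_hash, algo);
  return algo;
}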
