
Disable algo caching in ROCM EP #19567

Open · wants to merge 3 commits into main

Conversation

@dmnieto commented Feb 19, 2024

Description
Similar to the work done by Liangxijun-1001 in apache/tvm#16178, the MIOpen porting guide mandates calling miopenFindConvolution*Algorithm() before using any convolution API.

This is the link to the porting guide describing this requirement: https://rocmdocs.amd.com/projects/MIOpen/en/latest/MIOpen_Porting_Guide.html

Thus, this change disables the algo cache and enforces the official API semantics.
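
For reference, a minimal sketch of the call ordering the porting guide describes, assuming the handle, tensor and convolution descriptors, device buffers, and workspace are already set up (the function and variable names below are illustrative, not the onnxruntime code):

#include <miopen/miopen.h>

// Illustrative sketch: find an algorithm immediately before running the convolution,
// as the porting guide requires, instead of reusing a previously cached result.
miopenStatus_t RunConvWithFind(miopenHandle_t handle,
                               miopenTensorDescriptor_t x_desc, const void* x_data,
                               miopenTensorDescriptor_t w_desc, const void* w_data,
                               miopenConvolutionDescriptor_t conv_desc,
                               miopenTensorDescriptor_t y_desc, void* y_data,
                               void* workspace, size_t workspace_size) {
  int returned_algo_count = 0;
  miopenConvAlgoPerf_t perf_results[1];

  // MIOpen mandates this Find call before the corresponding convolution call.
  miopenStatus_t status = miopenFindConvolutionForwardAlgorithm(
      handle, x_desc, x_data, w_desc, w_data, conv_desc, y_desc, y_data,
      /*requestAlgoCount=*/1, &returned_algo_count, perf_results,
      workspace, workspace_size, /*exhaustiveSearch=*/false);
  if (status != miopenStatusSuccess) return status;

  const float alpha = 1.0f, beta = 0.0f;
  return miopenConvolutionForward(
      handle, &alpha, x_desc, x_data, w_desc, w_data, conv_desc,
      perf_results[0].fwd_algo,  // use the algorithm returned by the Find call above
      &beta, y_desc, y_data, workspace, workspace_size);
}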

Motivation and Context
Without this change, onnxruntime fails subsequent calls to convolution APIs with errors like
'MIOpen Error: No invoker was registered for convolution forward.' This was reproduced, for example,
while porting Immich to use ROCm acceleration.

Fixes #19566

@tianleiwu (Contributor)

My intuition is that algo caching can be used in ROCm.

For example, PyTorch also caches algos:
https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L579-L585

@dmnieto (Author) commented Feb 29, 2024

My intuition is that algo caching can be used in ROCm.

For example, PyTorch also caches algos: https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L579-L585

It does look like the cache in PyTorch checks quite a few parameters (https://github.com/pytorch/pytorch/blob/26fbbc3e844c8267aa8cf78f152c9612da40dc89/aten/src/ATen/native/miopen/Conv_miopen.cpp#L172), among them the MIOpen handle. I am going to instrument a bit to see if there is a corner case I have not considered.
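
For illustration, a rough sketch of a cache keyed on the full set of find parameters, including the MIOpen handle, so that a cached algorithm is only reused when every input that influenced the Find call matches. This is an assumption-laden sketch of the idea, not PyTorch's or onnxruntime's actual code:

#include <cstdint>
#include <map>
#include <tuple>
#include <vector>
#include <miopen/miopen.h>

// Illustrative only: the fields roughly mirror the kind of parameters PyTorch
// folds into its ConvolutionParams hash key.
struct ConvAlgoKey {
  miopenHandle_t handle;                       // handle can differ per stream/thread
  miopenDataType_t data_type;
  std::vector<int64_t> input_dims;
  std::vector<int64_t> weight_dims;
  std::vector<int64_t> pads, strides, dilations;
  int64_t groups;

  bool operator<(const ConvAlgoKey& o) const {
    return std::tie(handle, data_type, input_dims, weight_dims, pads, strides, dilations, groups) <
           std::tie(o.handle, o.data_type, o.input_dims, o.weight_dims, o.pads, o.strides,
                    o.dilations, o.groups);
  }
};

// With a key this complete, stale entries are simply never hit, so the cache never needs clearing.
std::map<ConvAlgoKey, miopenConvFwdAlgorithm_t> fwd_algo_cache;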

@Zelnes commented Jul 10, 2024

I stumbled onto the same issue while trying to use onnxruntime with Immich too.
See my comment here, with some error logs.

I had this issue only while running ML jobs in parallel (I describe everything in the link above), but it worked fine in single-threaded execution.

However, I've managed to get something to work by applying the content of this PR, with a small patch needed after a rebase on the main branch.

diff --git a/onnxruntime/core/providers/rocm/nn/conv_transpose.cc b/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
index 45ed4c8ac..79a11c0b1 100644
--- a/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
+++ b/onnxruntime/core/providers/rocm/nn/conv_transpose.cc
@@ -76,7 +76,6 @@ Status ConvTranspose<T, NHWC>::DoConvTranspose(OpKernelContext* context, bool dy

       if (w_dims_changed) {
         s_.last_w_dims = gsl::make_span(w_dims);
-        s_.cached_benchmark_bwd_results.clear();
       }

       ConvTransposeAttributes::Prepare p;

I don't have any expertise in all this; I just wanted to make use of my AMD RX 6400 for ML.

@tianleiwu (Contributor) commented Jul 10, 2024

@Zelnes, I can understand that the cache is not thread safe. Basically, when the weights change, one thread clears the cache, but another thread might still be using it.

A quick fix is to add a mutex to guard the cache, like this.

A more complete solution is to add more parameters (like the weight shape, conv_desc, etc., as PyTorch does) to the key of the hash table, so that the cache never needs to be cleared.

The current implementation assumes that those non-shape parameters do not change. That is not always true, which leads to 'MIOpen Error: No invoker was registered for convolution forward.'

A cache is needed for good performance (finding an algo takes time, so it does not make sense to do it every time).

@PeixuanZuo for awareness of the issue.
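
For illustration, a minimal sketch of the mutex idea suggested above. The struct and member names are made up for this sketch and are not the actual onnxruntime state fields; the point is that the lock makes the clear-on-shape-change and the lookup/insert atomic with respect to each other:

#include <mutex>
#include <unordered_map>
#include <miopen/miopen.h>

// Illustrative sketch of a per-kernel algo cache guarded by a mutex.
struct ConvState {
  std::mutex mutex;
  std::unordered_map<size_t, miopenConvFwdAlgorithm_t> cached_benchmark_results;  // keyed by a shape hash
};

miopenConvFwdAlgorithm_t FindOrUseCachedAlgo(ConvState& s, size_t shape_hash, bool dims_changed) {
  std::lock_guard<std::mutex> guard(s.mutex);  // serialize clear, lookup, and insert across threads
  if (dims_changed) {
    s.cached_benchmark_results.clear();
  }
  if (auto it = s.cached_benchmark_results.find(shape_hash); it != s.cached_benchmark_results.end()) {
    return it->second;
  }
  // ... call miopenFindConvolutionForwardAlgorithm here and take perf_results[0].fwd_algo ...
  miopenConvFwdAlgorithm_t algo = miopenConvolutionFwdAlgoGEMM;  // placeholder result
  s.cached_benchmark_results.emplace(shape_hash, algo);
  return algo;
}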
