[MIGraphX EP] Fix CopyTensorAsync and add guards for stream sync CopyTensors #16787

TedThemistokleous · 2023-07-21T04:32:33Z

Add compile guards that gate the following changes on MIGRAPHX_STREAM_SYNC:

- Remove the excess hipStreamSynchronize() on the null stream in CopyTensor calls
- Add a proper stream-synchronized CopyTensorAsync call for the DeviceToHost case

Without this change, subsequent CopyTensorAsync() calls fail on cards that don't use pinned memory, because hipMemcpy() calls can run before certain kernel operations have completed.

Description

Remove excess synchronization when stream sync is enabled, and make the DeviceToHost CopyTensorAsync() wait on the intended GPU stream.
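
A minimal sketch of the gating idea, written in plain C++ with no HIP dependency so it can be compiled anywhere. `PlanDeviceToHostCopy`, `CopyKind` and `kStreamSync` are hypothetical names, not code from `gpu_data_transfer.cc`. It assumes that with the guard defined, every DeviceToHost copy is enqueued on the compute stream, and that without the guard only pinned destinations get a stream-ordered async copy while pageable ones fall back to a blocking copy.

```cpp
#include <cassert>

// Mirrors the MIGRAPHX_STREAM_SYNC compile guard (assumed semantics).
#ifdef MIGRAPHX_STREAM_SYNC
constexpr bool kStreamSync = true;
#else
constexpr bool kStreamSync = false;
#endif

// How a device-to-host copy is issued (hypothetical labels, not HIP API names).
enum class CopyKind {
  kAsyncOnStream,  // async copy enqueued on the compute stream
  kBlocking        // blocking copy that is not ordered on the compute stream
};

// Chooses the strategy for a DeviceToHost CopyTensorAsync.
// dst_pinned: whether the host destination is pinned memory.
// stream_sync: defaults to the compile-time guard so tests can override it.
CopyKind PlanDeviceToHostCopy(bool dst_pinned, bool stream_sync = kStreamSync) {
  if (stream_sync) {
    // Guarded path: always order the copy after earlier work on the stream,
    // whether or not the destination is pinned.
    return CopyKind::kAsyncOnStream;
  }
  // Unguarded path (assumed): only pinned destinations get an async copy.
  return dst_pinned ? CopyKind::kAsyncOnStream : CopyKind::kBlocking;
}
```

Building with `-DMIGRAPHX_STREAM_SYNC` flips the default, while passing `stream_sync` explicitly lets both branches be exercised in a single build.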

Motivation and Context

Without this change, the copy does not reliably wait for the kernel to finish computing, so results are not synchronized correctly when the destination memory isn't pinned. This was observed when rerunning the test_parity_gelu and test_parity_layernorm tests.

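![image](https://github.com/microsoft/onnxruntime/assets/107195283/4915c18a-fb2d-40c9-a50e-a7c6613c324b)

becomes

![image](https://github.com/microsoft/onnxruntime/assets/107195283/f661acf4-e2af-4c9a-b26a-30fca339cf1d)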
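
The failure described under Motivation and Context comes down to ordering: a copy that is not ordered on the compute stream can read device memory before the kernel has written it. The toy model below is an assumption-laden sketch, not HIP: `ToyStream` is a hypothetical in-order queue whose work only runs when drained, standing in for a GPU stream, and the "kernel" and "copy" are plain lambdas.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Toy in-order stream: queued work runs only when the stream is drained.
class ToyStream {
 public:
  void Enqueue(std::function<void()> work) { pending_.push(std::move(work)); }

  // Runs queued work in submission order (models a stream synchronize).
  void Synchronize() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Returns the value the host sees after the "device-to-host copy".
int RunScenario(bool copy_on_stream) {
  int device_buf = 0;  // stands in for device memory
  int host_buf = -1;   // stands in for the host destination
  ToyStream stream;
  stream.Enqueue([&device_buf] { device_buf = 42; });  // the "kernel"
  if (copy_on_stream) {
    // Ordered after the kernel because it shares the stream.
    stream.Enqueue([&host_buf, &device_buf] { host_buf = device_buf; });
  } else {
    // Not ordered with the stream: runs immediately, before the kernel.
    host_buf = device_buf;
  }
  stream.Synchronize();
  return host_buf;
}
```

`RunScenario(true)` observes the kernel's result, while `RunScenario(false)` reads the stale initial value, which is the kind of premature read the PR attributes to unpinned DeviceToHost copies.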

…Tensors() Add compile guards to gate functionality based on MIGRAPHX_STREAM_SYNC for adding the following - remove excess hipStreamSynchronize to null stream on CopyTensor calls - Add proper call for stream synchronized CopyTensorAsync for DeviceToHost case. Without this change, subsequent CopyTensorAsync() calls will fail for cards that don't use pinned memory, causing hipMemcpy() calls to occur before certain kernel operations complete.

TedThemistokleous · 2023-07-21T04:33:16Z

ping @cloudhan @PeixuanZuo. This is related to the issue I found earlier (#16774).

cloudhan · 2023-07-21T04:46:00Z

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

cloudhan · 2023-07-21T04:46:06Z

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline

azure-pipelines · 2023-07-21T04:46:35Z

Azure Pipelines successfully started running 8 pipeline(s).

azure-pipelines · 2023-07-21T04:46:38Z

Azure Pipelines successfully started running 8 pipeline(s).

onnxruntime/core/providers/migraphx/gpu_data_transfer.cc

This is already handled in the EP: the end of a run performs OnRunEnd() -> hipStreamQuery() -> hipStreamSynchronize(), as well as Sync() -> hipStreamSync(). Also, every hipMemcpy() is already followed by a hipStreamSynchronize(stream).
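
Following the reasoning in this comment, an extra host-side wait after each copy adds blocking without changing the result once an end-of-run hook synchronizes the stream. The sketch below models that under stated assumptions: `CountingStream`, `OnRunEndSketch` and `CopyAll` are hypothetical, and the query-then-synchronize hook is modeled on the comment's description of OnRunEnd(), not on the EP's actual code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy in-order stream that counts how often the host blocks on it.
class CountingStream {
 public:
  void Enqueue(std::function<void()> work) { pending_.push(std::move(work)); }
  // Models a non-blocking query: true when no work is outstanding.
  bool Query() const { return pending_.empty(); }
  // Models a blocking synchronize: drains the queue and records the wait.
  void Synchronize() {
    ++sync_calls_;
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }
  int sync_calls() const { return sync_calls_; }

 private:
  std::queue<std::function<void()>> pending_;
  int sync_calls_ = 0;
};

// Hypothetical end-of-run hook: query first, block only if work remains.
void OnRunEndSketch(CountingStream& stream) {
  if (!stream.Query()) stream.Synchronize();
}

// Runs n kernel+copy pairs; returns {sum of host values, number of host waits}.
// sync_after_each_copy models the per-copy wait that the PR removes.
std::pair<int, int> CopyAll(int n, bool sync_after_each_copy) {
  CountingStream stream;
  std::vector<int> device(n), host(n, -1);
  for (int i = 0; i < n; ++i) {
    stream.Enqueue([&device, i] { device[i] = i * 10; });          // "kernel"
    stream.Enqueue([&host, &device, i] { host[i] = device[i]; });  // ordered copy
    if (sync_after_each_copy) stream.Synchronize();
  }
  OnRunEndSketch(stream);
  int sum = 0;
  for (int v : host) sum += v;
  return {sum, stream.sync_calls()};
}
```

Both variants produce the same host data; the per-copy variant simply blocks the host once per copy instead of once at the end of the run.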

TedThemistokleous · 2023-07-21T15:24:56Z

@cloudhan @PeixuanZuo @ytaous

Let me know if you need anything further for this or have any additional concerns.

cloudhan · 2023-07-21T16:45:44Z

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

cloudhan · 2023-07-21T16:45:57Z

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline

azure-pipelines · 2023-07-21T16:46:20Z

Azure Pipelines successfully started running 8 pipeline(s).

azure-pipelines · 2023-07-21T16:46:35Z

Azure Pipelines successfully started running 8 pipeline(s).

TedThemistokleous · 2023-07-21T19:17:38Z

@cloudhan, it looks like this test isn't building for some reason:

4: Test command: 
4: Working Directory: D:/a/onnxruntime/onnxruntime/build/Release/_deps/tvm-build
4/4 Test #4: cpptest_NOT_BUILT ................***Not Run   0.00 sec

75% tests passed, 1 tests failed out of 4

…Tensors (microsoft#16787) Add compile guards to gate functionality based on MIGRAPHX_STREAM_SYNC for adding the following - remove excess hipStreamSynchronize to null stream on CopyTensor calls - Add proper call for stream synchronized CopyTensorAsync for DeviceToHost case. Without this change, subsequent CopyTensorAsync() calls will fail for cards that don't use pinned memory, causing hipMemcpy() calls to occur before certain kernel operations complete. ![image](https://github.com/microsoft/onnxruntime/assets/107195283/4915c18a-fb2d-40c9-a50e-a7c6613c324b) becomes ![image](https://github.com/microsoft/onnxruntime/assets/107195283/f661acf4-e2af-4c9a-b26a-30fca339cf1d) --------- Co-authored-by: Ted Themistokleous <tthemist@amd.com>

…Tensors (microsoft#16787) (#13) Add compile guards to gate functionality based on MIGRAPHX_STREAM_SYNC for adding the following - remove excess hipStreamSynchronize to null stream on CopyTensor calls - Add proper call for stream synchronized CopyTensorAsync for DeviceToHost case. Without this change, subsequent CopyTensorAsync() calls will fail for cards that don't use pinned memory, causing hipMemcpy() calls to occur before certain kernel operations complete. ![image](https://github.com/microsoft/onnxruntime/assets/107195283/4915c18a-fb2d-40c9-a50e-a7c6613c324b) becomes ![image](https://github.com/microsoft/onnxruntime/assets/107195283/f661acf4-e2af-4c9a-b26a-30fca339cf1d) --------- Co-authored-by: Ted Themistokleous <tthemist@amd.com>


TedThemistokleous mentioned this pull request Jul 21, 2023

MIGRAPHX_TRACE_EVAL=1 with DLM test_parity_gelu changes behavior. ROCm/AMDMIGraphX#1877

Closed

cloudhan reviewed Jul 21, 2023


onnxruntime/core/providers/migraphx/gpu_data_transfer.cc (outdated, resolved)

TedThemistokleous added 2 commits July 21, 2023 07:33

Fix format for #ifndef's for changeset

05a00a7

Remove syncs in CopyTensor synchronous case

dee9b54


cloudhan approved these changes Jul 22, 2023


cloudhan merged commit 488544b into microsoft:main Jul 22, 2023
65 of 66 checks passed

TedThemistokleous deleted the migx_gpu_async_copy_fix branch July 24, 2023 13:54

TedThemistokleous mentioned this pull request Jul 26, 2023

[MIGraphX EP] Fix CopyTensorAsync and add guards for stream sync Copy… ROCm/onnxruntime#15

Open
