
Update passes to use offload_copy based on root module #1875

Merged
merged 5 commits into develop from mpm_root_module on Jul 5, 2023

Conversation

umangyadav (Member)

Needed to run multi-target programs where "main" isn't the only root module; there can be many root modules besides main.
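For illustration, a minimal standalone sketch of the idea behind the change (not the actual MIGraphX code; the `program`/`module` types and the `run_offload_copy_pass` function are simplified, hypothetical stand-ins): a pass decides whether to insert offload copies by comparing the module it is processing against the program's root module pointer, rather than checking for the hard-coded name "main".

```cpp
// Hypothetical sketch, not the MIGraphX API: passes key off the root module
// pointer instead of the module name "main".
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct module
{
    std::string name;
};

struct program
{
    // In a multi-target program any module can be a root module, not just "main".
    std::vector<std::unique_ptr<module>> modules;
    module* root = nullptr;
};

// Stand-in for a pass such as replace_allocate with offload_copy enabled.
void run_offload_copy_pass(const program& p, module* m)
{
    // Before: offload copies were only inserted when m->name == "main".
    // After: they are inserted whenever m is the root module being compiled.
    if(m == p.root)
        std::cout << "inserting offload copies for root module: " << m->name << "\n";
    else
        std::cout << "skipping non-root module: " << m->name << "\n";
}

int main()
{
    program p;
    p.modules.push_back(std::make_unique<module>(module{"gpu_root"}));
    p.modules.push_back(std::make_unique<module>(module{"gpu_root:sub0"}));
    p.root = p.modules.front().get();

    for(auto& m : p.modules)
        run_offload_copy_pass(p, m.get());
}
```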

@umangyadav umangyadav self-assigned this Jun 23, 2023
@umangyadav umangyadav requested a review from kahmed10 June 23, 2023 20:14

codecov bot commented Jun 23, 2023

Codecov Report

Merging #1875 (436ea17) into develop (edc4bf5) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 436ea17 differs from pull request most recent head 5924fd9. Consider uploading reports for the commit 5924fd9 to get more accurate results

@@             Coverage Diff             @@
##           develop    #1875      +/-   ##
===========================================
- Coverage    91.39%   91.37%   -0.02%     
===========================================
  Files          419      419              
  Lines        15542    15544       +2     
===========================================
  Hits         14204    14204              
- Misses        1338     1340       +2     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/include/migraphx/replace_allocate.hpp | 100.00% <ø> | (ø) |
| src/pass_manager.cpp | 90.41% <100.00%> | (-2.65%) ⬇️ |
| src/promote_literals.cpp | 100.00% <100.00%> | (ø) |
| src/replace_allocate.cpp | 100.00% <100.00%> | (ø) |

@migraphx-bot (Collaborator) commented Jun 29, 2023

| Test | Batch | Rate new (5924fd) | Rate old (3f5668) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 895.42 | 895.19 | 0.03% | |
| torchvision-resnet50_fp16 | 64 | 5,306.36 | 5,309.13 | -0.05% | |
| torchvision-densenet121 | 32 | 1,126.29 | 1,126.97 | -0.06% | |
| torchvision-densenet121_fp16 | 32 | 3,281.90 | 3,278.63 | 0.10% | |
| torchvision-inceptionv3 | 32 | 594.18 | 592.60 | 0.27% | |
| torchvision-inceptionv3_fp16 | 32 | 2,499.08 | 2,530.08 | -1.23% | |
| cadene-inceptionv4 | 16 | 329.61 | 329.20 | 0.12% | |
| cadene-resnext64x4 | 16 | 394.41 | 394.69 | -0.07% | |
| slim-mobilenet | 64 | 7,133.66 | 7,124.70 | 0.13% | |
| slim-nasnetalarge | 64 | 160.05 | 159.98 | 0.04% | |
| slim-resnet50v2 | 64 | 1,089.71 | 1,089.92 | -0.02% | |
| bert-mrpc-onnx | 8 | 719.50 | 719.70 | -0.03% | |
| bert-mrpc-tf | 1 | 370.77 | 370.64 | 0.03% | |
| pytorch-examples-wlang-gru | 1 | 307.76 | 307.69 | 0.02% | |
| pytorch-examples-wlang-lstm | 1 | 313.57 | 314.32 | -0.24% | |
| torchvision-resnet50_1 | 1 | 91.90 | 91.96 | -0.07% | |
| torchvision-inceptionv3_1 | 1 | 127.95 | 129.32 | -1.06% | |
| cadene-dpn92_1 | 1 | 335.65 | 335.78 | -0.04% | |
| cadene-resnext101_1 | 1 | 237.13 | 236.95 | 0.08% | |
| slim-vgg16_1 | 1 | 53.35 | 53.36 | -0.01% | |
| slim-mobilenet_1 | 1 | 1,516.80 | 1,493.21 | 1.58% | |
| slim-inceptionv4_1 | 1 | 100.81 | 99.81 | 1.00% | |
| onnx-taau-downsample | 1 | 316.87 | 316.10 | 0.24% | |
| dlrm-criteoterabyte | 1 | 21.43 | 21.41 | 0.05% | |
| dlrm-criteoterabyte_fp16 | 1 | 39.95 | 39.98 | -0.07% | |
| agentmodel | 1 | 5,581.60 | 5,857.27 | -4.71% | 🔴 |
| unet_fp16 | 2 | 52.83 | 52.77 | 0.12% | |

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator)

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance
🔴 cadene-dpn92_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance

@CharlieL7 (Collaborator) left a comment

I understand the change to use get_root_module() and to include a root module pointer. However, you mentioned that there might be multiple root modules with multi-target compilation. I don't understand what benefit that gives us compared to having one root module and sub-modules that go to specific targets.

umangyadav (Member, Author) commented Jun 29, 2023

> one root module and sub-modules that go to specific targets.

One root module is the "main" module. This main root module would initially run on the "Ref" target and would call the different target-specific root modules at the top level.
But those target root modules can be nested further, so it is possible that a GPU module would call into a CPU module. In that case the GPU root module would have the CPU root module nested inside it, and the execution flow would be GPU-->CPU-->GPU.
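To make the nesting concrete, here is a small hypothetical sketch (not MIGraphX code; the `target_module` type and `run` function are made up for illustration) of that module hierarchy and the resulting execution order:

```cpp
// Hypothetical illustration of the nesting described above: "main" runs on the
// ref target and calls a GPU root module, which contains a nested CPU root module.
#include <iostream>
#include <string>
#include <vector>

struct target_module
{
    std::string name;
    std::string target;
    std::vector<target_module> nested; // root modules called from this one
};

void run(const target_module& m, int depth = 0)
{
    std::cout << std::string(depth * 2, ' ') << "run " << m.name << " on " << m.target << "\n";
    for(const auto& sub : m.nested)
        run(sub, depth + 1);
    if(not m.nested.empty())
        std::cout << std::string(depth * 2, ' ') << "resume " << m.name << " on " << m.target << "\n";
}

int main()
{
    target_module cpu_root{"cpu_root", "cpu", {}};
    target_module gpu_root{"gpu_root", "gpu", {cpu_root}};
    target_module main_mod{"main", "ref", {gpu_root}};
    run(main_mod); // prints main -> gpu -> cpu -> back to gpu, i.e. GPU-->CPU-->GPU
}
```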

@CharlieL7 (Collaborator) left a comment

Approved.
Question for my understanding:
If we have offload_copy on and a program that goes GPU->CPU->GPU with the changing root modules, does that mean each module would have its own offload copies? I could see MIGX optimizing away the second GPU section's offloads if we can reuse those from the first GPU section.
On the other hand, if offload_copy is off, would we have output parameters for both GPU sections?

umangyadav (Member, Author) commented Jun 30, 2023

> each module would have its own offload copies right?

Yes, each root module would have its own parameters and offload copies.

> I could see MIGX optimizing the second GPU offloads if we can reuse those from the first GPU section.

Currently, I don't think MIGX optimizes this, but we can try that optimization later. Right now the idea is that each target root module is compiled as a separate, self-contained program, so each target root module would have its own parameters and return values.

> On the other hand, if offload_copy is off we would have output parameters for both the GPU sections?

Yes, if offload_copy is off then it would have output parameters, but multi-target programs can't work without offload copies for now. We don't have a mechanism to put parameters onto the GPU and copy the final result back to the host for multi-target programs. Also, for multi-target programs where execution switches between GPU/CPU/FPGA, it may not make sense to disable offload_copy; the compiler would have to inject the necessary copies to switch between targets anyway.
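As a rough host-side illustration of the points above (hypothetical helper names, not the MIGraphX API): with offload_copy enabled, each GPU root module performs its own copies to and from the device, which is also the duplication that could potentially be optimized away later.

```cpp
// Hedged sketch of "each root module has its own offload copies" for a
// GPU -> CPU -> GPU program. All names below are stand-ins for illustration.
#include <iostream>
#include <string>
#include <vector>

using buffer = std::vector<float>;

// Stand-ins for the copies a compiler would inject around a GPU section.
buffer copy_to_device(const buffer& host) { std::cout << "  copy_to_device\n"; return host; }
buffer copy_from_device(const buffer& dev) { std::cout << "  copy_from_device\n"; return dev; }

buffer run_gpu_root(const buffer& host_in)
{
    std::cout << "gpu root module\n";
    buffer dev = copy_to_device(host_in); // offload copy owned by this root module
    for(auto& x : dev)
        x *= 2.0f;                        // stand-in for the GPU computation
    return copy_from_device(dev);         // result handed back to the host
}

buffer run_cpu_root(const buffer& host_in)
{
    std::cout << "cpu root module (no device copies needed)\n";
    buffer out = host_in;
    for(auto& x : out)
        x += 1.0f;                        // stand-in for the CPU computation
    return out;
}

int main()
{
    buffer input{1.0f, 2.0f, 3.0f};
    // GPU -> CPU -> GPU: each GPU section performs its own offload copies.
    buffer out = run_gpu_root(run_cpu_root(run_gpu_root(input)));
    std::cout << "result[0] = " << out[0] << "\n";
}
```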

@causten causten merged commit e747114 into develop Jul 5, 2023
11 checks passed
@causten causten deleted the mpm_root_module branch July 5, 2023 19:27