
Update passes to use offload_copy based on root module #1875

Merged
merged 5 commits into develop from mpm_root_module on Jul 5, 2023

Conversation

umangyadav (Member)

Needed to run multi-target programs where "main" isn't the only root module; there can be many root modules besides main.
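For illustration, a minimal standalone sketch of the idea behind the change (not the actual MIGraphX code; the `program`/`module` types and the `run_offload_copy_pass` function are simplified, hypothetical stand-ins): a pass decides whether to insert offload copies by comparing the module it is processing against the program's root module pointer, rather than checking for the hard-coded name "main".

```cpp
// Hypothetical sketch, not the MIGraphX API: passes key off the root module
// pointer instead of the module name "main".
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct module
{
    std::string name;
};

struct program
{
    // In a multi-target program any module can be a root module, not just "main".
    std::vector<std::unique_ptr<module>> modules;
    module* root = nullptr;
};

// Stand-in for a pass such as replace_allocate with offload_copy enabled.
void run_offload_copy_pass(const program& p, module* m)
{
    // Before: offload copies were only inserted when m->name == "main".
    // After: they are inserted whenever m is the root module being compiled.
    if(m == p.root)
        std::cout << "inserting offload copies for root module: " << m->name << "\n";
    else
        std::cout << "skipping non-root module: " << m->name << "\n";
}

int main()
{
    program p;
    p.modules.push_back(std::make_unique<module>(module{"gpu_root"}));
    p.modules.push_back(std::make_unique<module>(module{"gpu_root:sub0"}));
    p.root = p.modules.front().get();

    for(auto& m : p.modules)
        run_offload_copy_pass(p, m.get());
}
```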

@umangyadav umangyadav self-assigned this Jun 23, 2023
@umangyadav umangyadav requested a review from kahmed10 June 23, 2023 20:14

codecov bot commented Jun 23, 2023

Codecov Report

Merging #1875 (436ea17) into develop (edc4bf5) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 436ea17 differs from pull request most recent head 5924fd9. Consider uploading reports for the commit 5924fd9 to get more accurate results

@@             Coverage Diff             @@
##           develop    #1875      +/-   ##
===========================================
- Coverage    91.39%   91.37%   -0.02%     
===========================================
  Files          419      419              
  Lines        15542    15544       +2     
===========================================
  Hits         14204    14204              
- Misses        1338     1340       +2     
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/include/migraphx/replace_allocate.hpp | 100.00% <ø> | (ø) |
| src/pass_manager.cpp | 90.41% <100.00%> | (-2.65%) ⬇️ |
| src/promote_literals.cpp | 100.00% <100.00%> | (ø) |
| src/replace_allocate.cpp | 100.00% <100.00%> | (ø) |

@migraphx-bot (Collaborator) commented Jun 29, 2023

| Test | Batch | Rate new (5924fd) | Rate old (3f5668) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 895.42 | 895.19 | 0.03% | |
| torchvision-resnet50_fp16 | 64 | 5,306.36 | 5,309.13 | -0.05% | |
| torchvision-densenet121 | 32 | 1,126.29 | 1,126.97 | -0.06% | |
| torchvision-densenet121_fp16 | 32 | 3,281.90 | 3,278.63 | 0.10% | |
| torchvision-inceptionv3 | 32 | 594.18 | 592.60 | 0.27% | |
| torchvision-inceptionv3_fp16 | 32 | 2,499.08 | 2,530.08 | -1.23% | |
| cadene-inceptionv4 | 16 | 329.61 | 329.20 | 0.12% | |
| cadene-resnext64x4 | 16 | 394.41 | 394.69 | -0.07% | |
| slim-mobilenet | 64 | 7,133.66 | 7,124.70 | 0.13% | |
| slim-nasnetalarge | 64 | 160.05 | 159.98 | 0.04% | |
| slim-resnet50v2 | 64 | 1,089.71 | 1,089.92 | -0.02% | |
| bert-mrpc-onnx | 8 | 719.50 | 719.70 | -0.03% | |
| bert-mrpc-tf | 1 | 370.77 | 370.64 | 0.03% | |
| pytorch-examples-wlang-gru | 1 | 307.76 | 307.69 | 0.02% | |
| pytorch-examples-wlang-lstm | 1 | 313.57 | 314.32 | -0.24% | |
| torchvision-resnet50_1 | 1 | 91.90 | 91.96 | -0.07% | |
| torchvision-inceptionv3_1 | 1 | 127.95 | 129.32 | -1.06% | |
| cadene-dpn92_1 | 1 | 335.65 | 335.78 | -0.04% | |
| cadene-resnext101_1 | 1 | 237.13 | 236.95 | 0.08% | |
| slim-vgg16_1 | 1 | 53.35 | 53.36 | -0.01% | |
| slim-mobilenet_1 | 1 | 1,516.80 | 1,493.21 | 1.58% | |
| slim-inceptionv4_1 | 1 | 100.81 | 99.81 | 1.00% | |
| onnx-taau-downsample | 1 | 316.87 | 316.10 | 0.24% | |
| dlrm-criteoterabyte | 1 | 21.43 | 21.41 | 0.05% | |
| dlrm-criteoterabyte_fp16 | 1 | 39.95 | 39.98 | -0.07% | |
| agentmodel | 1 | 5,581.60 | 5,857.27 | -4.71% | 🔴 |
| unet_fp16 | 2 | 52.83 | 52.77 | 0.12% | |

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator)

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance
🔴 cadene-dpn92_1: FAILED: MIGraphX is not within tolerance - check verbose output
✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance
✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
✅ agentmodel: PASSED: MIGraphX meets tolerance
✅ unet: PASSED: MIGraphX meets tolerance

@CharlieL7 (Collaborator) left a comment

I understand the change to use get_root_module() and to include a root module pointer. However, you mentioned that there might be multiple root modules with multi-target compilation. I don't understand what benefit that gives us compared to having one root module and sub-modules that go to specific targets.

umangyadav (Member, Author) commented Jun 29, 2023

> one root module and sub-modules that go to specific targets.

One root module is the "main" module. This main root module would initially run on the "Ref" target and would call the different target-specific root modules at the top level.
But those target root modules can be nested further, so it is possible that a GPU module would call into a CPU module. In that case the GPU root module would have the CPU root module nested inside it, and the execution flow would be GPU-->CPU-->GPU.
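To make the nesting concrete, here is a small hypothetical sketch (not MIGraphX code; the `target_module` type and `run` function are made up for illustration) of that module hierarchy and the resulting execution order:

```cpp
// Hypothetical illustration of the nesting described above: "main" runs on the
// ref target and calls a GPU root module, which contains a nested CPU root module.
#include <iostream>
#include <string>
#include <vector>

struct target_module
{
    std::string name;
    std::string target;
    std::vector<target_module> nested; // root modules called from this one
};

void run(const target_module& m, int depth = 0)
{
    std::cout << std::string(depth * 2, ' ') << "run " << m.name << " on " << m.target << "\n";
    for(const auto& sub : m.nested)
        run(sub, depth + 1);
    if(not m.nested.empty())
        std::cout << std::string(depth * 2, ' ') << "resume " << m.name << " on " << m.target << "\n";
}

int main()
{
    target_module cpu_root{"cpu_root", "cpu", {}};
    target_module gpu_root{"gpu_root", "gpu", {cpu_root}};
    target_module main_mod{"main", "ref", {gpu_root}};
    run(main_mod); // prints main -> gpu -> cpu -> back to gpu, i.e. GPU-->CPU-->GPU
}
```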

@CharlieL7 (Collaborator) left a comment

Approved.
Question for my understanding:
If we have offload_copy on and a program that goes GPU->CPU->GPU with the changing root modules, does that mean each module would have its own offload copies? I could see MIGX optimizing away the second GPU section's offloads if we can reuse those from the first GPU section.
On the other hand, if offload_copy is off, would we have output parameters for both GPU sections?

umangyadav (Member, Author) commented Jun 30, 2023

> each module would have its own offload copies right?

Yes, each root module would have its own parameters and offload copies.

> I could see MIGX optimizing the second GPU offloads if we can reuse those from the first GPU section.

Currently, I don't think MIGX optimizes this, but we can try that optimization later. Right now the idea is that each target root module is compiled as a separate, self-contained program, so each target root module would have its own parameters and return values.

> On the other hand, if offload_copy is off we would have output parameters for both the GPU sections?

Yes, if offload_copy is off then it would have output parameters, but multi-target programs can't work without offload copies for now. We don't have a mechanism to put parameters onto the GPU and copy the final result back to the host for multi-target programs. Also, for multi-target programs where execution switches between GPU/CPU/FPGA, it may not make sense to disable offload_copy; the compiler would have to inject the necessary copies to switch between targets anyway.
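As a rough host-side illustration of the points above (hypothetical helper names, not the MIGraphX API): with offload_copy enabled, each GPU root module performs its own copies to and from the device, which is also the duplication that could potentially be optimized away later.

```cpp
// Hedged sketch of "each root module has its own offload copies" for a
// GPU -> CPU -> GPU program. All names below are stand-ins for illustration.
#include <iostream>
#include <string>
#include <vector>

using buffer = std::vector<float>;

// Stand-ins for the copies a compiler would inject around a GPU section.
buffer copy_to_device(const buffer& host) { std::cout << "  copy_to_device\n"; return host; }
buffer copy_from_device(const buffer& dev) { std::cout << "  copy_from_device\n"; return dev; }

buffer run_gpu_root(const buffer& host_in)
{
    std::cout << "gpu root module\n";
    buffer dev = copy_to_device(host_in); // offload copy owned by this root module
    for(auto& x : dev)
        x *= 2.0f;                        // stand-in for the GPU computation
    return copy_from_device(dev);         // result handed back to the host
}

buffer run_cpu_root(const buffer& host_in)
{
    std::cout << "cpu root module (no device copies needed)\n";
    buffer out = host_in;
    for(auto& x : out)
        x += 1.0f;                        // stand-in for the CPU computation
    return out;
}

int main()
{
    buffer input{1.0f, 2.0f, 3.0f};
    // GPU -> CPU -> GPU: each GPU section performs its own offload copies.
    buffer out = run_gpu_root(run_cpu_root(run_gpu_root(input)));
    std::cout << "result[0] = " << out[0] << "\n";
}
```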

@causten causten merged commit e747114 into develop Jul 5, 2023
11 checks passed
@causten causten deleted the mpm_root_module branch July 5, 2023 19:27