
Improvement to ck integration #1859

Merged
merged 29 commits into from
Jul 2, 2023

Conversation

Collaborator

@pfultz2 pfultz2 commented Jun 20, 2023

  • Add a CI job to test CK
  • Add a MIGRAPHX_TUNE_CK env variable to only do tuning for CK
  • Continue tuning even when there are invalid configs
  • Fix a bug where parallel compilation did not use all available threads
  • Add additional tests for GEMMs using half types
  • Removed int32 as a supported type since it doesn't pass our test suite

@codecov

codecov bot commented Jun 20, 2023

Codecov Report

Merging #1859 (60e1f8e) into develop (1f827a7) will not change coverage.
The diff coverage is n/a.

❗ Current head 60e1f8e differs from the pull request's most recent head 6696d44. Consider uploading reports for commit 6696d44 to get more accurate results.

@@           Coverage Diff            @@
##           develop    #1859   +/-   ##
========================================
  Coverage    91.39%   91.39%           
========================================
  Files          419      419           
  Lines        15542    15542           
========================================
  Hits         14204    14204           
  Misses        1338     1338           

src/targets/gpu/compile_ops.cpp (resolved)
@@ -185,7 +203,10 @@ void par_compile(std::size_t n, F f)
 {
     if(n == 0)
         return;
-    par_for(n, n / value_of(MIGRAPHX_GPU_COMPILE_PARALLEL{}, n), f);
+    auto d = value_of(MIGRAPHX_GPU_COMPILE_PARALLEL{});
Member
Suggested change
-auto d = value_of(MIGRAPHX_GPU_COMPILE_PARALLEL{});
+auto d = value_of(MIGRAPHX_GPU_COMPILE_PARALLEL{}, n);

Collaborator Author

That won't work, because n will be cached when MIGRAPHX_GPU_COMPILE_PARALLEL is not set.

src/targets/gpu/jit/ck_gemm.cpp (resolved)
@migraphx-bot
Collaborator

migraphx-bot commented Jun 29, 2023

Test | Batch | Rate new (6696d4) | Rate old (d6aade) | Diff
torchvision-resnet50 64 894.85 895.65 -0.09%
torchvision-resnet50_fp16 64 5,313.14 5,315.31 -0.04%
torchvision-densenet121 32 1,123.99 1,126.78 -0.25%
torchvision-densenet121_fp16 32 3,275.44 3,282.20 -0.21%
torchvision-inceptionv3 32 592.96 593.29 -0.06%
torchvision-inceptionv3_fp16 32 2,494.23 2,517.31 -0.92%
cadene-inceptionv4 16 328.70 329.04 -0.10%
cadene-resnext64x4 16 396.92 397.15 -0.06%
slim-mobilenet 64 7,132.06 7,136.84 -0.07%
slim-nasnetalarge 64 159.84 160.01 -0.11%
slim-resnet50v2 64 1,089.43 1,090.55 -0.10%
bert-mrpc-onnx 8 718.90 719.50 -0.08%
bert-mrpc-tf 1 369.81 370.16 -0.10%
pytorch-examples-wlang-gru 1 294.66 304.54 -3.25% 🔴
pytorch-examples-wlang-lstm 1 298.45 308.91 -3.39% 🔴
torchvision-resnet50_1 1 91.44 91.91 -0.51%
torchvision-inceptionv3_1 1 128.98 128.80 0.14%
cadene-dpn92_1 1 333.88 336.62 -0.82%
cadene-resnext101_1 1 236.49 237.34 -0.36%
slim-vgg16_1 1 53.34 53.36 -0.04%
slim-mobilenet_1 1 1,485.31 1,523.34 -2.50%
slim-inceptionv4_1 1 101.96 101.62 0.34%
onnx-taau-downsample 1 316.36 316.09 0.08%
dlrm-criteoterabyte 1 21.42 21.42 0.01%
dlrm-criteoterabyte_fp16 1 39.95 39.96 -0.03%
agentmodel 1 5,413.13 5,733.29 -5.58% 🔴
unet_fp16 2 52.63 52.83 -0.37%

This build is not recommended to merge 🔴

@migraphx-bot
Collaborator


✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ torchvision-inceptionv3_1: PASSED: MIGraphX meets tolerance

🔴 cadene-dpn92_1: FAILED: MIGraphX is not within tolerance - check verbose output

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance

✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance

✅ slim-inceptionv4_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

@pfultz2
Collaborator Author

pfultz2 commented Jun 29, 2023

@turneram Any feedback?

Contributor

@turneram turneram left a comment

If we want to leave int32 enabled, then adding the requirement that m, n, and k are all divisible by 4 appears to maintain correctness. We could also investigate further to get more precise criteria and then add it back in another PR.

@causten causten merged commit 3c9df3b into develop Jul 2, 2023
11 checks passed
@causten causten deleted the ci-ck branch July 2, 2023 18:57