Add Onnxruntime parity checks to CI #1938

Closed
wants to merge 14 commits into from

TedThemistokleous (Collaborator)

Add onnxruntime parity checks to our CI, run after we build onnxruntime. The goal of this changeset is to ensure the tests run without errors; it is not concerned with performance.

Run the parity tests in parallel with the existing unit tests, and organize the pipeline so the unit, install, and parity checks stay separate. ORT is now built with --build_wheel, and the resulting wheel is used by the final two stages (unit tests and parity).
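The wheel handoff described above could look roughly like this. It is a minimal sketch, assuming the `--build_wheel` output ends up in a `dist/` directory somewhere under the onnxruntime build tree; `find_ort_wheel`, `BUILD_DIR`, and `INSTALL_WHEEL` are hypothetical names, not taken from this PR's scripts.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the first onnxruntime wheel found under a build
# tree. Assumes --build_wheel leaves the wheel in some */dist/ directory.
find_ort_wheel() {
    local build_dir="$1"
    find "$build_dir" -path '*/dist/*' -name 'onnxruntime*.whl' -print | sort | head -n 1
}

# BUILD_DIR is an assumed location; override it to match the real build layout.
BUILD_DIR="${BUILD_DIR:-/workspace/onnxruntime/build}"

# Opt-in install so the helper can be used without touching the Python env.
if [ "${INSTALL_WHEEL:-0}" = "1" ]; then
    wheel="$(find_ort_wheel "$BUILD_DIR")"
    if [ -z "$wheel" ]; then
        echo "no onnxruntime wheel found under $BUILD_DIR" >&2
        exit 1
    fi
    # Install the freshly built wheel so the unit-test and parity stages use it.
    pip3 install --force-reinstall "$wheel"
fi
```

Searching by pattern rather than hard-coding a path keeps the later stages independent of the build configuration directory name.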

Related to the issues seen in DLM, where parity tests fail when we move to a new ROCm version. This change is meant to catch those failures earlier.

Related issues

#1935 & #1877

Run the three parity checks that are used for onnxruntime as part of CI

can be expanded later if needed
…y tests

Do the following

- Split build_and_test.sh into build_and_install.sh and test_onnxrt_unit_test.py
- Add parity tests to test_onnxrt_parity_tests.sh
- Parallelize unit and parity tests for the MIGraphX-Onnxruntime integration in Jenkins
- Update the Dockerfile for the additional run scripts.

The idea here is that the unit tests take a while. In the meantime, if we build an onnxruntime wheel and run the appropriate parity tests, we should be able to catch parity changes that the unit tests miss but that would otherwise show up in later testing.
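The PR parallelizes these stages in the Jenkinsfile; the shell sketch below only illustrates the same idea of running the unit and parity stages concurrently and failing if either one fails. `run_unit_tests`, `run_parity_tests`, `UNIT_RC`, and `PARITY_RC` are hypothetical stand-ins, not the PR's actual scripts.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical stand-ins for the PR's unit-test and parity scripts; the exit
# codes are driven by UNIT_RC / PARITY_RC only so the sketch can be exercised.
run_unit_tests()   { echo "running unit tests";   return "${UNIT_RC:-0}"; }
run_parity_tests() { echo "running parity tests"; return "${PARITY_RC:-0}"; }

# Start both stages in the background, wait for each, and fail if either failed.
run_stages_in_parallel() {
    local log_dir
    log_dir="$(mktemp -d)"

    run_unit_tests > "$log_dir/unit.log" 2>&1 &
    local unit_pid=$!
    run_parity_tests > "$log_dir/parity.log" 2>&1 &
    local parity_pid=$!

    local status=0
    wait "$unit_pid"   || { echo "unit tests failed (see $log_dir/unit.log)" >&2;     status=1; }
    wait "$parity_pid" || { echo "parity tests failed (see $log_dir/parity.log)" >&2; status=1; }
    return "$status"
}
```

Waiting on each PID separately means both stages always run to completion and both failures get reported, rather than stopping at the first one.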
TedThemistokleous added the enhancement, dependencies, and Continous Integration labels on Jul 10, 2023
TedThemistokleous self-assigned this on Jul 10, 2023
TedThemistokleous linked an issue on Jul 10, 2023 that may be closed by this pull request
TedThemistokleous changed the title from "Add parity check CI" to "Add Onnxruntime parity checks to CI" on Jul 10, 2023
TedThemistokleous added the onnxruntime label on Jul 10, 2023
migraphx-bot (Collaborator) commented Jul 11, 2023

| Test | Batch | Rate new (c9b6a2) | Rate old (aeb9f7) | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 2,277.79 | 2,277.82 | -0.00% | |
| torchvision-resnet50_fp16 | 64 | 5,347.33 | 5,327.21 | 0.38% | |
| torchvision-densenet121 | 32 | 1,816.64 | 1,830.34 | -0.75% | |
| torchvision-densenet121_fp16 | 32 | 3,385.14 | 3,384.17 | 0.03% | |
| torchvision-inceptionv3 | 32 | 1,346.23 | 1,344.58 | 0.12% | |
| torchvision-inceptionv3_fp16 | 32 | 2,534.23 | 2,517.07 | 0.68% | |
| cadene-inceptionv4 | 16 | 680.70 | 677.01 | 0.55% | |
| cadene-resnext64x4 | 16 | 589.39 | 589.61 | -0.04% | |
| slim-mobilenet | 64 | 7,207.38 | 7,214.94 | -0.10% | |
| slim-nasnetalarge | 64 | 236.69 | 236.56 | 0.06% | |
| slim-resnet50v2 | 64 | 2,522.59 | 2,522.06 | 0.02% | |
| bert-mrpc-onnx | 8 | 718.07 | 718.88 | -0.11% | |
| bert-mrpc-tf | 1 | 363.63 | 364.10 | -0.13% | |
| pytorch-examples-wlang-gru | 1 | 309.88 | 311.71 | -0.59% | |
| pytorch-examples-wlang-lstm | 1 | 318.80 | 317.97 | 0.26% | |
| torchvision-resnet50_1 | 1 | 554.56 | 562.46 | -1.40% | |
| torchvision-inceptionv3_1 | 1 | 305.41 | 306.69 | -0.42% | |
| cadene-dpn92_1 | 1 | 357.16 | 360.64 | -0.97% | |
| cadene-resnext101_1 | 1 | 220.01 | 219.70 | 0.14% | |
| slim-vgg16_1 | 1 | 223.95 | 223.28 | 0.30% | |
| slim-mobilenet_1 | 1 | 1,503.00 | 1,438.56 | 4.48% | 🔆 |
| slim-inceptionv4_1 | 1 | 221.47 | 225.27 | -1.69% | |
| onnx-taau-downsample | 1 | 321.43 | 320.98 | 0.14% | |
| dlrm-criteoterabyte | 1 | 21.70 | 21.65 | 0.24% | |
| dlrm-criteoterabyte_fp16 | 1 | 40.57 | 40.59 | -0.04% | |
| agentmodel | 1 | 5,914.95 | 5,973.36 | -0.98% | |
| unet_fp16 | 2 | 54.95 | 55.01 | -0.11% | |

Check results before merge 🔆

migraphx-bot (Collaborator) commented Jul 11, 2023


- ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
- ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance
- ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
- ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
- ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
- 🔴 torchvision-inceptionv3_1: FAILED: MIGraphX is not within tolerance - check verbose output
- 🔴 cadene-dpn92_1: FAILED: MIGraphX is not within tolerance - check verbose output
- ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance
- ✅ slim-vgg16_1: PASSED: MIGraphX meets tolerance
- ✅ slim-mobilenet_1: PASSED: MIGraphX meets tolerance
- 🔴 slim-inceptionv4_1: FAILED: MIGraphX is not within tolerance - check verbose output
- ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
- ✅ agentmodel: PASSED: MIGraphX meets tolerance
- ✅ unet: PASSED: MIGraphX meets tolerance

tools/build_and_install_ort.sh: outdated review thread (resolved)
Jenkinsfile: outdated review thread (resolved)
Using groovy to debug issues in jenkins file

Fixing new line and names
codecov bot commented Jul 12, 2023

Codecov Report

Merging #1938 (7927fe8) into develop (dcc7b0a) will not change coverage.
The diff coverage is n/a.

❗ Current head 7927fe8 differs from pull request most recent head 0b2bcf2. Consider uploading reports for the commit 0b2bcf2 to get more accurate results

@@           Coverage Diff            @@
##           develop    #1938   +/-   ##
========================================
  Coverage    91.49%   91.49%           
========================================
  Files          430      430           
  Lines        16129    16129           
========================================
  Hits         14758    14758           
  Misses        1371     1371           

TedThemistokleous added the skip bot checks and DeepLearningModels labels on Aug 5, 2023
need to call the script the proper name
TedThemistokleous (Collaborator Author)

Odd, seeing:


```
[2023-08-06T19:08:42.223Z] Caused by: hudson.plugins.git.GitException: Command "git init /home/jenkins/workspace/_AMDMIGraphX_add_parity_check_ci" returned status code 128:
[2023-08-06T19:08:42.223Z] stdout: 
[2023-08-06T19:08:42.223Z] stderr: error: copy-fd: write returned: No space left on device
[2023-08-06T19:08:42.223Z] fatal: cannot copy '/usr/share/git-core/templates/hooks/pre-rebase.sample' to '/data/workspace/_AMDMIGraphX_add_parity_check_ci/.git/hooks/pre-rebase.sample': No space left on device
```
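A failure like this surfaces only after checkout has already started. A pre-flight check could fail the job earlier with a clearer message; the sketch below is a minimal illustration, assuming POSIX `df -Pk` output. `check_free_space` and the 50 GiB threshold in the usage comment are hypothetical, not values from this PR.

```shell
#!/usr/bin/env bash
set -u

# Hypothetical pre-flight check: fail if the filesystem holding the given
# directory has less than the required free space (in GiB).
check_free_space() {
    local dir="$1" required_gib="$2"
    local avail_kib
    # POSIX df output: field 4 of the second line is available 1K blocks.
    avail_kib="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    if [ "$avail_kib" -lt $(( required_gib * 1024 * 1024 )) ]; then
        echo "only $(( avail_kib / 1024 / 1024 )) GiB free on $dir, need ${required_gib} GiB" >&2
        return 1
    fi
}

# Example (threshold is an arbitrary assumption):
#   check_free_space /data/workspace 50 || exit 1
```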

@@ -127,3 +129,8 @@ ENV LD_LIBRARY_PATH=$PREFIX/lib
ENV UBSAN_OPTIONS=print_stacktrace=1
ENV ASAN_OPTIONS=detect_stack_use_after_return=1:check_initialization_order=1:strict_init_order=1
RUN ln -s /opt/rocm/llvm/bin/llvm-symbolizer /usr/bin/llvm-symbolizer

# Install dependencies used for parity checks
RUN pip3 install psutil==5.9.5 onnx==1.10.2 coloredlogs==15.0.1 packaging==23.1 transformers==4.29.2 sympy==1.12
Collaborator

We now support the latest onnx version. 1.10.2 is too old and lacks a bug fix related to model location changes. I would rather have a requirements file handle the test-specific items.
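One way to follow this suggestion is sketched below: the pins are copied from the Dockerfile line under review, except onnx, which is left unpinned so the latest supported release is used. The file name `requirements-parity.txt` and the `REQ_FILE`/`INSTALL_REQS` variables are hypothetical; a Dockerfile would then copy the file in and run `pip3 install -r` on it.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical file name; the review only asks for "a requirements file".
REQ_FILE="${REQ_FILE:-requirements-parity.txt}"

# Pins from the Dockerfile under review; onnx is deliberately left unpinned.
cat > "$REQ_FILE" <<'EOF'
psutil==5.9.5
onnx
coloredlogs==15.0.1
packaging==23.1
transformers==4.29.2
sympy==1.12
EOF

# Opt-in install so the file can be generated without network access.
if [ "${INSTALL_REQS:-0}" = "1" ]; then
    pip3 install -r "$REQ_FILE"
fi
```

Keeping the test-only dependencies in one file means bumping a version no longer requires editing the Dockerfile itself.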

cd /workspace/onnxruntime/onnxruntime/test/python/transformers/

#Install latest stable torch version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
Collaborator

Is 5.4.2 really the latest stable? Would this even be compatible with ROCm 5.7 or 6.0? How useful would this be if the version of migx being tested is 5.7 or 6.0?

Collaborator Author

I haven't touched this in a while; I can update it.
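Instead of hard-coding `rocm5.4.2`, the PyTorch index URL could be derived from the ROCm version under test. This is a minimal sketch under two assumptions: that the installed ROCm version can be read from `/opt/rocm/.info/version` (overridable via `ROCM_VERSION`), and that PyTorch publishes an index named after the ROCm major.minor version. The second does not always hold (the original URL used a patch-level name), so the resulting URL should be checked. `torch_index_url` and `INSTALL_TORCH` are hypothetical names.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build a PyTorch ROCm wheel index URL from a ROCm
# version string, keeping only major.minor (e.g. 5.7.1 -> rocm5.7).
torch_index_url() {
    local rocm_version="$1"
    local major_minor
    major_minor="$(echo "$rocm_version" | cut -d. -f1,2)"
    echo "https://download.pytorch.org/whl/rocm${major_minor}"
}

# Assumption: ROCm records its version in /opt/rocm/.info/version.
ROCM_VERSION="${ROCM_VERSION:-$(cat /opt/rocm/.info/version 2>/dev/null || echo unknown)}"

if [ "${INSTALL_TORCH:-0}" = "1" ]; then
    pip3 install torch torchvision torchaudio --index-url "$(torch_index_url "$ROCM_VERSION")"
fi
```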

@@ -1,7 +1,7 @@
#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2022 Advanced Micro Devices, Inc. All rights reserved.
# Copyright (c) 2015-2023 Advanced Micro Devices, Inc. All rights reserved.
Collaborator

The license checker will flag this. Change it to 2024.
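This fix could be applied mechanically across files. The sketch below is a minimal illustration, assuming GNU sed and the exact `Copyright (c) 2015-YYYY` header format in the diff above. `bump_copyright_year` is a hypothetical helper, not part of the repository's license checker.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: rewrite the end year of a "Copyright (c) 2015-YYYY"
# header line in place. Uses GNU sed (-i without a suffix, -E for ERE).
bump_copyright_year() {
    local file="$1" year="$2"
    sed -i -E "s/(Copyright \(c\) 2015-)[0-9]{4}/\1${year}/" "$file"
}

# Example:
#   bump_copyright_year tools/build_and_install_ort.sh 2024
```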

TedThemistokleous (Collaborator Author)

Closing this out. Superseded by more recent syncs to Onnxruntime main.

Labels
- Continous Integration: Pull request updates parts of continous integration pipeline
- DeepLearningModels: Artifacts related to DLM benchmarking/parity checks
- dependencies: Pull requests that update a dependency file
- enhancement: New feature or request
- onnxruntime: PR changes interaction between MIGraphX and Onnxruntime
- skip bot checks: Skips the Performance and Accuracy CI tests

Development

Successfully merging this pull request may close these issues.

Add Onnxruntime parity checks into MIGraphX CI
5 participants