
Added Benchmarking CI with --hard-fail flag #7915

Closed
wants to merge 22 commits from the export-ci branch

Conversation

JWLee89
Contributor

@JWLee89 JWLee89 commented May 21, 2022

This PR is motivated by a quote in the following issue (#7870).

We need to add more detailed export and benchmarking CI (python utils/benchmarks.py) as well in a separate action since this takes longer.

For now, this PR does not incorporate separate actions and benchmarking to keep the PR compact, since adding all of these tests in a single PR will result in major changes, making it difficult to review.

Question regarding a separate CI workflow for exports and benchmarking:

Should the CI workflows run on all three OSes (Windows, Linux, macOS)? I ask because, as far as I know, TensorRT does not play well with macOS. Tests may be easier if we target a platform such as Ubuntu 18.04 or 20.04 and run the CI workflow inside a Docker container.

Changes

  • Updated ci-testing.yml.
    • Replaced the following: python export.py --weights ${{ matrix.model }}.pt --img 64 --include torchscript onnx # export with pytest.
  • Added test_export.py
    • Test to see if the given export format results in actual file serialization. Will automatically fail if the output file / folder does not exist.
    • For folders, it will check inside folder to ensure that generated files / subfolders are not empty
    • Cleanup file system after running tests (as exported models are serialized to local file system)
  • Added pytest to requirements.txt
  • Addedconftest.py to accept pytest command line argument: --weights (the path to PyTorch .pt file)

Future Works

  • Inference benchmarking (assert that all model formats output similar mAP on VOC validation set)
  • Running github workflow on docker
    • This will enable testing using TensorRT, provided that we have access to self-hosted GPU server in CI pipeline
  • Add separate CI workflow for exports and benchmarking.

Discovered issues

  • tensorflowjs and OpenVINO do not play well together simultaneously due to OpenVINO supporting numpy ver < 1.20. tensorflowjs on the other hand, does not work with numpy ver < 1.20.
  • The lazy pip installation feature via check_requirements(('tensorflowjs', )) sometimes fails when installing tensorflowjs. The root cause is currently unknown, but I believe that we can reproduce it by following the github workflow inside of a docker container.

πŸ› οΈ PR Summary

Made with ❀️ by Ultralytics Actions

🌟 Summary

Enhanced benchmark integrity and added benchmark validation for model exports in YOLOv5.

πŸ“Š Key Changes

  • Added a --hard-fail option to the benchmarking script to enforce stricter validation.
  • Benchmarks now include a reference to mAP (mean Average Precision) values in the coco128.yaml file for different models and image sizes.
  • New functionalities in benchmarks.py check that exported models meet predetermined performance thresholds.
  • Enhanced error handling to provide clearer feedback during the benchmarking process.

🎯 Purpose & Impact

  • 🎯 Ensures that model performance remains above defined thresholds, increasing confidence in model reliability.
  • 🎯 Automates validation checks for model exports, reducing the potential for human error and improving workflow efficiency.
  • 🎯 Provides a reference for expected model performance, which developers and users can use as a benchmark for their applications.
  • 🎯 Enhances the robustness of benchmark testing, helping to catch regressions or issues following changes to the codebase.

The changes primarily impact developers conducting benchmarks for YOLOv5 models, but they also benefit users by ensuring high-quality model performance standards.

@JWLee89 JWLee89 force-pushed the export-ci branch 3 times, most recently from fc9f15a to ff99d04 Compare May 22, 2022 07:53
@glenn-jocher
Member

@JWLee89 thanks for the PR. We already have most export verification functionality included in utils/benchmarks.py, i.e. see #6613. This file runs all possible exports, checks export success, and reports mAP and speed for each export that succeeded.

We want to use this existing code as the basis for any CI updates, by just making a few small changes to create hard rather than soft failures, i.e. python utils/benchmarks.py --hard-fail, and to also assert that output mAPs are above threshold.

Colab Pro+ High-RAM CPU Results

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=cpu, half=False, test=False
Checking setup...
YOLOv5 πŸš€ v6.1-135-g7926afc torch 1.10.0+cu111 CPU
Setup complete βœ… (8 CPUs, 51.0 GB RAM, 41.5/166.8 GB disk)

Benchmarks complete (241.20s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623               127.61
1             TorchScript        0.4623               131.23
2                    ONNX        0.4623                69.34
3                OpenVINO        0.4623                66.52
4                TensorRT           NaN                  NaN
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623               123.79
7     TensorFlow GraphDef        0.4623               121.57
8         TensorFlow Lite        0.4623               316.61
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

@glenn-jocher
Member

glenn-jocher commented May 22, 2022

About deleting exported models: I'm not clear on why you want to delete them, but if you do, you can simply point utils/benchmarks.py --weights to a new directory. The PyTorch model automatically downloads to this directory and all exports are located alongside it. After CI you can simply delete the directory, i.e.:

python utils/benchmarks.py --weights weights/yolov5n.pt
rm -rf weights/

@JWLee89
Contributor Author

JWLee89 commented May 23, 2022

@glenn-jocher

@JWLee89 thanks for the PR. We already have most export verification functionality included in utils/benchmarks.py, i.e. see #6613. This file runs all possible exports, checks export success, and reports mAP and speed for each export that succeeded.

We want to use this existing code as the basis for any CI updates, by just making a few small changes to create hard rather than soft failures, i.e. python utils/benchmarks.py --hard-fail, and to also assert that output mAPs are above threshold.

Thank you for the clarification. I will update this PR to match the specification by modifying utils/benchmarks.py and adding it to the CI workflow.

I just ran the test on my local device and it does take some time to run, so I am planning on creating a separate workflow, ci-benchmarks.yml, so that it runs separately from the ci-testing.yml workflow. Would this be okay?

I will ping you when the updated code is ready for review.

About deleting exported models: I'm not clear on why you want to delete them, but if you do, you can simply point utils/benchmarks.py --weights to a new directory. The PyTorch model automatically downloads to this directory and all exports are located alongside it. After CI you can simply delete the directory, i.e.:

I wanted to delete exported models so that, in case we run the CI workflow on a self-hosted server, we don't leave dangling files on the server's local filesystem.

Some questions

  1. Regarding the threshold mAP values: since they are data-related, instead of hardcoding them I thought adding a benchmarks property under data/<dataset-name>.yaml (e.g. data/coco128.yaml) would be appropriate (please see the yaml file below and the lookup sketch after this list). Would this be okay?
# Download script/URL (optional)
download: https://ultralytics.com/assets/coco128.zip

# Benchmark values
benchmarks:
  # metrics E.g. mAP
  mAP:
    # Img size 640
    yolov5n: 0.45
    yolov5s: 0.45
    yolov5m: 0.45
    yolov5l: 0.45
    yolov5x: 0.45

    # Img size 1280
    yolov5n6: 0.5
    yolov5s6: 0.5
    yolov5m6: 0.6
    yolov5l6: 0.6
    yolov5x6: 0.6

For now, if the benchmarks property does not exist in the yaml file, we will skip the assertions.

  2. For now, I added arbitrary threshold values to the yaml file, but I was wondering if there are official threshold values (for each model type) that you have in mind. If so, please let me know and I will add them in this PR.

  3. Could the benchmark tests be extended to other datasets, e.g. VOC? If so, please let me know if you have any benchmark threshold values in mind.
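
For illustration, here is a minimal sketch of how benchmarks.py could look up the threshold under this YAML layout; get_map_threshold is a placeholder name and not the PR's actual code.

# Sketch: read an mAP threshold from the proposed `benchmarks` block in data/coco128.yaml.
# get_map_threshold is an illustrative name, not the PR's actual function.
from pathlib import Path

import yaml

def get_map_threshold(data_yaml, model_name):
    data = yaml.safe_load(Path(data_yaml).read_text())
    # Return None if no benchmarks block exists, so callers can skip the assertion
    return data.get("benchmarks", {}).get("mAP", {}).get(model_name)

threshold = get_map_threshold("data/coco128.yaml", "yolov5n")
if threshold is not None:
    print(f"yolov5n must reach mAP@0.5:0.95 >= {threshold}")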

@JWLee89
Contributor Author

JWLee89 commented May 23, 2022

@glenn-jocher I have added the assertions to check whether mAP lies above a certain threshold. The test runs in a CI workflow called ci-benchmarking.yml. To run it locally, type the following:

python utils/benchmarks.py --weights ${{ matrix.model }}.pt --hard-fail

Summary on implementation details

  • Threshold values stored within data/coco128.yaml since threshold values / selected evaluation metrics may vary based on datasets.

  • Workflow CI Benchmarking runs benchmarking on yolov5n and yolov5s

    • Runs on ubuntu-latest
    • Threshold value provided right now is an arbitrary value of 0.45 for yolov5n and yolov5s.
  • Added --hard-fail flag. If specified, it will raise error if any error occurs and if mAP is below certain threshold. Implementation-wise, the for loop is not encased within a try-except block.

    • Note that CI will fail if the given models fail to export. Some models that require specific OS or environments (e.g. edgetpu) are omitted from the check.
  • Logging pandas DataFrame fails on CI, so workaround pretty-printing added.

  • Omitted the following formats:

    • coreml: Inference only supported on MacOS
    • edgetpu, tfjs: Previously omitted. Edgetpu requires special device such as coral board or tpu to run inference
    • TensorRT: we need to run it inside of docker on a dedicated host with cuda. Otherwise, we get the following error
TensorRT: export failure: export running on CPU but must be on GPU, i.e. `python export.py --device 0`
Traceback (most recent call last):
  File "/home/runner/work/yolov5/yolov5/utils/benchmarks.py", line 222, in <module>
    main(opt)
  File "/home/runner/work/yolov5/yolov5/utils/benchmarks.py", line 217, in main
    test(**vars(opt)) if opt.test else run(**vars(opt))
  File "/home/runner/work/yolov5/yolov5/utils/benchmarks.py", line 136, in run
    benchmarks = get_benchmark_values(name, f, suffix, gpu, weights, data, imgsz, half, batch_size, device)
  File "/home/runner/work/yolov5/yolov5/utils/benchmarks.py", line 104, in get_benchmark_values
    w = export.run(weights=weights, imgsz=[imgsz], include=[f], device=device, half=half)[-1]  # all others
IndexError: list index out of range
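
For clarity, below is a minimal sketch of the --hard-fail control flow described above. benchmark_fn stands in for the real export-and-validate call, and ThresholdError mirrors the error shown in the CI output; none of these names are the PR's exact code.

# Sketch of --hard-fail behaviour: soft failures record NaN, hard failures raise immediately.
import math
import pandas as pd

class ThresholdError(Exception):
    pass

def run_benchmarks(formats, benchmark_fn, hard_fail=False, map_threshold=None):
    rows = []
    for fmt in formats:
        if hard_fail:
            # No try/except: any export/inference error propagates and fails CI
            mAP, speed_ms = benchmark_fn(fmt)
            if map_threshold is not None and mAP < map_threshold:
                raise ThresholdError(f"mAP value: {mAP} is below threshold value: {map_threshold}")
        else:
            try:  # soft failure: record NaN for this format and continue
                mAP, speed_ms = benchmark_fn(fmt)
            except Exception:
                mAP, speed_ms = math.nan, math.nan
        rows.append([fmt, mAP, speed_ms])
    return pd.DataFrame(rows, columns=["Format", "mAP@0.5:0.95", "Inference time (ms)"])

# Dummy benchmark function for illustration only
fake = {"PyTorch": (0.46, 128.0), "ONNX": (0.46, 69.0)}
print(run_benchmarks(fake, lambda f: fake[f], hard_fail=True, map_threshold=0.45))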

The mAP threshold check can be seen below. A threshold value of 0.8 was applied to show that the appropriate error is raised when the model does not meet the threshold requirement.

[Screenshot: ThresholdError raised during the mAP threshold check, 2022-05-23]

@JWLee89 JWLee89 force-pushed the export-ci branch 3 times, most recently from 53c6f08 to ca3d4aa Compare May 23, 2022 09:07
@JWLee89 JWLee89 changed the title Added automated basic model export to CI Added Benchmarking CI with --hard-fail flag May 23, 2022
@JWLee89 JWLee89 force-pushed the export-ci branch 2 times, most recently from d1145dd to ea77cc2 Compare May 23, 2022 12:55
@JWLee89 JWLee89 force-pushed the export-ci branch 2 times, most recently from 1c854b9 to 3fc7703 Compare May 24, 2022 05:08
@JWLee89
Contributor Author

JWLee89 commented May 25, 2022

@glenn-jocher When you have some time to spare, could you please take a look at the PR?

I noticed that in #6613, the benchmarks are performed using yolov5s. Right now, the automated export-failure and mAP threshold checks run for both yolov5s and yolov5n. If we want to run the check for all available YOLO models, we could have two separate CI workflows: one for the *6 series (input size 1280) and one for the base series (input size 640).

For additional information, please feel free to ask. Details are also recorded in my previous comments.

Thank you for taking time out of your schedule to take a look at the PR.

@glenn-jocher
Member

@JWLee89 yes I've got this as a TODO, will get to it soon!

@JWLee89
Contributor Author

JWLee89 commented May 27, 2022

@JWLee89 yes I've got this as a TODO, will get to it soon!

@glenn-jocher Thank you for letting me know!

@glenn-jocher
Member

@JWLee89 I've added benchmarking by default to all CI runs now in #7996:

- name: Run benchmarks
  run: |
    python utils/benchmarks.py --weights ${{ matrix.model }}.pt --img 320

This runs YOLOv5n at 320 and should produce these results:
https://github.com/ultralytics/yolov5/runs/6623481183?check_suite_focus=true
[Screenshot: YOLOv5n benchmark results at --img 320, 2022-05-27]

@JWLee89
Contributor Author

JWLee89 commented May 27, 2022

@glenn-jocher Thank you for the follow up!

I noticed that you removed the --hard-fail flag. Is this intentional? Just wanted to double-check. If the --hard-fail feature is not what you had in mind, please let me know and I will remove / update the code accordingly.

Also, the CI benchmark test is currently failing because the mAP@0.5:0.95 for yolov5n is lower than the currently specified threshold value of 0.32.

File "/home/runner/work/yolov5/yolov5/utils/benchmarks.py", line 166, in run
    raise ThresholdError(f'mAP value: {mAP} is below threshold value: {map_benchmark_threshold}')
__main__.ThresholdError: mAP value: 0.2927 is below threshold value: 0.32

Is there any specific threshold value that you have in mind for yolov5n?

If the mAP threshold check feature is not needed, I can also remove / comment out the code from this PR. Let me know.

Thanks!

@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 πŸš€ and Vision AI ⭐.

@github-actions github-actions bot added the Stale label Jun 30, 2022
@github-actions github-actions bot removed the Stale label Jul 29, 2022
@glenn-jocher glenn-jocher removed the TODO label Jul 31, 2022
@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 πŸš€ and Vision AI ⭐.

@github-actions github-actions bot added the Stale label Mar 22, 2023
@github-actions github-actions bot removed the Stale label Apr 10, 2023
@github-actions
Contributor

github-actions bot commented Oct 3, 2023

πŸ‘‹ Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap.

We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved.


Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐

@github-actions github-actions bot added the Stale label Oct 3, 2023
@github-actions github-actions bot closed this Nov 3, 2023
@glenn-jocher
Member

@JWLee89 The --hard-fail flag was intentionally removed for simpler and consistent CI results, favoring the existing behavior to maintain consistent user experience.

Regarding the mAP threshold issue for yolov5n, it seems the specified lower threshold value of 0.32 might be too high. I currently do not have a specific threshold value in mind for yolov5n. If the mAP threshold check feature is not essential, feel free to remove or comment out the code from the PR.

Your efforts to improve the benchmarking process are valuable, and I appreciate your attention to detail.

@JWLee89
Contributor Author

JWLee89 commented Nov 15, 2023

@JWLee89 The --hard-fail flag was intentionally removed for simpler and consistent CI results, favoring the existing behavior to maintain consistent user experience.

Regarding the mAP threshold issue for yolov5n, it seems the specified lower threshold value of 0.32 might be too high. I currently do not have a specific threshold value in mind for yolov5n. If the mAP threshold check feature is not essential, feel free to remove or comment out the code from the PR.

Your efforts to improve the benchmarking process are valuable, and I appreciate your attention to detail.

@glenn-jocher Thank you for the update; not a problem. If there are other high-priority or useful items that need to be worked on, I am more than happy to take some of them on.

Since I have not been actively working on YOLO for a while, I might need a day or two to read through the project and source code again to get re-acquainted.

@glenn-jocher
Member

@JWLee89 Your willingness to contribute is greatly appreciated! Re-familiarizing yourself with the project sounds like a solid plan. Feel free to reach out if you have any questions or need assistance with anything. Your valuable insights and contributions are always welcome. Thank you for your support!
