Help Wanted: CI Automation Tools for Unit Tests #407

Closed
glenn-jocher opened this issue Jul 14, 2020 · 7 comments

@glenn-jocher
Member

glenn-jocher commented Jul 14, 2020

🚀 Feature

Automated testing on every commit and on PRs to make sure that updates are not introducing bugs!

Motivation

Due to high demand and very limited resources, we are not able to manually test every commit and PR before merging them into the master branch. This occasionally introduces bugs (!) which may be cloned by up to 1000 people per day currently. We need to reject problem commits and PRs before they make it into the master branch.

To solve this problem, and to better automate YOLOv5 development, we want to use Continuous Integration (CI) tools (e.g. GitHub Actions) to automatically run unit tests on every new commit, and on PRs before merging.

One major challenge is requisitioning a suitable hardware backend on demand for this purpose. Our unit tests below require checking train/test/detect operations across multi-GPU, single-GPU and CPU devices.

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -qr requirements.txt onnx
python3 -c "from utils.google_utils import *; gdrive_download('1n_oKgR81BJtqk75b00eAjdv03qVCQn2f', 'coco128.zip')" && mv ./coco128 ../  # download coco128 dataset and move it alongside the repo

export PYTHONPATH="$PWD" # to run *.py files in subdirectories
for x in yolov5s yolov5m yolov5l yolov5x # models
do
  python train.py --weights $x.pt --cfg $x.yaml --epochs 3 --img 320 --device 0,1  # train
  for di in 0,1 0 cpu # inference devices
  do
    python detect.py --weights $x.pt --device $di  # detect official
    python detect.py --weights runs/exp0/weights/last.pt --device $di  # detect custom
    python test.py --weights $x.pt --device $di # test official
    python test.py --weights runs/exp0/weights/last.pt --device $di # test custom
  done
  python models/yolo.py --cfg $x.yaml # inspect
  python models/export.py --weights $x.pt --img 640 --batch 1 # export
done
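
As a rough sketch, the device combinations above could be parameterized so that the same script runs on a hosted CPU-only CI runner and on a self-hosted GPU machine. The DEVICES and TRAIN_DEVICE variable names here are hypothetical, not part of the existing codebase:

# Hypothetical CI wrapper around the script above: pick devices per runner.
# Hosted (CPU-only) runner:  DEVICES="cpu"        TRAIN_DEVICE="cpu"
# Self-hosted GPU runner:    DEVICES="0,1 0 cpu"  TRAIN_DEVICE="0,1"
DEVICES="${DEVICES:-cpu}"
TRAIN_DEVICE="${TRAIN_DEVICE:-cpu}"
for x in yolov5s yolov5m yolov5l yolov5x  # models
do
  python train.py --weights $x.pt --cfg $x.yaml --epochs 3 --img 320 --device $TRAIN_DEVICE  # train
  for di in $DEVICES  # inference devices
  do
    python detect.py --weights $x.pt --device $di  # detect
    python test.py --weights $x.pt --device $di  # test
  done
done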

Alternatives

We already use Docker AutoBuild for image deployment. Docker AutoTest would seem to be a natural addition to this, but unfortunately we saw that it imposes tight hardware constraints (i.e. <2GB RAM) unsuitable for running our tests, and offers no GPU availability either. https://docs.docker.com/docker-hub/builds/automated-testing/

Additional context

If you are an expert in this area or have successfully implemented these tools in the past, please let us know, as we need help here!

glenn-jocher added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Jul 14, 2020
glenn-jocher self-assigned this on Jul 14, 2020
@dlawrences
Contributor

Hi @glenn-jocher

I've been doing some research on this myself. Could you please clarify some of your expectations in terms of the agents that would run these unit tests & builds?

There are managed agents (like the ones Microsoft provisions for you if you opt for Azure DevOps services), and there are also self-hosted agents that you can run on your own resources (VMs, etc.).

I don't expect that any of the free CI services out there would provision GPU-accelerated agents for free.

In terms of the required GPU memory for single-GPU and multi-GPU, what would you say would be the minimum required for this dataset and the way the unit tests are executed?

Cheers

@glenn-jocher
Member Author

glenn-jocher commented Jul 16, 2020

@dlawrences that's a good question. I had in mind, for example, a (paid) GCP VM that's created ahead of time and started automatically just for unit tests on commits/PRs, shutting down afterwards. The tests take about 5-10 mins, so it might only be running an hour or two a day in total. We could use K80s to further keep the cost down.
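
A minimal sketch of that start/run/stop cycle from a CI job, assuming a pre-created GCP GPU VM and an authenticated Cloud SDK on the runner; the instance, zone and project names below are placeholders, and ci_tests.sh stands in for the unit-test script from the issue description:

# Hypothetical CI step: start the VM, run the tests over SSH, then stop it again.
gcloud compute instances start yolov5-ci --zone us-central1-a --project my-project
gcloud compute ssh yolov5-ci --zone us-central1-a --project my-project \
  --command "cd yolov5 && git pull && bash ci_tests.sh"
gcloud compute instances stop yolov5-ci --zone us-central1-a --project my-project  # stop billing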

I'm just not sure how to get started. Docker AutoTest seems to have a straightforward implementation path; we could run a reduced set of CPU-only unit tests there, though these would not prevent bugs from entering the master branch. They would only prevent Docker from AutoBuilding a bugged commit (and deploying it automatically to Docker Hub as yolov5:latest).

@dlawrences
Contributor

Hi @glenn-jocher

I wouldn't go with GCP personally; I have had my fair share of problems trying to secure GPU-accelerated resources on this platform without a reservation.

Have a look here: https://aws.amazon.com/ec2/instance-types/

There's g4dn.xlarge, which uses an NVIDIA T4, with an hourly on-demand price of 0.526 USD. Directly scaling to multi-GPU (4x) would be g4dn.12xlarge, with an hourly on-demand price of 3.912 USD.
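
As a rough per-run estimate, taking the ~10 minute upper bound on test runtime mentioned above and ignoring instance startup/teardown time:

g4dn.xlarge:   0.526 USD/h × (10/60) h ≈ 0.09 USD per test run
g4dn.12xlarge: 3.912 USD/h × (10/60) h ≈ 0.65 USD per test run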

I imagine you could provision a serverless ECS cluster that has GPU computation capability using a custom Docker image. See here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html.

This would relieve you of the burden of managing actual EC2 instances!
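
A minimal sketch of the ECS route, assuming a cluster with GPU-capable container instances already registered; the cluster name, task family and ci_tests.sh command are placeholders, and the GPU resource requirement follows the AWS documentation linked above:

# Hypothetical: register a task definition that requests one GPU, then run it on demand.
aws ecs register-task-definition \
  --family yolov5-tests \
  --requires-compatibilities EC2 \
  --container-definitions '[{
      "name": "yolov5-tests",
      "image": "ultralytics/yolov5:latest",
      "command": ["bash", "ci_tests.sh"],
      "memory": 8192,
      "resourceRequirements": [{"type": "GPU", "value": "1"}]
    }]'
aws ecs run-task \
  --cluster yolov5-ci-cluster \
  --task-definition yolov5-tests \
  --launch-type EC2  # per the linked docs, GPU tasks use the EC2 launch type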

@Borda
Contributor

Borda commented Jul 16, 2020

As you require a GPU for testing, you need to run your own GPU instance or use one of the existing CI services that support GPUs, which are not many; so far I know only of CircleCI - https://circleci.com/pricing

@glenn-jocher
Member Author

@Borda thank you for your recent PR! I think it will help us all out a lot and goes a long way towards addressing our CI shortcomings. We are well covered on CI tests on CPU now! :)

@Borda
Contributor

Borda commented Jul 17, 2020

I would recommend using PyTorch Lightning, which already has tests for GPU and TPU; then you would just need to test on CPUs, because everything else is already tested in PL...

@github-actions
Contributor

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
