Merge branch 'master' into feauture/inferenceBenchmark
bfreskura committed Jun 7, 2024
2 parents f1aaafb + 4f4f263 commit 4367742
Showing 16 changed files with 456 additions and 79 deletions.
67 changes: 67 additions & 0 deletions .github/workflows/docker_push.yaml
@@ -0,0 +1,67 @@
name: Create and push cuda118 + cuda120 docker images to this repo's packages

on:
  push:
    branches:
      - master
  workflow_dispatch: {}

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-cuda-images:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      -
        name: Checkout repository
        uses: actions/checkout@v4

      -
        name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # CUDA118 steps
      -
        name: Extract cuda118 image metadata
        id: meta_118
        uses: docker/metadata-action@v3
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: cuda118

      -
        name: Build and push cuda118 image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: dockerfiles/cuda118/Dockerfile
          push: true
          tags: ${{ steps.meta_118.outputs.tags }}
          labels: ${{ steps.meta_118.outputs.labels }}
      # CUDA120 steps
      -
        name: Extract cuda120 image metadata
        id: meta_120
        uses: docker/metadata-action@v3
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=raw,value=cuda120
            type=raw,value=latest
      -
        name: Build and push cuda120 image
        uses: docker/build-push-action@v5
        with:
          context: .
          file: dockerfiles/cuda120/Dockerfile
          push: true
          tags: ${{ steps.meta_120.outputs.tags }}
          labels: ${{ steps.meta_120.outputs.labels }}
26 changes: 26 additions & 0 deletions .github/workflows/linter.yml
@@ -0,0 +1,26 @@
name: Run pre-commit hooks

on:
  push:
  pull_request:
    branches: [master]

jobs:
  build:
    name: Run pre-commit hooks
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"

      - name: Install pre-commit
        run: pip install -r requirements-dev.txt

      - name: Run pre-commit checks
        run: pre-commit run --all-files
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,5 +1,8 @@
lightning_logs/

*.csv
benchmarks/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
36 changes: 18 additions & 18 deletions .pre-commit-config.yaml
@@ -4,8 +4,23 @@ default_language_version:
  python: python3

repos:
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort
        name: Format imports
        args: ["--profile", "black"]

  - repo: https://github.com/psf/black
    rev: 24.1.1
    hooks:
      - id: black
        name: black
        entry: black
        types: [python]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    rev: v4.5.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
@@ -20,29 +35,14 @@ repos:
      - id: check-merge-conflict

  - repo: https://github.com/asottile/pyupgrade
    rev: v3.3.1
    rev: v3.15.0
    hooks:
      - id: pyupgrade
        args: [--py38-plus]
        name: Upgrade code

  - repo: https://github.com/PyCQA/isort
    rev: 5.12.0
    hooks:
      - id: isort
        name: Format imports
        args: ["--profile", "black"]

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        name: black
        entry: black
        types: [python]

  - repo: https://github.com/PyCQA/flake8
    rev: 6.0.0
    rev: 7.0.0
    hooks:
      - id: flake8
        types: [python]
9 changes: 0 additions & 9 deletions Dockerfile

This file was deleted.

106 changes: 103 additions & 3 deletions README.md
@@ -1,5 +1,105 @@
# benchmarking-cv-models

## Preparing
<p align="center" >
<img width="400" src="https://cdn.tensorpix.ai/TensorPix-Logo-color.svg" alt="Tensorpix logo"/>
</p>

Prepare Pre-commit hooks: `pre-commit install`
---

# Benchmarking CV models

A Docker image for a simple training benchmark of popular computer vision models.

The benchmark code focuses explicitly on the **pure training loop code**. The dataset is generated on the fly, directly in RAM, with minimal overhead.

No extra work is done in the training loop, such as data preprocessing, model saving, validation, or logging.

We use the [Lightning AI](https://lightning.ai/) library for the benchmarks, as it's a popular tool among deep learning practitioners.

It also supports features such as mixed precision, DDP, and multi-GPU training.
Such features can significantly affect performance, so it's important to offer them in the benchmarks.
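
To make the scope concrete, here is a minimal sketch of what such a benchmark loop can look like with Lightning: a synthetic in-RAM dataset and a bare `Trainer` with mixed precision and DDP enabled. This is an illustration only, not the repository's actual `src.train` code; the model, dataset size, and `Trainer` flags below are placeholder assumptions.

```python
import torch
import lightning as L
from torch import nn
from torch.utils.data import DataLoader, Dataset
from torchvision.models import resnet50


class RandomImages(Dataset):
    """Synthetic dataset generated on the fly in RAM: random images and labels."""

    def __init__(self, width: int = 224, height: int = 224, length: int = 100_000):
        self.width, self.height, self.length = width, height, length

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, idx: int):
        image = torch.rand(3, self.height, self.width)
        label = torch.randint(0, 10, (1,)).item()
        return image, label


class BenchmarkModule(L.LightningModule):
    """Bare training loop: forward pass, loss, backward — nothing else."""

    def __init__(self):
        super().__init__()
        self.model = resnet50(num_classes=10)  # placeholder architecture
        self.loss_fn = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        images, labels = batch
        return self.loss_fn(self.model(images), labels)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    trainer = L.Trainer(
        accelerator="gpu",
        devices=2,                  # multi-GPU
        strategy="ddp",             # distributed data parallel
        precision="16-mixed",       # mixed precision
        max_steps=1000,
        logger=False,               # no logging/checkpointing overhead
        enable_checkpointing=False,
    )
    trainer.fit(BenchmarkModule(), DataLoader(RandomImages(), batch_size=32, num_workers=4))
```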

## ❓ Why did we create this?

[Our](https://tensorpix.ai) ML team had a dilemma while choosing the best GPU for our budget. GPU X was 2x the price of GPU Y, but we couldn't find reliable data showing whether GPU X was also 2x the speed of GPU Y.

There were [some benchmarks](https://lambdalabs.com/gpu-benchmarks), but very few of them were specific to computer vision tasks, and even fewer covered the GPUs we wanted to test. So we created a docker image that does this with minimal setup.

You can use this benchmark repo to:

- See how various GPUs perform on various deep CV architectures
- Benchmark various CV architectures
- See how efficient multi-GPU setups are for a specific GPU
- Test how much training speed you gain when using mixed precision
- Stress test the GPU(s) at near 100% utilization
- Make pizzas (not tested)

## 📋 Supported architectures

Please open an issue if you need support for a new architecture.

* ResNet50
* ConvNext (base)
* VGG16
* Efficient Net v2
* MobileNet V3
* ResNeXt50
* SWIN
* VIT
* UNet with ResNet50 backbone
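
For orientation, the names above roughly map to constructors in `torchvision` and `segmentation-models-pytorch` (both pinned in `requirements.txt`). The mapping below is an illustrative sketch only; the exact variants and model-builder code used by `src.train` may differ.

```python
# Rough, hypothetical mapping from the architecture names above to library constructors.
import segmentation_models_pytorch as smp
from torchvision import models

ARCHITECTURES = {
    "resnet50": lambda: models.resnet50(weights=None),
    "convnext": lambda: models.convnext_base(weights=None),
    "vgg16": lambda: models.vgg16(weights=None),
    "efficientnet_v2": lambda: models.efficientnet_v2_m(weights=None),  # size variant assumed
    "mobilenet_v3": lambda: models.mobilenet_v3_large(weights=None),
    "resnext50": lambda: models.resnext50_32x4d(weights=None),
    "swin": lambda: models.swin_b(weights=None),
    "vit": lambda: models.vit_b_16(weights=None),
    "unet_resnet50": lambda: smp.Unet(encoder_name="resnet50", encoder_weights=None),
}

model = ARCHITECTURES["resnext50"]()  # e.g. what the --model resnext50 option refers to
```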

## 📖 How to benchmark

### Prerequisites

In order to run the benchmark docker containers, you must have the following installed on the host machine:

- Docker (we used v24.0.6 for testing)
- NVIDIA drivers. See [Versions](#versions) when choosing the docker image.
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) - required in order to use CUDA inside docker containers

### Examples

**Minimal**

`docker run --rm --ipc=host --ulimit memlock=-1 --gpus all ghcr.io/tensorpix/benchmarking-cv-models --batch-size 32`

**Advanced**

`docker run --rm --ipc=host --ulimit memlock=-1 --gpus '"device=0,1"' -v ./benchmarks:/workdir/benchmarks ghcr.io/tensorpix/benchmarking-cv-models --batch-size 32 --n-iters 1000 --warmup-steps 100 --model resnext50 --precision 16-mixed --width 320 --height 320`

**List all options:**

`docker run --rm ghcr.io/tensorpix/benchmarking-cv-models --help`

### How to select particular GPUs

If you want to use all available GPUs, then set the `--gpus all` docker parameter.

If you want to use, for example, the GPUs at indices 2 and 3, set `--gpus '"device=2,3"'`.

### Logging results to a persistent CSV file

The benchmark code creates a CSV file with the benchmark results on every run. The file exists inside the docker container, so you have to mount a host directory in order to see it on the host machine.

To do so, use the following docker argument when running a container: `-v <host/benchmark/folder>:/workdir/benchmarks`. See the [advanced example](#examples) for more details. The CSV file will reside in the mounted host directory.

We also recommend creating the `<host/benchmark/folder>` directory on the host before running the container; otherwise the container will create it owned by the `root` user.

### Versions

We support two docker images: one for CUDA 12.0 and one for CUDA 11.8. The `12.0` version is on the `latest` docker tag, while `11.8` is on the `ghcr.io/tensorpix/benchmarking-cv-models:cuda118` tag.

The `11.8` version supports earlier NVIDIA drivers, so if you run into driver-related errors, try this image instead.

## 📊 Metrics

We use 3 metrics for the benchmark:

- Images per second
- Batches per second
- Megapixels per second

Images/s and batches/s are self-explanatory. Megapixels per second (MPx/s) is not commonly used, but we like this metric because it's independent of the input resolution.

It's calculated according to the following formula: `(input_width_px * input_height_px * batch_size * n_gpus * n_iterations) / (elapsed_time_s * 10^6)`
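
As a sanity check, here is a small Python helper that applies this formula, together with the corresponding images/s and batches/s values. The exact aggregation inside the benchmark code may differ slightly, and the example numbers are hypothetical, not measured results.

```python
def benchmark_metrics(width_px, height_px, batch_size, n_gpus, n_iters, elapsed_s):
    """Derive the three reported metrics from the run parameters."""
    batches_per_s = (n_iters * n_gpus) / elapsed_s
    images_per_s = batches_per_s * batch_size
    megapixels_per_s = (width_px * height_px * images_per_s) / 1e6
    return images_per_s, batches_per_s, megapixels_per_s


# e.g. 1000 iterations at 320x320, batch size 32, on 2 GPUs, taking 120 s
print(benchmark_metrics(320, 320, 32, 2, 1000, 120.0))
```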
16 changes: 16 additions & 0 deletions dockerfiles/cuda118/Dockerfile
@@ -0,0 +1,16 @@
FROM ubuntu:22.04

RUN apt update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    curl && \
    apt clean && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118

COPY ./src /workdir/src
WORKDIR /workdir

ENTRYPOINT [ "python3", "-m", "src.train" ]
16 changes: 16 additions & 0 deletions dockerfiles/cuda120/Dockerfile
@@ -0,0 +1,16 @@
FROM ubuntu:22.04

RUN apt update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    curl && \
    apt clean && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt /tmp/requirements.txt
RUN pip3 install --no-cache-dir -r /tmp/requirements.txt

COPY ./src /workdir/src
WORKDIR /workdir

ENTRYPOINT [ "python3", "-m", "src.train" ]
2 changes: 1 addition & 1 deletion requirements-dev.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
pre-commit~=3.5.0
pre-commit==3.*
9 changes: 6 additions & 3 deletions requirements.txt
@@ -1,3 +1,6 @@
lightning==2.1.1
torch==2.3.1
torchvision==0.18.1
lightning==2.1.4
protobuf==3.20.*
segmentation-models-pytorch==0.3.3
six==1.16.0
torch==2.1.2
torchvision==0.16.2