Training directly with FP16 #1880

Closed
caiusdebucean opened this issue Jan 9, 2021 · 9 comments

Labels
question Further information is requested

Comments

@caiusdebucean
❔ Can you train YOLOv5 in FP16?

Additional context

I understand that the model is converted to FP16 at the end of the training cycle, and that inference is run in FP16. I was wondering if it's possible to train in half precision or mixed precision in order to speed up the training process. Thanks.

caiusdebucean added the question label Jan 9, 2021
@github-actions
Contributor

github-actions bot commented Jan 9, 2021

👋 Hello @caiusdebucean, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@caiusdebucean YOLOv5 models are trained using Automatic Mixed Precision (AMP).

yolov5/train.py

Lines 290 to 298 in 3e25f1e

# Forward
with amp.autocast(enabled=cuda):
    pred = model(imgs)  # forward
    loss, loss_items = compute_loss(pred, targets.to(device), model)  # loss scaled by batch_size
    if rank != -1:
        loss *= opt.world_size  # gradient averaged between devices in DDP mode
    if opt.quad:
        loss *= 4.
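
For context, train.py pairs this autocast forward pass with a torch.cuda.amp.GradScaler for the backward pass. Below is a minimal, self-contained sketch of the same pattern, with a toy model and random data standing in for the actual YOLOv5 objects (assumes a CUDA device):

import torch
from torch import nn
from torch.cuda import amp

# Toy stand-ins for the YOLOv5 model, optimizer and loss (not the real train.py objects)
device = torch.device('cuda')
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.Flatten(), nn.Linear(16 * 62 * 62, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = amp.GradScaler(enabled=True)  # scales the loss to avoid FP16 gradient underflow

for _ in range(10):  # stand-in for the epoch/batch loop
    imgs = torch.randn(8, 3, 64, 64, device=device)
    targets = torch.randint(0, 10, (8,), device=device)
    optimizer.zero_grad()
    with amp.autocast(enabled=True):  # forward pass runs in mixed precision
        loss = criterion(model(imgs), targets)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
    scaler.update()                # adapts the loss scale for the next iteration

The key point is that mixed precision is already the default: autocast runs eligible ops in FP16 while the master weights stay in FP32, so no extra flag is needed to get the training speedup.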

@suddulavenkatanaresh

Hi, I'm a big fan of YOLOv5!

1. First you have to train in FP32, then convert to ONNX and then to a TensorRT FP16 engine for inference, which can increase your speed more than you'd expect, bro.
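
In this repo that pipeline can be collapsed into a single command; a sketch, assuming a CUDA device since TensorRT export requires one (export.py exports ONNX internally before building the FP16 engine):

python export.py --weights runs/exp/weights/best.pt --include engine --half --device 0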

@Alberto1404

@glenn-jocher when I convert the .pt into .wts and .cfg via the marcoslucianops/DeepStream-Yolo repo for further deployment on Jetson, FP16 or INT8 weights are required if I want to use the DLA (I need to leave the GPU free for other processes). How should I proceed?

@caiusdebucean
Author

You can convert the .pt model to FP16/INT8. Check #10505, where the export parameters are presented. For example, --half will export the model in FP16.
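
An illustrative command (assuming ONNX as the target format; see the full export guide below):

python export.py --weights yolov5s.pt --include onnx --half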

@Alberto1404

@caiusdebucean my output should be a .pt file as well.
Trying --include or --include - does not work for me.
What should be the exact instruction to export it?

@glenn-jocher
Member

glenn-jocher commented Dec 19, 2022

👋 Hello! Thanks for asking about Export Formats. YOLOv5 🚀 offers export to almost all of the common export formats. See our TFLite, ONNX, CoreML, TensorRT Export Tutorial for full details.

Formats

YOLOv5 inference is officially supported in 12 formats:

💡 ProTip: Export to ONNX or OpenVINO for up to 3x CPU speedup. See CPU Benchmarks.
💡 ProTip: Export to TensorRT for up to 5x GPU speedup. See GPU Benchmarks.

Format                   export.py --include   Model
PyTorch                  -                     yolov5s.pt
TorchScript              torchscript           yolov5s.torchscript
ONNX                     onnx                  yolov5s.onnx
OpenVINO                 openvino              yolov5s_openvino_model/
TensorRT                 engine                yolov5s.engine
CoreML                   coreml                yolov5s.mlmodel
TensorFlow SavedModel    saved_model           yolov5s_saved_model/
TensorFlow GraphDef      pb                    yolov5s.pb
TensorFlow Lite          tflite                yolov5s.tflite
TensorFlow Edge TPU      edgetpu               yolov5s_edgetpu.tflite
TensorFlow.js            tfjs                  yolov5s_web_model/
PaddlePaddle             paddle                yolov5s_paddle_model/

Benchmarks

Benchmarks below run on a Colab Pro with the YOLOv5 tutorial notebook. To reproduce:

python benchmarks.py --weights yolov5s.pt --imgsz 640 --device 0

Colab Pro V100 GPU

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=0, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-135-g7926afc torch 1.10.0+cu111 CUDA:0 (Tesla V100-SXM2-16GB, 16160MiB)
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 46.7/166.8 GB disk)

Benchmarks complete (458.07s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623                10.19
1             TorchScript        0.4623                 6.85
2                    ONNX        0.4623                14.63
3                OpenVINO           NaN                  NaN
4                TensorRT        0.4617                 1.89
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623                21.28
7     TensorFlow GraphDef        0.4623                21.22
8         TensorFlow Lite           NaN                  NaN
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

Colab Pro CPU

benchmarks: weights=/content/yolov5/yolov5s.pt, imgsz=640, batch_size=1, data=/content/yolov5/data/coco128.yaml, device=cpu, half=False, test=False
Checking setup...
YOLOv5 🚀 v6.1-135-g7926afc torch 1.10.0+cu111 CPU
Setup complete ✅ (8 CPUs, 51.0 GB RAM, 41.5/166.8 GB disk)

Benchmarks complete (241.20s)
                   Format  mAP@0.5:0.95  Inference time (ms)
0                 PyTorch        0.4623               127.61
1             TorchScript        0.4623               131.23
2                    ONNX        0.4623                69.34
3                OpenVINO        0.4623                66.52
4                TensorRT           NaN                  NaN
5                  CoreML           NaN                  NaN
6   TensorFlow SavedModel        0.4623               123.79
7     TensorFlow GraphDef        0.4623               121.57
8         TensorFlow Lite        0.4623               316.61
9     TensorFlow Edge TPU           NaN                  NaN
10          TensorFlow.js           NaN                  NaN

Export a Trained YOLOv5 Model

This command exports a pretrained YOLOv5s model to TorchScript and ONNX formats. yolov5s.pt is the 'small' model, the second-smallest model available. Other options are yolov5n.pt, yolov5m.pt, yolov5l.pt and yolov5x.pt, along with their P6 counterparts, e.g. yolov5s6.pt, or your own custom training checkpoint, e.g. runs/exp/weights/best.pt. For details on all available models please see our README table.

python export.py --weights yolov5s.pt --include torchscript onnx

💡 ProTip: Add --half to export models at FP16 half precision for smaller file sizes

Output:

export: data=data/coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, train=False, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=12, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['torchscript', 'onnx']
YOLOv5 🚀 v6.2-104-ge3e5122 Python-3.7.13 torch-1.12.1+cu113 CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt to yolov5s.pt...
100% 14.1M/14.1M [00:00<00:00, 274MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)

TorchScript: starting export with torch 1.12.1+cu113...
TorchScript: export success ✅ 1.7s, saved as yolov5s.torchscript (28.1 MB)

ONNX: starting export with onnx 1.12.0...
ONNX: export success ✅ 2.3s, saved as yolov5s.onnx (28.0 MB)

Export complete (5.5s)
Results saved to /content/yolov5
Detect:          python detect.py --weights yolov5s.onnx 
Validate:        python val.py --weights yolov5s.onnx 
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.onnx')
Visualize:       https://netron.app/

The exported models will be saved alongside the original PyTorch model.

Netron Viewer is recommended for visualizing exported models.

Exported Model Usage Examples

detect.py runs inference on exported models:

python detect.py --weights yolov5s.pt                 # PyTorch
                           yolov5s.torchscript        # TorchScript
                           yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                           yolov5s_openvino_model     # OpenVINO
                           yolov5s.engine             # TensorRT
                           yolov5s.mlmodel            # CoreML (macOS only)
                           yolov5s_saved_model        # TensorFlow SavedModel
                           yolov5s.pb                 # TensorFlow GraphDef
                           yolov5s.tflite             # TensorFlow Lite
                           yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                           yolov5s_paddle_model       # PaddlePaddle

val.py runs validation on exported models:

python val.py --weights yolov5s.pt                 # PyTorch
                        yolov5s.torchscript        # TorchScript
                        yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                        yolov5s_openvino_model     # OpenVINO
                        yolov5s.engine             # TensorRT
                        yolov5s.mlmodel            # CoreML (macOS Only)
                        yolov5s_saved_model        # TensorFlow SavedModel
                        yolov5s.pb                 # TensorFlow GraphDef
                        yolov5s.tflite             # TensorFlow Lite
                        yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                        yolov5s_paddle_model       # PaddlePaddle

Use PyTorch Hub with exported YOLOv5 models:

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.pt')
                                                       'yolov5s.torchscript')        # TorchScript
                                                       'yolov5s.onnx')               # ONNX Runtime
                                                       'yolov5s_openvino_model')     # OpenVINO
                                                       'yolov5s.engine')             # TensorRT
                                                       'yolov5s.mlmodel')            # CoreML (macOS Only)
                                                       'yolov5s_saved_model')        # TensorFlow SavedModel
                                                       'yolov5s.pb')                 # TensorFlow GraphDef
                                                       'yolov5s.tflite')             # TensorFlow Lite
                                                       'yolov5s_edgetpu.tflite')     # TensorFlow Edge TPU
                                                       'yolov5s_paddle_model')       # PaddlePaddle

# Images
img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

OpenCV DNN inference

OpenCV inference with ONNX models:

python export.py --weights yolov5s.pt --include onnx

python detect.py --weights yolov5s.onnx --dnn  # detect
python val.py --weights yolov5s.onnx --dnn  # validate
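
As a minimal Python sketch of the same OpenCV DNN path (raw forward pass only; the image path is a placeholder and NMS is omitted for brevity):

import cv2

# Load the exported ONNX model with OpenCV's DNN module
net = cv2.dnn.readNetFromONNX('yolov5s.onnx')

img = cv2.imread('zidane.jpg')  # hypothetical local image path
blob = cv2.dnn.blobFromImage(img, scalefactor=1 / 255.0, size=(640, 640),
                             swapRB=True, crop=False)  # BGR->RGB, normalize to 0-1
net.setInput(blob)
pred = net.forward()  # shape (1, 25200, 85): xywh, objectness, 80 class scores

# Keep candidates above an objectness threshold (NMS would follow in practice)
candidates = pred[0][pred[0][:, 4] > 0.25]
print(f'{len(candidates)} candidate boxes above threshold')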

C++ Inference

YOLOv5 OpenCV DNN C++ inference on exported ONNX model examples:

YOLOv5 OpenVINO C++ inference examples:

Good luck 🍀 and let us know if you have any other questions!

@Kanan99

Kanan99 commented May 22, 2024

> @glenn-jocher when I convert the .pt into .wts and .cfg via the marcoslucianops/DeepStream-Yolo repo for further deployment on Jetson, FP16 or INT8 weights are required if I want to use the DLA (I need to leave the GPU free for other processes). How should I proceed?

@Alberto1404 While converting within marcoslucianops/DeepStream-Yolo repo did you observe a drastic drop in accuracy of the model?

@glenn-jocher
Member

Hello @Kanan99! To ensure your model uses FP16 or INT8 weights for DLA on Jetson, you'll need to convert your .pt model to the desired precision before exporting to .wts and .cfg. You can do this directly in the YOLOv5 repository using the --half flag during the export for FP16. Here’s a quick example:

python export.py --weights yolov5s.pt --include torchscript --half

This command converts the model to FP16, which you can then further convert using the DeepStream-Yolo repo. If you're experiencing a drastic drop in accuracy, it might be due to the precision reduction. It's a common trade-off for the speed gains on DLA. You might want to experiment with different quantization approaches or adjust the confidence thresholds to mitigate this. 😊
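
For the DLA step itself, one common route (an assumption here, not stated in the thread) is to export ONNX and then build the engine with TensorRT's trtexec on the Jetson, pinned to a DLA core:

python export.py --weights yolov5s.pt --include onnx
trtexec --onnx=yolov5s.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=yolov5s_dla.engine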
