YOLOv5 (6.0/6.1) brief summary #6998

WZMIAOMIAO · 2022-03-16T04:39:06Z

Content

1. Model Structure
2. Data Augmentation
3. Training Strategies
4. Others

1. Model Structure

YOLOv5 (v6.0/6.1) consists of:

Backbone: New CSP-Darknet53
Neck: SPPF, New CSP-PAN
Head: YOLOv3 Head

Model structure (yolov5l.yaml):

Some minor changes compared to previous versions:

Replace the Focus structure with a 6x6 Conv2d (more efficient; see "Is the Focus layer equivalent to a simple Conv layer?" #4825)
Replace the SPP structure with SPPF (more than twice as fast)

Test code (SPP vs. SPPF):

import time
import torch
import torch.nn as nn


class SPP(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool1 = nn.MaxPool2d(5, 1, padding=2)
        self.maxpool2 = nn.MaxPool2d(9, 1, padding=4)
        self.maxpool3 = nn.MaxPool2d(13, 1, padding=6)

    def forward(self, x):
        o1 = self.maxpool1(x)
        o2 = self.maxpool2(x)
        o3 = self.maxpool3(x)
        return torch.cat([x, o1, o2, o3], dim=1)


class SPPF(nn.Module):
    def __init__(self):
        super().__init__()
        self.maxpool = nn.MaxPool2d(5, 1, padding=2)

    def forward(self, x):
        o1 = self.maxpool(x)
        o2 = self.maxpool(o1)
        o3 = self.maxpool(o2)
        return torch.cat([x, o1, o2, o3], dim=1)


def main():
    input_tensor = torch.rand(8, 32, 16, 16)
    spp = SPP()
    sppf = SPPF()
    output1 = spp(input_tensor)
    output2 = sppf(input_tensor)

    print(torch.equal(output1, output2))

    t_start = time.time()
    for _ in range(100):
        spp(input_tensor)
    print(f"spp time: {time.time() - t_start}")

    t_start = time.time()
    for _ in range(100):
        sppf(input_tensor)
    print(f"sppf time: {time.time() - t_start}")


if __name__ == '__main__':
    main()

result:

True
spp time: 0.5373051166534424
sppf time: 0.20780706405639648
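
The Focus replacement listed above can be checked in a similar way. The following is a minimal sketch, not code from the repository: it assumes the old Focus layer is a 2x2 space-to-depth slice followed by a 3x3 convolution, and that the new stem is a 6x6 convolution with stride 2 and padding 2. BatchNorm and the activation are left out, and `focus_slices` and `focus_to_conv6_weight` are hypothetical helper names. Under those assumptions, the 3x3 Focus kernel can be scattered into a 6x6 kernel that gives the same output.

```python
import torch
import torch.nn.functional as F


def focus_slices(x):
    # Space-to-depth as assumed for the old Focus layer:
    # (N, C, H, W) -> (N, 4C, H/2, W/2)
    return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                      x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)


def focus_to_conv6_weight(w3, c_in):
    # Scatter a (c_out, 4*c_in, 3, 3) kernel into a (c_out, c_in, 6, 6) kernel.
    w6 = torch.zeros(w3.shape[0], c_in, 6, 6)
    # (row, col) pixel offset of each slice, in the concat order used above
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    for g, (dy, dx) in enumerate(offsets):
        w6[:, :, dy::2, dx::2] = w3[:, g * c_in:(g + 1) * c_in]
    return w6


torch.manual_seed(0)
c_in, c_out = 3, 8
x = torch.rand(2, c_in, 32, 32)
w3 = torch.randn(c_out, 4 * c_in, 3, 3)

# Focus path: slice, then 3x3 conv with stride 1 and padding 1
out_focus = F.conv2d(focus_slices(x), w3, stride=1, padding=1)
# Replacement path: a single 6x6 conv with stride 2 and padding 2
out_conv6 = F.conv2d(x, focus_to_conv6_weight(w3, c_in), stride=2, padding=2)

print(out_focus.shape, out_conv6.shape)
print(torch.allclose(out_focus, out_conv6, atol=1e-4))
```

Both paths halve the spatial resolution, so the check only compares numerical outputs; it says nothing about speed on a particular device.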

2. Data Augmentation

Mosaic

Copy paste

Random affine (Rotation, Scale, Translation and Shear)

MixUp

Albumentations

Augment HSV (Hue, Saturation, Value)

Random horizontal flip

3. Training Strategies

Multi-scale training (0.5~1.5x)
AutoAnchor (for training custom data)
Warmup and Cosine LR scheduler
EMA (Exponential Moving Average)
Mixed precision
Evolve hyper-parameters

4. Others

4.1 Compute Losses

The YOLOv5 loss consists of three parts:

Classes loss (BCE loss)
Objectness loss (BCE loss)
Location loss (CIoU loss)

4.2 Balance Losses

The objectness losses of the three prediction layers (P3, P4, P5) are weighted differently. The balance weights are [4.0, 1.0, 0.4], respectively.

4.3 Eliminate Grid Sensitivity

In YOLOv2 and YOLOv3, the formula for calculating the predicted target information is:

In YOLOv5, the formula is:

Compare the center point offset before and after scaling: the offset range is adjusted from (0, 1) to (-0.5, 1.5). A plain sigmoid only approaches 0 and 1 asymptotically, so after the adjustment the offset can easily reach 0 or 1.

Compare the height and width scaling ratio (relative to the anchor) before and after adjustment. The original YOLO/Darknet box equations have a serious flaw: width and height are completely unbounded, as they are simply out = exp(in). This is dangerous, as it can lead to runaway gradients, instabilities, NaN losses and ultimately a complete loss of training. Refer to this issue.
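
The two decodings can be compared numerically. This is a minimal sketch that assumes the commonly cited forms: for YOLOv2/v3, b_x = sigmoid(t_x) + c_x and b_w = p_w * exp(t_w); for YOLOv5, b_x = 2 * sigmoid(t_x) - 0.5 + c_x and b_w = p_w * (2 * sigmoid(t_w))^2. The function names are hypothetical and not part of the repository.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_yolov3(tx, ty, tw, th, cx, cy, pw, ph):
    # Assumed YOLOv2/v3 form: offset in (0, 1), size ratio exp(t) is unbounded
    return (sigmoid(tx) + cx, sigmoid(ty) + cy,
            pw * math.exp(tw), ph * math.exp(th))


def decode_yolov5(tx, ty, tw, th, cx, cy, pw, ph):
    # Assumed YOLOv5 form: offset in (-0.5, 1.5), size ratio in (0, 4)
    return (2.0 * sigmoid(tx) - 0.5 + cx, 2.0 * sigmoid(ty) - 0.5 + cy,
            pw * (2.0 * sigmoid(tw)) ** 2, ph * (2.0 * sigmoid(th)) ** 2)


# Grid cell (0, 0) and a unit anchor, so the outputs are the raw offset and ratio
for t in (-10.0, 0.0, 10.0):
    x3, _, w3, _ = decode_yolov3(t, t, t, t, 0, 0, 1.0, 1.0)
    x5, _, w5, _ = decode_yolov5(t, t, t, t, 0, 0, 1.0, 1.0)
    print(f"t={t:+.0f}  v3 offset={x3:.4f} ratio={w3:.4f}  "
          f"v5 offset={x5:.4f} ratio={w5:.4f}")
```

For large positive inputs the exp-based ratio grows without limit, while the YOLOv5 ratio stays below 4.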

4.4 Build Targets

Match positive samples:

Calculate the aspect ratio of GT and Anchor Templates

Assign the successfully matched Anchor Templates to the corresponding cells

Because the center point offset range is adjusted from (0, 1) to (-0.5, 1.5), a GT box can be assigned to more anchors.
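
The matching steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the repository's `build_targets` code: it assumes a GT box matches an anchor template when the larger of the width and height ratios (taken in whichever direction is greater than 1) is below a threshold of 4.0, and that a GT center is also assigned to the neighbouring cell on whichever side it lies within 0.5 of. The function names, the threshold default and the example anchors are assumptions.

```python
def match_anchors(gt_wh, anchors, anchor_t=4.0):
    """Return one bool per anchor template: does the GT box match it?"""
    matched = []
    for aw, ah in anchors:
        rw, rh = gt_wh[0] / aw, gt_wh[1] / ah
        # Worst-case ratio over width and height, in either direction
        worst = max(rw, 1.0 / rw, rh, 1.0 / rh)
        matched.append(worst < anchor_t)
    return matched


def candidate_cells(gx, gy, nx, ny, g=0.5):
    """Cells (col, row) that receive a GT center given in grid units."""
    cx, cy = int(gx), int(gy)
    cells = [(cx, cy)]
    # gx % 1 is the fractional position of the center inside its cell
    if gx % 1 < g and gx > 1:
        cells.append((cx - 1, cy))  # center is in the left half: add left cell
    if gy % 1 < g and gy > 1:
        cells.append((cx, cy - 1))  # center is in the top half: add top cell
    if (nx - gx) % 1 < g and (nx - gx) > 1:
        cells.append((cx + 1, cy))  # right half: add right cell
    if (ny - gy) % 1 < g and (ny - gy) > 1:
        cells.append((cx, cy + 1))  # bottom half: add bottom cell
    return cells


anchors = [(10, 13), (16, 30), (33, 23)]  # example anchor templates (w, h)
print(match_anchors((30, 60), anchors))
print(candidate_cells(3.2, 5.7, nx=20, ny=20))
```

In this example the GT center at (3.2, 5.7) is assigned to its own cell plus the left and bottom neighbours, which is possible only because the predicted offset can now reach beyond the cell borders.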

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

WZMIAOMIAO · 2022-03-16T04:47:11Z

@glenn-jocher hi, today I briefly summarized yolov5(v6.0). Please help to see if there are any problems or put forward better suggestions. Some schematic diagrams or contents will be added later. Thank you for your great work.

zlj-ky · 2022-03-16T12:03:05Z

Hi, regarding 'prediction layers (P3, P4, P5) are weighted differently': how do I find this in the code and, further, modify it?

WZMIAOMIAO · 2022-03-16T12:18:58Z

The weights are defined in yolov5/utils/loss.py (line 111 in c09fb2a):

self.balance = {3: [4.0, 1.0, 0.4]}.get(det.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7

and applied in yolov5/utils/loss.py (line 156 in c09fb2a):

lobj += obji * self.balance[i]  # obj loss

zlj-ky · 2022-03-16T12:28:27Z

@WZMIAOMIAO thx！

glenn-jocher · 2022-03-17T11:18:09Z

@WZMIAOMIAO awesome summary, nice work!

@zlj-ky yes the balancing parameters are there, we tuned these manually on COCO. The idea is to balance losses from each layer (just like we balance losses across loss components (box, obj, class)). The reason I didn't turn these into learnable weights is that as absolute values the gradient would always want to drag them to zero to minimize the loss. I suppose we could constantly normalize them so they all sum to 1 to avoid this effect. Might be an interesting experiment, and this might help the balancing adapt better to different datasets and image sizes etc.
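
The normalization idea can be sketched as follows. This is a hypothetical experiment, not code from the repository: `NormalizedBalance` is an invented module that stores the per-layer weights as logits and applies a softmax, so the weights stay positive and always sum to 1. The per-layer loss values are made-up constants.

```python
import torch
import torch.nn as nn


class NormalizedBalance(nn.Module):
    """Hypothetical learnable per-layer weights that always sum to 1."""

    def __init__(self, init=(4.0, 1.0, 0.4)):
        super().__init__()
        init = torch.tensor(init)
        # Store log-weights so that softmax(logits) equals init / init.sum()
        self.logits = nn.Parameter(torch.log(init / init.sum()))

    def weights(self):
        return torch.softmax(self.logits, dim=0)

    def forward(self, layer_losses):
        return (self.weights() * layer_losses).sum()


balance = NormalizedBalance()
optimizer = torch.optim.SGD(balance.parameters(), lr=0.1)
layer_losses = torch.tensor([0.5, 1.0, 2.0])  # made-up P3, P4, P5 losses

print(balance.weights().detach())
for _ in range(5):
    optimizer.zero_grad()
    balance(layer_losses).backward()
    optimizer.step()
print(balance.weights().detach(), balance.weights().sum().item())
```

The weights can no longer collapse to zero, but the optimizer still moves weight toward the layer with the smallest loss, so the normalization alone does not guarantee a useful balance.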

WZMIAOMIAO · 2022-03-19T06:14:52Z

@glenn-jocher Could we add this brief summary to the document?

glenn-jocher · 2022-03-20T00:04:19Z

@WZMIAOMIAO yes maybe it's a good idea to document this somewhere. Which document do you mean though?

WZMIAOMIAO · 2022-03-21T06:16:10Z

@glenn-jocher I think it could be added to the Tutorials. What do you think?

glenn-jocher · 2022-03-25T13:45:43Z

@WZMIAOMIAO all done in #7146! Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐

glenn-jocher · 2022-04-12T10:44:07Z

@HERIUN build_targets() implements an anchor-label assignment strategy so we can calculate the losses between assigned anchor-label pairs.

xinxin342 · 2022-04-16T14:41:19Z

@glenn-jocher What is the adjustment strategy for the balancing parameters? How can they be changed to learnable weights?

glenn-jocher · 2022-04-16T14:55:59Z

@xinxin342 the balance params are here. You'd have to convert them to nn.Parameter types assigned to an existing class and make sure their requires_grad is True:

yolov5/utils/loss.py (line 112 in c9a3b14):

self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7

zlj-ky · 2022-04-16T19:37:39Z

@glenn-jocher
I tried to convert the weights to a learnable parameter like this (my experience is limited).

However, this parameter was not updated during training, and I don't know why or how to revise my method. Could you help me, even though it's a very simple question?

glenn-jocher · 2022-04-17T10:36:27Z

@zlj-ky that seems like a good approach, but you might need to place self.w inside the model so it's affected by model.train(), model.eval(), etc. You can just place it inside models.yolo.Detect and then access it like this. (Note your code is out of date):

class ComputeLoss:
    sort_obj_iou = False

    def __init__(self, model, autobalance=False):
        device = next(model.parameters()).device  # get model device
        h = model.hyp  # hyperparameters

        # Define criteria
        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))
        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets

        # Focal loss
        g = h['fl_gamma']  # focal loss gamma
        if g > 0:
            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

        m = de_parallel(model).model[-1]  # Detect() module
        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7
        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index
        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance
        self.na = m.na  # number of anchors
        self.nc = m.nc  # number of classes
        self.nl = m.nl  # number of layers
        self.anchors = m.anchors
        self.w = m.w  # <------------------------ NEW CODE 
        self.device = device

This might or might not work as I don't know if this will create a copy or access the Detect parameter.

Even if you get this to work, though, it's not clear that these are learnable parameters, as I'm not sure they can be correlated to the gradient directly. The optimizer seeks to reduce loss, so the rebalance may just weigh the lower loss components higher to reduce loss, which may not have the desired effect.

The same concept applies to anchors, which don't seem learnable either during training.
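
On the copy-versus-reference question, a small stand-alone check may help. This is a sketch with stand-in classes, not the real `Detect` or `ComputeLoss`: `TinyDetect`, `TinyLoss` and the parameter `w` are hypothetical. In plain PyTorch, assigning `self.w = m.w` stores a reference to the same `nn.Parameter` object, so updates made by an optimizer built from the module's parameters are visible through the loss object.

```python
import torch
import torch.nn as nn


class TinyDetect(nn.Module):
    """Stand-in for a Detect-like module with a hypothetical learnable `w`."""

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3))


class TinyLoss:
    """Stand-in for a loss class that keeps a handle on the module's `w`."""

    def __init__(self, m):
        self.w = m.w  # attribute assignment: same Parameter object, no copy

    def __call__(self, layer_losses):
        return (self.w * layer_losses).sum()


m = TinyDetect()
compute_loss = TinyLoss(m)
# The optimizer only sees parameters registered on the module
optimizer = torch.optim.SGD(m.parameters(), lr=0.1)

loss = compute_loss(torch.tensor([1.0, 2.0, 3.0]))
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(compute_loss.w is m.w)
print(compute_loss.w.data)
```

The reference stays valid only as long as the module's parameter object is not replaced, for example by building the loss from one model and training a deep copy of it.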

zlj-ky · 2022-04-18T09:12:43Z

@glenn-jocher Thank you for sharing your views on this matter and for your patient guidance. I will try it later.

Cong-Wan · 2022-09-26T06:52:08Z

@WZMIAOMIAO @glenn-jocher Hi, thanks for your nice work! I have two questions. First, how can I print the output of every layer? (I'd like to reduce the kernel size of the first layer, which might help with small object detection.) Second, I also want to add an output for object tracking ([x, y, w, h, nc] -> [x, y, w, h, nc, id]), but I don't know which loss function to use for it.

kadirnar · 2022-10-07T11:16:23Z

@engrjav FPN and PANet are just two head architectures. Earlier versions of YOLOv5 used FPN and newer versions use PANet. CSP is a type of repeating module which has evolved into the current C3 modules.

Hi @glenn-jocher
Why did you choose PANet? Is there a comparison chart? Would you prefer the Light-BiFPN module for small models?
Light-Yolov5: https://arxiv.org/pdf/2208.13422.pdf

glenn-jocher · 2022-10-09T21:08:17Z

@kadirnar BiFPN and PANet are nearly identical, in a P3-P5 output model the only difference is a single shortcut. There are versions of all 3 heads available here:
https://github.com/ultralytics/yolov5/tree/master/models/hub

As always all design decisions are based on empirical results.

divided7 · 2022-11-17T02:20:13Z

Hello, can we get the results of the ablation experiments, such as mAP results for SPP vs. SPPF and Focus vs. Conv on large datasets?

glenn-jocher · 2022-11-17T14:14:13Z

@divided-by-7 I'm sorry, we don't have this R&D saved in a presentable manner.

dlod-openvino · 2022-11-27T08:11:49Z

@WZMIAOMIAO Could you please summarize the YOLOv5 Instance Segmentation Model Structure? especially the keywords definition of output0 float32[1,25200,117] and output1 float32[1,32,160,160]. Thank you very much in advance!

ishakpacal · 2022-12-11T10:19:52Z

Dear @glenn-jocher @WZMIAOMIAO
The segmentation part is excellent. What has changed in the model architecture related to this, could you provide an example model architecture, thanks in advance.

tayahiyukon · 2022-12-16T07:18:01Z

Hi! What do k, s, p, and c represent in the model structure, respectively?

XueZ-phd · 2022-12-16T07:33:41Z

This is a simple question. k = kernel size, s = stride, p = padding, c = channel dims

tayahiyukon · 2022-12-17T08:11:45Z

Okay, thank you very much!

karl-gardner · 2022-12-26T19:18:34Z

Hello @glenn-jocher or anyone who knows the answer. I am trying to understand the build targets process a little more. When you say GTx%1>0.5 and GTy%1>0.5 is the % just the modulus? If it is the modulo operator, then why is this used?

Thanks,

Karl Gardner

scraus · 2022-12-27T13:33:21Z

@WZMIAOMIAO @glenn-jocher or anyone who knows. I am trying to understand more about the model structure. Is there an article that discusses and explains the YOLOv5 structure? Thanks!

gracesmrngkr · 2023-06-12T10:00:12Z

Hi @glenn-jocher can i know what is the formula if input image 640x640x3 becomes 320x320x64 with k=3 s=2 p=1?

glenn-jocher · 2023-11-15T01:48:32Z

@gracesmrngkr the spatial size of this transformation is governed by the following formula:

\[
\text{output\_size} = \left\lfloor \frac{\text{input\_size} - \text{kernel\_size} + 2 \times \text{padding}}{\text{stride}} \right\rfloor + 1
\]

So in this case, with an input size of 640, a kernel size of 3, a stride of 2, and padding of 1, the output size is floor((640 - 3 + 2) / 2) + 1 = 319 + 1 = 320. The 64 output channels are set by the number of filters in the convolution, not by this formula.
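
The same arithmetic as a small helper, given as a sketch (the function name `conv_output_size` is hypothetical):

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(640, kernel_size=3, stride=2, padding=1))  # 320
```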
