Overview of model structure about YOLOv5 #280

seekFire · 2020-07-03T09:16:38Z

In order to understand the structure of YOLOv5 and use other frameworks to implement YOLOv5, I try to create an overview, as shown below. If there has any error, please point out

github-actions · 2020-07-03T09:17:18Z

Hello @seekFire, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook , Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

glenn-jocher · 2020-07-03T17:42:00Z

@seekFire yes looks correct!

TaoXieSZ · 2020-07-03T22:41:34Z

@seekFire That looks pretty and clean. What kind of drawing tool you use?

seekFire · 2020-07-03T23:26:48Z

@ChristopherSTAN Just PowerPoint

seekFire · 2020-07-03T23:34:28Z

@glenn-jocher Thank you for your confirmation！

bretagne-peiqi · 2020-07-24T05:48:54Z

Hello, I also made one, if there is any error, please help me point out : )

glenn-jocher · 2020-07-24T05:55:57Z

@bretagne-peiqi yes this looks correct, except that with the v2.0 release the 3 output Conv2d() boxes (red in your diagram) are now inside the Detect() stage:

    (24): Detect(
      (m): ModuleList(
        (0): Conv2d(128, 255, kernel_size=(1, 1), stride=(1, 1))
        (1): Conv2d(256, 255, kernel_size=(1, 1), stride=(1, 1))
        (2): Conv2d(512, 255, kernel_size=(1, 1), stride=(1, 1))
      )

glenn-jocher · 2020-07-24T05:57:30Z

@bretagne-peiqi ah, also you have an FPN head here, whereas the more recent YOLOv5 models have PANet heads. See https://github.com/ultralytics/yolov5/blob/master/models/yolov5s.yaml

bretagne-peiqi · 2020-07-24T06:39:21Z

@glenn-jocher many thanks.

github-actions · 2020-08-24T00:34:14Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gaobaorong · 2020-09-18T09:20:49Z

Good!

pravastacaraka · 2021-02-07T02:17:32Z

@seekFire @bretagne-peiqi @glenn-jocher do you guys have an overview diagram for this YOLOv4 v4.0?

zhiqwang · 2021-02-07T02:31:14Z

Hi @pravastacaraka , here is a overview of YOLOv5 v4.0 , actually it looks very similar to the previous version, here is the v3.1 version.

I copied this diagram from here, it is written in Chinese.

Copyright statement: This article is the original article of the blogger and follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement for reprinting.
Link to this article: https://blog.csdn.net/Q1u1NG/article/details/107511465

pravastacaraka · 2021-02-07T04:10:12Z

@zhiqwang thank you so much for your kind help

ehdrndd · 2021-09-01T00:41:16Z

well, I updated architecture

data4pass · 2021-09-27T08:01:56Z

My apologies if this question is too beginner-level, but I would like to ask, what operation is it exactly that is used to "combine" the three predictions that we got from the detection layers?

glenn-jocher · 2021-09-27T20:52:36Z

@data4pass all detection heads concatenate together (along dimension 1) into a single output in the YOLOv5 Detect() layer:

yolov5/models/yolo.py

Line 73 in a820b43

return x if self.training else (torch.cat(z, 1), x)

data4pass · 2021-09-30T19:14:51Z

Understood, but don't the three resulting tensors have different shapes? Don't we have to reshape the tensors somehow so that they can be concatenated?

glenn-jocher · 2021-09-30T19:20:57Z

@data4pass see Detect() layer for reshape ops:

yolov5/models/yolo.py

Line 36 in ba99092

class Detect(nn.Module):

Zengyf-CVer · 2021-12-24T23:56:05Z

well, I updated architecture

Hello @ehdrndd , what software did you use to make this picture?

yyccR · 2021-12-28T03:29:06Z

The latest structure looks clean and simple

glenn-jocher · 2021-12-28T03:43:54Z

@yyccR very nice!

Symbadian · 2023-04-04T09:20:30Z

@Symbadian

yolov5/models/common.py

Line 158 in cca5e21

def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5): # ch_in, ch_out, number, shortcut, groups, expansion

In C3 layer, n = number of Bottleneck layer:
nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

Ahh I see thanx for your swift response @yyccR

glenn-jocher · 2023-04-04T09:49:35Z

@Symbadian you're welcome! Don't hesitate to ask if you have any further questions.

Symbadian · 2023-04-04T10:28:19Z

@Symbadian you're welcome! Don't hesitate to ask if you have any further questions.

hey @glenn-jocher, before I start analysing the operations I am requesting your input to authenticate my diagram. I followed some of the guys based on their interpretation. I am ensuring that the concept is accurate for the version 6 model.

I had the wrong model design previously hence the request for your input at this stage.
thanx for your response in advance, I really appreciate this and you efforts to explain.
See my understanding below:

glenn-jocher · 2023-04-04T11:24:59Z

Hi @Symbadian, your diagram looks great! It accurately reflects the YOLOv5x structure, including the various modules and their respective connections. Keep up the good work!

Symbadian · 2023-04-04T11:28:37Z

Hi @Symbadian, your diagram looks great! It accurately reflects the YOLOv5x structure, including the various modules and their respective connections. Keep up the good work!

Hi @glenn-jocher and thanx for your response! in that case my approach is wrong again if I proceed with this diagram. I applied the YOLOv5m Model. Is it possible for you to guide me to an example for such so that I can redo this task once more, please?

Thanx in advance for your guidance really means loads!!

glenn-jocher · 2023-04-04T12:46:43Z

Certainly, @Symbadian. Here is an example of the YOLOv5m model architecture:

![Screenshot 2023-04-05 at 9.02.10 AM.png](attachment:Screenshot 2023-04-05 at 9.02.10 AM.png)

Please note that this diagram only shows the architecture, and not the specifics of each layer or their connections. Let me know if you have any further questions or need further assistance.

Symbadian · 2023-04-04T13:05:19Z

Hey @glenn-jocher , thanx loads pal!
Unfortunately, it's not showing in your comment!
it is possible if you can resend the example once more, please?
thanx in advance!

glenn-jocher · 2023-04-04T14:01:58Z

I apologize for the confusion, @Symbadian. Here is the example of the YOLOv5m model architecture:

Please note that this diagram only shows the architecture, and not the specifics of each layer or their connections. Let me know if you have any further questions or need further assistance.

Symbadian · 2023-04-04T14:03:27Z

Hi @glenn-jocher not sure what's going on but it's still not showing pal! this took me to a blank page

AccessDeniedAccess DeniedAZJ3ETS7N82N33R2GC/o+65bVVL9Pr42nyy2KiQCTEIJvNSQXz5mTsKiWrgHB6zayuTmU9Qj2PLMtNmir+jO3Mk7dMI=

glenn-jocher · 2023-04-04T14:05:52Z

I apologize for the inconvenience, @Symbadian. Here is a direct link to the image of the YOLOv5m architecture:

https://user-images.githubusercontent.com/50293021/137899212-61d685a7-fe16-48f2-9b82-41d14311734b.png

Hopefully, you will be able to view it with this link. Let me know if you have any further questions or concerns.

Symbadian · 2023-04-04T14:07:18Z

@glenn-jocher someone really don't want me to have this image pal lolz!
this took me to another failed attempt

AccessDeniedAccess DeniedTYRMG9TEQVJKJVMSokGAYJHGsHJoZHt8pEkKCkzD5kZa94FDsGrgtNCRaq+8eqBD5R3AnKqRe0KfJBZ1/isRckzY4Pg=

glenn-jocher · 2023-04-04T15:33:40Z

I apologize for the continued difficulties, @Symbadian. Here is another option: the architecture diagram can also be found in the following article under the heading "Model Architectures":

https://blog.roboflow.com/how-to-train-yolov5-on-a-custom-dataset/

I hope this helps you with your analysis. Let me know if you have any further questions or need further assistance.

Symbadian · 2023-04-04T15:45:53Z

"Model Architectures":

Hey @glenn-jocher Thanx for your response pal.
I didn't see what you specified, I must have missed it (comb the entire site)!

However, is this the header? "Define YOLOv5 Model Configuration and Architecture"

if yes? there's no diagram therein that seems to be YOLOv5 medium from my little experience and knowledge..

Am I missing something here? please suggest!!
I'm unable to move forward without this diagram of the medium model!

glenn-jocher · 2023-04-04T15:51:30Z

I apologize for the confusion, @Symbadian. It looks like the article I provided does not contain a specific diagram of the YOLOv5m architecture.

However, I found this diagram on the Ultralytics YOLOv5 GitHub repo under the "models" folder:

This should be the architecture diagram for the YOLOv5m model that you were looking for. Let me know if you have any further questions or need further assistance.

Symbadian · 2023-04-04T15:59:04Z

Hi @glenn-jocher thanx for responding and for clearing up the issue
I am not certain what's happening, I cannot respond to your last comment!

Anyways, there are multiple diagrams therein the link but this did not specify if it's the medium or not. The rest mentioned the large or small version..

So I am hoping that it's this!! please confirm and thanx in advance pal!

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 9  # number of classes
depth_multiple: 0.67  # model depth multiple
width_multiple: 0.75  # layer channel multiple
anchors:
 - [10,13, 16,30, 33,23]  # P3/8
 - [30,61, 62,45, 59,119]  # P4/16
 - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
 # [from, number, module, args]
 [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
  [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
  [-1, 3, C3, [128]],
  [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
  [-1, 6, C3, [256]],
  [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
  [-1, 9, C3, [512]],
  [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
  [-1, 3, C3, [1024]],
  [-1, 1, SPPF, [1024, 5]],  # 9
 ]

# YOLOv5 v6.0 head
head:
 [[-1, 1, Conv, [512, 1, 1]],
  [-1, 1, nn.Upsample, [None, 2, 'nearest']],
  [[-1, 6], 1, Concat, [1]],  # cat backbone P4
  [-1, 3, C3, [512, False]],  # 13

  [-1, 1, Conv, [256, 1, 1]],
  [-1, 1, nn.Upsample, [None, 2, 'nearest']],
  [[-1, 4], 1, Concat, [1]],  # cat backbone P3
  [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

  [-1, 1, Conv, [256, 3, 2]],
  [[-1, 14], 1, Concat, [1]],  # cat head P4
  [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

  [-1, 1, Conv, [512, 3, 2]],
  [[-1, 10], 1, Concat, [1]],  # cat head P5
  [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

  [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)

glenn-jocher · 2023-04-04T17:54:33Z

Yes, @Symbadian, that appears to be the architecture diagram for the YOLOv5m model. The code snippet you provided contains the model configuration with its layers and parameters, and the accompanying diagram displays the connections and flow of data through those layers.

I hope this helps you with your tasks. Let me know if you have any further questions or need further assistance.

myasser63 · 2023-04-04T20:02:54Z

@glenn-jocher Does YOLOv5 v6.0 have any type of spatial or channel attention modules?

glenn-jocher · 2023-04-04T21:45:07Z

Yes, @Symbadian, YOLOv5 v6.0 does have attention modules implemented in its architecture. The SPP (Spatial Pyramid Pooling) and PAN (Path Aggregation Network) modules both incorporate spatial and channel attention mechanisms to emphasize more relevant features and reduce noise in the feature maps.

The Spatial Pyramid Pooling (SPP) module computes spatial pooling features at multiple scales to handle varying object sizes, while the Path Aggregation Network (PAN) module aggregates spatial, context, and channel information across feature maps to improve detection accuracy. Both of these modules take advantage of attention mechanisms to refine the features used for object detection.

If you want to learn more about SPP and PAN modules, you can check out the original research papers, "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition" and "Path Aggregation Network for Instance Segmentation", respectively.

glenn-jocher · 2023-04-05T05:19:22Z

That's correct, @Symbadian. Because SPP and PAN modules already incorporate spatial and channel attention mechanisms, you do not need to add an additional attention model such as CBAM before using SPP or PAN.

The SPP and PAN modules are designed to handle object detection tasks specifically, and they have shown to improve detection accuracy while also reducing computational overhead compared to using separate attention models.

So in summary, if you are already using SPP or PAN in your YOLOv5 implementation, adding an additional attention model like CBAM may not be necessary and could potentially introduce performance or computational issues.

Symbadian · 2023-04-05T06:23:24Z

Ahhhh, ok great on the diagram! top of the morning to you and thanx for your guidance @glenn-jocher!

However, I don't think I am using CBAM and I'm not sure what this is!!

is this hidden somewhere unknown?

CBAM is not mentioned anywhere on the diagram!!!???!

supriamir · 2023-04-05T07:40:03Z

Hi @glenn-jocher thank you for your answer.

Is the attention model in the C3 module? I just wonder what type attention model implemented in YOLOV5?

glenn-jocher · 2023-04-05T08:13:35Z

I apologize for the confusion, @Symbadian. You are correct that the attention mechanism used in YOLOv5 is not CBAM – that was an oversight in my previous response.

In YOLOv5, the attention mechanism is implemented in the C3 (CSP-3) blocks. The C3 blocks are a modified version of the CSP (Cross Stage Partial) blocks introduced in the original YOLOv4 paper, and they use a combination of skip connections, convolutional layers, and attention mechanisms to improve information flow through the network and reduce the impact of noisy features.

Specifically, the C3 block in YOLOv5 contains two parallel convolutional layers, with the first layer passing input features through a bottleneck layer and the second layer directly outputting features. These two streams are then concatenated together and passed through a series of pooling and convolutional layers. Attention modules are also included within the C3 block to help the network attend to important features and suppress less relevant information.

Overall, the attention mechanism in the C3 blocks is designed to address the problem of information loss in the network due to repeated downsampling, while still maintaining a level of computational efficiency. I hope this helps!

Symbadian · 2023-04-05T09:02:30Z

Hi @supriamir,

I am wondering the same as I am not certain, However, based on my understanding the C3 is The CSPL (cross-stage partial connections) consisting of the bottleneck layers??!!? I can be wrong here!!

Someone please correct my statement

Symbadian · 2023-04-05T09:19:52Z

I apologize for the confusion, @Symbadian. You are correct that the attention mechanism used in YOLOv5 is not CBAM – that was an oversight in my previous response.

In YOLOv5, the attention mechanism is implemented in the C3 (CSP-3) blocks. The C3 blocks are a modified version of the CSP (Cross Stage Partial) blocks introduced in the original YOLOv4 paper, and they use a combination of skip connections, convolutional layers, and attention mechanisms to improve information flow through the network and reduce the impact of noisy features.

Specifically, the C3 block in YOLOv5 contains two parallel convolutional layers, with the first layer passing input features through a bottleneck layer and the second layer directly outputting features. These two streams are then concatenated together and passed through a series of pooling and convolutional layers. Attention modules are also included within the C3 block to help the network attend to important features and suppress less relevant information.

Overall, the attention mechanism in the C3 blocks is designed to address the problem of information loss in the network due to repeated downsampling, while still maintaining a level of computational efficiency. I hope this helps!

Brilliant, I will work on the diagram and try including the layers represented in the previous comment with the code specifications.

Thanx for the insight into your model's really really great work here!
cheers!

glenn-jocher · 2023-04-05T09:49:39Z

You're welcome, @Symbadian! I'm glad I could help clarify the attention mechanism used in YOLOv5. Feel free to reach out if you have any further questions or need further assistance with your diagram. Wishing you all the best with your work!

Caterina1996 · 2023-11-22T13:32:38Z

In order to understand the structure of YOLOv5 and use other frameworks to implement YOLOv5, I try to create an overview, as shown below. If there has any error, please point out

Hi @seekFire I would like to kindly request your permission to include this image in an academic paper publication. We will be happy to acknowledge or reference you in the form that you deem appropiate. If you have any specific requirements or conditions for granting copyright permission, please contact me.

Thank you very much in advance!

jinzhaot · 2024-05-15T16:47:51Z

In order to understand the structure of YOLOv5 and use other frameworks to implement YOLOv5, I try to create an overview, as shown below. If there has any error, please point out

Hi @seekFire, I would like to request your permission to redraw based on your original image in an academic paper publication. If you have any specific requirements, please let me know.

Thanks for your help!

seekFire added the enhancement New feature or request label Jul 3, 2020

github-actions bot added the Stale label Aug 24, 2020

github-actions bot closed this as completed Aug 31, 2020

zhiqwang mentioned this issue Feb 10, 2021

Network Design of Yolov5 #1882

Closed

zhiqwang mentioned this issue Feb 23, 2021

Dear author, can you provide a visualization scheme for YOLOV5 feature graphs during detect.py? Thank you! #2259

Closed

dariogonle mentioned this issue Aug 23, 2021

YOLOv5 model architecture overview #4518

Closed

karl-gardner mentioned this issue Dec 24, 2021

Yolov5m architecture #6094

Closed

1 task

vjsrinivas mentioned this issue May 16, 2023

Brief summary of YOLOv8 model structure ultralytics/ultralytics#189

Open

ultralytics deleted a comment from glenn-jocher May 19, 2024

Overview of model structure about YOLOv5 #280

Overview of model structure about YOLOv5 #280

Comments

seekFire commented Jul 3, 2020 • edited Loading

github-actions bot commented Jul 3, 2020 • edited by glenn-jocher Loading

glenn-jocher commented Jul 3, 2020

TaoXieSZ commented Jul 3, 2020

seekFire commented Jul 3, 2020

seekFire commented Jul 3, 2020

bretagne-peiqi commented Jul 24, 2020

glenn-jocher commented Jul 24, 2020

glenn-jocher commented Jul 24, 2020

bretagne-peiqi commented Jul 24, 2020

github-actions bot commented Aug 24, 2020

gaobaorong commented Sep 18, 2020

pravastacaraka commented Feb 7, 2021

zhiqwang commented Feb 7, 2021 • edited Loading

pravastacaraka commented Feb 7, 2021

ehdrndd commented Sep 1, 2021

data4pass commented Sep 27, 2021

glenn-jocher commented Sep 27, 2021 • edited Loading

data4pass commented Sep 30, 2021

glenn-jocher commented Sep 30, 2021

Zengyf-CVer commented Dec 24, 2021

yyccR commented Dec 28, 2021

glenn-jocher commented Dec 28, 2021

Symbadian commented Apr 4, 2023

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023 • edited Loading

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023 • edited Loading

glenn-jocher commented Apr 4, 2023

Symbadian commented Apr 4, 2023 • edited Loading

glenn-jocher commented Apr 4, 2023

myasser63 commented Apr 4, 2023

glenn-jocher commented Apr 4, 2023

glenn-jocher commented Apr 5, 2023

Symbadian commented Apr 5, 2023 • edited Loading

supriamir commented Apr 5, 2023

glenn-jocher commented Apr 5, 2023

Symbadian commented Apr 5, 2023

Symbadian commented Apr 5, 2023

glenn-jocher commented Apr 5, 2023

Caterina1996 commented Nov 22, 2023

jinzhaot commented May 15, 2024

seekFire commented Jul 3, 2020 •

edited

Loading

github-actions bot commented Jul 3, 2020 •

edited by glenn-jocher

Loading

zhiqwang commented Feb 7, 2021 •

edited

Loading

glenn-jocher commented Sep 27, 2021 •

edited

Loading

Symbadian commented Apr 4, 2023 •

edited

Loading

Symbadian commented Apr 4, 2023 •

edited

Loading

Symbadian commented Apr 4, 2023 •

edited

Loading

Symbadian commented Apr 5, 2023 •

edited

Loading