YOLOv5 Architecture explanation #11791

Closed
1 task done
liamh1999 opened this issue Jun 29, 2023 · 5 comments
Labels: question (Further information is requested), Stale

Comments

@liamh1999

Search before asking

Question

I'm currently writing a scientific paper on YOLOv5 and I have questions about the architecture. After looking into various sources, this one looks the most promising: #6998

According to my knowledge, YOLOv5 consists of 3 main blocks: Backbone, Neck and Head. But where is the Neck in that diagram? Are there any sources explaining how the workflow of the diagram works? What are the edges in the C3 block going into concat?

Thanks in advance for helping me!

Additional

No response

liamh1999 added the question label Jun 29, 2023
@github-actions
Contributor

👋 Hello @liamh1999, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@glenn-jocher
Member

@liamh1999 the YOLOv5 architecture consists of three main components: the Backbone, the Neck, and the Head. In the diagram you linked, the Neck is not explicitly labeled. Note that the C3 block is not specific to the Neck: it is used in both the Backbone and the Neck. In the model YAML files, the Neck layers and the final Detect layer are both listed under the head key, which is one reason the Neck is easy to miss.

To follow the workflow of the diagram, compare it with the source code. The building blocks (Conv, C3, SPPF) are defined in models/common.py, and each model YAML file in models/ lists the layers in order as [from, number, module, args], where from is the index of the layer that provides the input. The diagram is a visual summary of the model and does not show the operations performed inside each block.

Regarding the edges in the C3 block going into concat: these are the two branches of the block, and both start from the same input. One branch applies a 1x1 convolution followed by n Bottleneck blocks; the other applies only a 1x1 convolution. The two results are concatenated along the channel dimension and then fused by a final 1x1 convolution.
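To make the channel counts at the concat concrete, here is a minimal bookkeeping sketch. It only traces channel widths and builds no tensors; the function name `c3_channel_flow` is hypothetical, and the hidden width `int(c2 * e)` reflects my reading of the C3 class in models/common.py, so please verify it against the version you cite.

```python
def c3_channel_flow(c1: int, c2: int, e: float = 0.5) -> dict:
    """Trace channel counts through a C3-style block (bookkeeping only, no tensors)."""
    c_hidden = int(c2 * e)  # hidden width used by both branches (assumed from common.py)
    return {
        "input": c1,
        "branch_a": c_hidden,      # 1x1 conv, then n Bottleneck blocks (width unchanged)
        "branch_b": c_hidden,      # 1x1 conv only
        "concat": 2 * c_hidden,    # the two edges entering Concat in the diagram
        "output": c2,              # final 1x1 conv fuses the concatenated features
    }


flow = c3_channel_flow(c1=128, c2=128)
print(flow)
```

With the default e = 0.5, the concatenated tensor has as many channels as the block's output, so the final 1x1 convolution mixes the two branches without changing the width.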

For more in-depth information about the YOLOv5 architecture and its workflow, I recommend referring to the official YOLOv5 source code and the associated documentation. These resources provide detailed explanations of each component and the overall workflow of YOLOv5.

I hope this helps with your research and paper. If you have any further questions, feel free to ask. Good luck with your scientific paper!

@liamh1999
Author

Hi @glenn-jocher, thanks for the fast reply!
After looking into it more, I found this diagram, #7160, which I think shows the split between Backbone, Neck and Head very well. However, I'm confused about where values such as I = 3, O = 64, K = 6, S = 2 come from. When I look at yolov5m.yaml (https://github.com/ultralytics/yolov5/blob/master/models/yolov5m.yaml), the first conv is defined with the values [64, 6, 2, 2], and I'm not sure how these map to the values in the diagram. Apart from that, this is my definition of the C3 block and conv so far:

The conv layer is a simple convolutional layer that applies a set of filters to the input and produces an output feature map. The conv layer has four parameters: ch_in, ch_out, kernel, and stride. ch_in is the number of input channels, ch_out is the number of output channels (or filters), kernel is the size of the filter (such as 3x3), and stride is the step size of the filter (such as 2). The conv layer also has an optional parameter: padding, which is the amount of zero-padding added to the input to preserve its size. The conv layer can also have an activation function (such as ReLU) or a batch normalization layer after it.
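As a quick check on how kernel, stride, and padding interact, the standard output-size formula for a convolution can be sketched as follows. The helper name `conv_out_size` is hypothetical, and the 640-pixel input is only an example value.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


print(conv_out_size(640, kernel=6, stride=2, padding=2))  # 320
print(conv_out_size(640, kernel=3, stride=2, padding=1))  # 320
print(conv_out_size(640, kernel=3, stride=1, padding=1))  # 640
```

A stride of 2 halves the resolution, while a stride of 1 with padding of kernel // 2 (for odd kernels) preserves it.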

The c3 layer is a cross-stage partial block that consists of three convolutional layers with skip connections. The c3 layer has four parameters: ch_in, ch_out, n, and e. ch_in is the number of input channels, ch_out is the number of output channels, n is the number of repetitions of the block, and e is the expansion factor. The c3 layer splits the input feature map into two parts: one part goes through a series of convolutional layers with e times ch_out channels, and the other part goes directly to the output. Then, the two parts are concatenated along the channel dimension to form the output feature map. The c3 layer can reduce the redundancy and complexity of the network by using cross-stage partial connections.

If the explanation is wrong, I would be thankful for corrections.

I also want to run an experiment on the impact of dropout on YOLOv5's performance, as this is part of my paper.
As far as I can see, no dropout mechanism is implemented. Is there any documentation or guidance on how I can add it to the backbone?

Thanks in advance!

@github-actions
Contributor

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

github-actions bot added the Stale label Jul 31, 2023
github-actions bot closed this as not planned Aug 11, 2023
@glenn-jocher
Member

@liamh1999 you have a good grasp of the conv and c3 layers in the YOLOv5 architecture. The diagram you found effectively highlights the delineation between the Backbone, Neck, and Head in YOLOv5.

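Regarding the values I = 3, O = 64, K = 6, and S = 2: these are the input channels, output channels, kernel size, and stride of the first Conv layer. The YAML entry [64, 6, 2, 2] lists the output channels, kernel size, stride, and padding, in that order. The input channel count is not written in the YAML; it is taken from the output of the previous layer, which for the first layer is the 3-channel RGB image. Note also that the output channels in the YAML are scaled by width_multiple when the model is built, so O = 64 corresponds to a width_multiple of 1.0, whereas with the width_multiple of 0.75 in yolov5m.yaml the first Conv has 48 output channels.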
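The mapping from the YAML entry to the diagram values can be sketched roughly as below. This is not the actual model-building code: `resolve_conv_args` is a hypothetical helper, and it assumes the argument order [out_channels, kernel, stride, padding] and that output channels are multiplied by width_multiple and rounded up to a multiple of 8.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_conv_args(c_in: int, yaml_args: list, width_multiple: float) -> dict:
    """Map a YAML Conv entry to diagram-style I/O/K/S/P values (illustrative only)."""
    c_out, kernel, stride, padding = yaml_args
    c_out = make_divisible(c_out * width_multiple)  # assumed channel scaling rule
    return {"I": c_in, "O": c_out, "K": kernel, "S": stride, "P": padding}


first_conv = [64, 6, 2, 2]  # args of the first Conv entry quoted above
print(resolve_conv_args(3, first_conv, width_multiple=1.0))   # O is 64
print(resolve_conv_args(3, first_conv, width_multiple=0.75))  # O is 48
```

The input channel count (3) is passed in separately because it comes from the image, or from the preceding layer, rather than from the YAML entry itself.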

For evaluating the impact of dropout on YOLOv5, there isn't a built-in dropout mechanism in the current YOLOv5 implementation. To experiment with dropout in the YOLOv5 architecture, you can consider adding dropout layers into the backbone or other relevant sections of the network. You may need to modify the YOLOv5 source code to incorporate dropout layers and then conduct performance evaluations to assess its impact.
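One possible starting point, as a sketch only: wrap an existing block in a small module that applies channel-wise dropout to its output. `ConvWithDropout` is a hypothetical name, the `stage` below is a stand-in rather than the actual YOLOv5 Conv module, and p = 0.1 is an arbitrary example rate. Only standard PyTorch layers are used.

```python
import torch
from torch import nn


class ConvWithDropout(nn.Module):
    """Hypothetical wrapper: runs an existing block, then applies channel-wise dropout."""

    def __init__(self, block: nn.Module, p: float = 0.1):
        super().__init__()
        self.block = block
        self.drop = nn.Dropout2d(p)  # zeroes whole feature maps, in training mode only

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.block(x))


# Stand-in for a backbone stage (not the actual YOLOv5 Conv module).
stage = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.SiLU(),
)
layer = ConvWithDropout(stage, p=0.1)

x = torch.randn(2, 3, 64, 64)
layer.train()
y_train = layer(x)  # dropout active: some channels are zeroed at random
layer.eval()
y_eval = layer(x)   # dropout disabled: output is deterministic
```

Because dropout is only active in training mode, validation and inference are unaffected as long as the model is switched to eval mode, which makes a comparison against a baseline run without the wrapper straightforward.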

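Your explanations of the conv and C3 layers are largely accurate, with two refinements. In YOLOv5, Conv is a convolution followed by batch normalization and, by default, a SiLU activation. And C3 does not split the input channels: both branches receive the full input, each through its own 1x1 convolution that reduces it to e times ch_out channels. One branch then passes through n Bottleneck blocks, and the two branches are concatenated and fused by a final 1x1 convolution.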

Feel free to reach out if you need further assistance or clarification. Best of luck with your paper and experiments!

2 participants