Is the Focus layer equivalent to a simple Conv layer? #4825

Closed · thomasbi1 opened this issue Sep 16, 2021 · 22 comments
Labels: question (Further information is requested), Stale

@thomasbi1
Hi

I had a look at the Focus layer, and it seems to me that it is equivalent to a simple 2D convolutional layer, without the need for the space-to-depth operation. For example, a Focus layer with kernel size 3 can be expressed as a Conv layer with kernel size 6 and stride 2. I wrote some code to verify this:

import torch
from models.common import Focus, Conv
from utils.torch_utils import profile


focus = Focus(3, 64, k=3).eval()
conv = Conv(3, 64, k=6, s=2, p=2).eval()

# Express focus layer as conv layer
conv.bn = focus.conv.bn
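# Focus concatenates the [even-row/even-col, odd/even, even/odd, odd/odd]
# pixel grids along the channel dim, so each 3-channel group of the Focus
# weights scatters into the matching even/odd positions of the 6x6 kernel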
conv.conv.weight.data[:, :, ::2, ::2] = focus.conv.conv.weight.data[:, :3]
conv.conv.weight.data[:, :, 1::2, ::2] = focus.conv.conv.weight.data[:, 3:6]
conv.conv.weight.data[:, :, ::2, 1::2] = focus.conv.conv.weight.data[:, 6:9]
conv.conv.weight.data[:, :, 1::2, 1::2] = focus.conv.conv.weight.data[:, 9:12]

# Compare
x = torch.randn(16, 3, 640, 640)
with torch.no_grad():
    # Results are not perfectly identical, errors up to about 1e-7 occur (probably numerical)
    assert torch.allclose(focus(x), conv(x), atol=1e-6)

# Profile
results = profile(input=torch.randn(16, 3, 640, 640), ops=[focus, conv, focus, conv], n=10, device=0)

And the output is as follows:

YOLOv5 🚀 v5.0-434-g0dc725e torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40536.1875MB)
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
        7040       23.07         2.682         4.055         13.78       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.368         3.474         9.989       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.343         3.556         11.57       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.368         3.456         9.961       (16, 3, 640, 640)      (16, 64, 320, 320)

I did have to slightly tweak the tolerance in torch.allclose for the assertion to succeed, but looking at the errors, they seem to be purely numerical.

So am I missing something, or could the Focus layer simply be replaced by a Conv layer, which would lead to a slight increase in speed?
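For reference, the same equivalence can be checked without the YOLOv5 repo, using plain nn.Conv2d modules (an illustrative sketch; the Focus slicing is reimplemented inline and the names here are my own):

import torch
import torch.nn as nn

def space_to_depth(x):
    # Focus slicing: stack the four pixel parities along the channel dim
    return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                      x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)

focus_conv = nn.Conv2d(12, 64, 3, stride=1, padding=1, bias=False)
big_conv = nn.Conv2d(3, 64, 6, stride=2, padding=2, bias=False)

with torch.no_grad():
    w = focus_conv.weight  # shape (64, 12, 3, 3)
    big_conv.weight[:, :, ::2, ::2] = w[:, 0:3]     # even-row/even-col pixels
    big_conv.weight[:, :, 1::2, ::2] = w[:, 3:6]    # odd-row/even-col
    big_conv.weight[:, :, ::2, 1::2] = w[:, 6:9]    # even-row/odd-col
    big_conv.weight[:, :, 1::2, 1::2] = w[:, 9:12]  # odd-row/odd-col

x = torch.randn(2, 3, 64, 64)
with torch.no_grad():
    assert torch.allclose(focus_conv(space_to_depth(x)), big_conv(x), atol=1e-6)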

@thomasbi1 added the question (Further information is requested) label on Sep 16, 2021
@github-actions (bot) commented Sep 16, 2021

👋 Hello @thomasbi1, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python>=3.6.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher (Member)

@thomasbi1 thanks for raising this issue! It would be exciting if we could simplify this layer; we are always looking for improvements. I will try to reproduce your results later today.

@glenn-jocher (Member) commented Sep 18, 2021

@thomasbi1 I was able to run your comparison in Colab. The allclose check seems OK, but I got significantly different profile results when testing in Colab (with a T4). I'm not sure what the cause could be; I'll try different hardware later.

[screenshot: Colab T4 profiling results]

@glenn-jocher (Member)

I profiled using the more traditional %timeit method as well and saw similar forward times (ms) to the YOLOv5 profiler:

[screenshot: %timeit results]
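For anyone reproducing the %timeit comparison, a cell along these lines (a hypothetical reconstruction, since only a screenshot was posted; it assumes the focus and conv modules defined earlier) synchronizes around the call so CUDA kernel time is actually measured:

import torch

focus, conv = focus.cuda(), conv.cuda()
x = torch.randn(16, 3, 640, 640, device='cuda')

def timed(m):
    # synchronize so %timeit measures kernel execution, not just kernel launch
    torch.cuda.synchronize()
    y = m(x)
    torch.cuda.synchronize()
    return y

%timeit timed(focus)
%timeit timed(conv)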

@thomasbi1 (Author)

Wow, that is quite a big discrepancy!

I also tried Colab with a K80, which again gives vastly different results (the original Focus has a much faster forward time but a slightly slower backward pass).

[screenshot: K80 profiling results]

@thomasbi1 (Author)

I also tried a GTX 1080 Ti, which again shows different results. I'm not sure if this is simply due to different hardware, different CUDA versions, or something else.

[screenshot: GTX 1080 Ti profiling results]

@glenn-jocher (Member)

V100 results here show an improvement forward and backward at batch sizes 16 and 1.

YOLOv5 🚀 v5.0-449-g9ef9494 torch 1.9.0 CUDA:0 (Tesla V100-SXM2-16GB, 16160.5MB)
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
        7040       23.07         2.259         4.497         16.88       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         1.839         4.107            12       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         1.919         4.444         16.63       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         1.839         4.113         11.98       (16, 3, 640, 640)      (16, 64, 320, 320)

YOLOv5 🚀 v5.0-449-g9ef9494 torch 1.9.0 CUDA:0 (Tesla V100-SXM2-16GB, 16160.5MB)
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
        7040       1.442         0.581        0.4828         1.377        (1, 3, 640, 640)       (1, 64, 320, 320)
        7040       1.442         0.161        0.4387        0.9845        (1, 3, 640, 640)       (1, 64, 320, 320)
        7040       1.442         0.161        0.4806         1.407        (1, 3, 640, 640)       (1, 64, 320, 320)
        7040       1.442         0.161        0.4376        0.9555        (1, 3, 640, 640)       (1, 64, 320, 320)

On balance it seems to help most setups, though the large K80 and T4 slowdowns are unfortunate, as those are Colab mainstays. I'll take a look at exportability next, though I don't anticipate any issues there.

@glenn-jocher (Member) commented Sep 18, 2021

@thomasbi1 export tests are good! OK, this seems like a good candidate for our upcoming v6.0 release, which will arrive with a few other minor architecture updates in October. I will add this issue to the release notes and make sure credit is assigned to you for uncovering this. Thank you for your contributions, and let us know if you spot any other items for improvement!

TODO: apply to v6.0 release backbone updates:

backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2 <--- update
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]
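For clarity, that first backbone entry expands to roughly the following stem (an illustrative sketch; YOLOv5's Conv block wraps Conv2d, BatchNorm2d and a SiLU activation):

import torch.nn as nn

# [-1, 1, Conv, [64, 6, 2, 2]] -> Conv(c_in=3, c_out=64, k=6, s=2, p=2)
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=2, bias=False),
    nn.BatchNorm2d(64),
    nn.SiLU(),
)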

@glenn-jocher (Member)

Removed TODO, update included in upcoming v6.0 release.

@ptklx commented Oct 15, 2021

@glenn-jocher Hi, when I converted v6.0 to TensorRT it reported the error (Unnamed Layer* 0) [Convolution]: group count must divide input channel count,
and I found this:

[screenshot: model YAML]

[[-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2 <--- update

The input channels (3) and 64 cannot be evenly divided by 2.

@glenn-jocher (Member)

@ptklx I'm not sure I follow; groups=1 everywhere, and there are no grouped convolutions in any YOLOv5 models. If you believe you have a reproducible bug, I recommend submitting a bug report with clear steps to reproduce.

@ptklx commented Oct 18, 2021

@glenn-jocher Sorry, I misread that part.

@yao-zheng-yi

@glenn-jocher Hi, I would like to ask why Conv can replace Focus. I ran the above code, and the result shows that Focus is significantly faster than Conv.

[screenshot: profiling results]

@thomasbi1 (Author)

@yao-zheng-yi As far as I can tell, the Conv layer is more easily exported to other formats (ONNX, TensorFlow, TFLite, etc.), as it is a more standard operation than the space-to-depth operation in the Focus layer. @glenn-jocher can probably elaborate.

Regarding speed, I seem to remember that the CUDA version plays an important role and that the Conv layer is faster on version 11. You could try updating your CUDA version to 11 and see if that helps.
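A quick way to check which versions you are running (standard PyTorch calls):

import torch

print(torch.__version__)              # e.g. 1.9.0+cu111
print(torch.version.cuda)             # CUDA version PyTorch was built against
print(torch.cuda.get_device_name(0))  # GPU model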

@glenn-jocher (Member) commented Oct 18, 2021

@yao-zheng-yi yes, from your profiling results Focus() is much faster on your machine. We did a pretty thorough analysis across various GPUs and found that Focus() is faster on many consumer cards and on T4 GPUs, while Conv() tends to be faster on enterprise cards and newer hardware.

This change (like most every other change) is an exercise in compromise: we implement changes when we feel the benefits outweigh the drawbacks for most users. This is one place where some users will experience slowdowns while others see improvements, and hopefully that ratio will improve on future hardware.

One of the main benefits is also improved exportability and a simpler architecture.
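As a concrete illustration of the exportability point (a minimal sketch, not the project's actual export.py): a plain strided conv exports to a single ONNX Conv node, whereas the Focus slicing adds Slice/Concat nodes in front of it.

import torch

conv = torch.nn.Conv2d(3, 64, kernel_size=6, stride=2, padding=2, bias=False)
torch.onnx.export(conv, torch.randn(1, 3, 640, 640), 'conv.onnx', opset_version=12)
# Inspecting conv.onnx (e.g. with Netron) shows a single Conv node; exporting
# a Focus-style module instead yields several Slice nodes and a Concat first.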

@yao-zheng-yi

@thomasbi1 @glenn-jocher Many thanks!

@duanzhiihao

I don't know if anyone still cares, but I also tried the script on an NVIDIA 1080 Ti and a 3090; Conv is better in both cases.

[screenshot: 1080 Ti profiling results]
[screenshot: 3090 profiling results]

@glenn-jocher (Member)

@duanzhiihao great, yes thanks for the feedback!

@SSHtoyourheart

My GPU memory is only 2 GB 😭

@tothedistance

This is how I converted the YOLO Focus layer to Caffe. What a coincidence!

@seeyouagain111

nice, thanks for your guys's job

@github-actions (bot) commented Dec 28, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.


Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
