
yolov5-lite #3168

Closed
debapriyamaji opened this issue May 14, 2021 · 21 comments
Labels
enhancement New feature or request Stale

Comments

@debapriyamaji

debapriyamaji commented May 14, 2021

🚀 Feature

Yolov5 lite models: Making yolov5 more embedded friendly

Motivation

In line with the efficientdet-lite models, which are more embedded-friendly than efficientdet, is there any similar plan for yolov5-lite models?

Pitch

The following layers in yolov5 are not embedded-friendly:

  • the slice layer at the beginning;
  • the SiLU activation function.

If we make suitable changes to these layers, the models can be deployed much more efficiently on embedded devices.

I have trained some models with such changes to the above-mentioned layers and got accuracy within 2% of the original model.

If interested, I would love to share those results.
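To make the two pain points concrete, here is a small NumPy sketch (illustrative only, not the actual trained configuration) of the slice operation and of the SiLU/ReLU difference:

```python
import numpy as np

# Illustrative sketch (not the official implementation) of the two
# embedded-unfriendly ops named above and their cheaper substitutes.

def silu(x):
    # SiLU (x * sigmoid(x)): needs an exp per element, which many
    # embedded NPUs/toolchains do not support or quantize poorly
    return x / (1.0 + np.exp(-x))

def relu(x):
    # ReLU: a single comparison per element, universally supported
    return np.maximum(x, 0.0)

# The initial slice (Focus) layer: four strided slices concatenated on channels
x = np.random.rand(1, 3, 8, 8).astype(np.float32)  # (N, C, H, W)
sliced = np.concatenate(
    [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
    axis=1)
assert sliced.shape == (1, 12, 4, 4)  # 4x channels, half resolution
# An ordinary stride-2 convolution with 12 output channels produces the same
# output shape directly from the 3-channel input, avoiding the slicing.
```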

Additional context

Efficientdet-lite models as compared to efficientdet.

@debapriyamaji debapriyamaji added the enhancement New feature or request label May 14, 2021
@github-actions
Contributor

github-actions bot commented May 14, 2021

👋 Hello @debapriyamaji, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@debapriyamaji sounds interesting, can you share some quantitative results before and after the changes? We have an activations study here that compares SiLU against some alternatives: https://wandb.ai/glenn-jocher/activations?workspace=user-glenn-jocher

I also recently created some documentation on the Focus() layer here that might interest you in #3181

@debapriyamaji
Author

debapriyamaji commented May 17, 2021

@glenn-jocher Thanks for the reply and the insights. My ReLU results were in line with yours.

Following are the changes that I have made step by step for yolov5s:

| Model config | PyTorch mAP/AP50 | GFLOPs | Comment |
| --- | --- | --- | --- |
| Official model | 36.7/55.4 | 17.0 | |
| Retrained official model | 36.9/55.6 | 17.0 | |
| SiLU replaced by ReLU | 34.9/53.7 | 17.0 | |
| SiLU replaced by ReLU + slice replaced by conv (would like to call it yolov5-lite) | 35.0/54.4 | 17.1 | Replaced the initial slice layer with a conv (No=12, Ni=3, K=3, S=2). |

Let me know what you think.

@wudashuo
Contributor

I've done exactly the same thing and moved the model to devices months ago, following somebody's blog. The results I tested on devices seem okay, but there is still an accuracy loss compared to yolov5s, plus some additional loss during migration.
Since many devices don't support SiLU and Focus, I suppose many people have made these changes to migrate the model to embedded devices, but there is still a lot to improve; simply replacing SiLU with ReLU and Focus with Conv is just a compromise. I've been working on this for months, trying to find a model under 15 GFLOPs with more than 35 mAP. If you have some ideas, please let me know.
By the way, I don't think it should be called yolov5-lite: its parameters and GFLOPs are larger than yolov5s, so it's not lighter than yolov5s.

@glenn-jocher
Member

@debapriyamaji @wudashuo one other point is that the Focus() module can be implemented with no slicing by allowing it to use the Contract() module. This provides no benefit on PyTorch/CUDA but may help other deployment targets.

yolov5/models/common.py

Lines 163 to 187 in b7cd1f5

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Focus, self).__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
        # self.contract = Contract(gain=2)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
        # return self.conv(self.contract(x))


class Contract(nn.Module):
    # Contract width-height into channels, i.e. x(1,64,80,80) to x(1,256,40,40)
    def __init__(self, gain=2):
        super().__init__()
        self.gain = gain

    def forward(self, x):
        N, C, H, W = x.size()  # assert (H / s == 0) and (W / s == 0), 'Indivisible gain'
        s = self.gain
        x = x.view(N, C, H // s, s, W // s, s)  # x(1,64,40,2,40,2)
        x = x.permute(0, 3, 5, 1, 2, 4).contiguous()  # x(1,2,2,64,40,40)
        return x.view(N, C * s * s, H // s, W // s)  # x(1,256,40,40)
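The slice-based forward and the commented-out Contract path produce the same data up to a channel permutation, which the following learned Conv absorbs during training. A quick NumPy check of that equivalence (a sketch, independent of the actual yolov5 code):

```python
import numpy as np

N, C, H, W = 1, 3, 4, 4
x = np.arange(N * C * H * W, dtype=np.float32).reshape(N, C, H, W)

# Focus-style slicing: four pixel phases stacked on the channel axis
focus = np.concatenate(
    [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
    axis=1)

# Contract-style view/permute/view with gain s=2
s = 2
c = x.reshape(N, C, H // s, s, W // s, s)  # split H and W into (H/2, 2), (W/2, 2)
c = c.transpose(0, 3, 5, 1, 2, 4)          # move the two phase axes ahead of C
contract = c.reshape(N, C * s * s, H // s, W // s)

assert focus.shape == contract.shape == (N, 4 * C, H // s, W // s)
# Same elements, different channel ordering -- a learned conv absorbs the permutation
assert np.array_equal(np.sort(focus, axis=None), np.sort(contract, axis=None))
```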

Also I'd argue that SiLU is well supported on many backends. We use it in our YOLOv5 CoreML models in our iOS App, where YOLOv5s runs in <18ms on iPhone 11/12. See https://apps.apple.com/app/id1452689527

@debapriyamaji
Author

debapriyamaji commented May 19, 2021

Hi,
@wudashuo You can achieve your goal of 15 GFLOPs and 35 mAP by running yolov5s6 at a resolution of 576x576. Running inference with the pretrained checkpoint at an input resolution of 576x576, I got an mAP of 37.7 for 14.15 GFLOPs.

Regarding yolov5-lite, the GFLOPs number does indeed look higher. However, these numbers count the convolutions only. If you account for the sigmoid and multiplication operations needed for SiLU, yolov5 will be higher than yolov5-lite. Likewise, efficientnet-lite's complexity is almost the same as efficientnet's; the main difference is in porting these models to embedded devices.
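One way to make that accounting explicit is to add an elementwise activation cost to the conv-only GFLOPs. A sketch follows; the element count and per-activation op costs are placeholders, not measured values:

```python
def total_gflops(conv_gflops, act_elems, ops_per_activation):
    # conv GFLOPs (as reported in the table) plus elementwise activation cost;
    # SiLU needs several ops per element (sigmoid + multiply), ReLU roughly one
    return conv_gflops + act_elems * ops_per_activation / 1e9

# Hypothetical 60M activation elements per inference (placeholder, not measured)
silu_total = total_gflops(17.0, 60e6, 4)  # SiLU model, ~4 ops/element assumed
relu_total = total_gflops(17.1, 60e6, 1)  # "lite" model, ~1 op/element assumed
```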

@debapriyamaji
Author

debapriyamaji commented May 22, 2021

@glenn-jocher Thanks for the insight regarding Focus and SiLU layer. I have some follow-up questions:

  • Is there any FPS benchmarking data available for the different yolov5 models on Apple devices?
  • Did you run these models in 8-bit mode?
  • How did you conclude that SiLU is well supported on Apple devices? Did you run models with both ReLU and SiLU and observe no difference in performance?

Thanks in advance.

@glenn-jocher
Member

@debapriyamaji see iOS iDetection Speed Table #1276 for YOLOv5 benchmarks on iPhone models. Quantization seems to have no effect on ANE throughput. SiLU tests work perfectly well in iDetection.

@debapriyamaji
Author

@glenn-jocher Thanks for pointing to the FPS table. Is there a similar table for accuracy that shows the drop caused by quantization?
I am trying to benchmark the ReLU model against the SiLU model after quantization in tflite and will share the results once I am done. Since ReLU is more quantization-friendly than SiLU, I am expecting a similar trend here as well. Thanks.

@glenn-jocher
Member

@debapriyamaji there is no drop in accuracy when moving from FP32 to FP16 inference.

@debapriyamaji
Author

debapriyamaji commented May 24, 2021

@glenn-jocher Sorry for not mentioning it in the previous post. I meant the drop in accuracy due to FP8, since the CoreML models are exported as FP8.

One more point I wanted to ask about: the A14 processor is ~11.0 TOPS. Yolov5s @ 320x192 is ~2.55 GOPS. If it runs at 14.3 ms, or ~70 FPS, the GOPS utilization is (70 * 2.55) GOPS = 178.5 GOPS. That's only 1.6% of the total compute power, so the efficiency is quite low, right?
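The utilization arithmetic can be checked directly (all figures as quoted in this comment):

```python
ane_tops = 11.0     # A14 compute, trillions of ops/s (as quoted)
model_gops = 2.55   # YOLOv5s @ 320x192, giga-ops per inference (as quoted)
fps = 1000 / 14.3   # 14.3 ms latency -> ~70 FPS

used_gops = model_gops * fps                # ~178 GOPS sustained
utilization = used_gops / (ane_tops * 1e3)  # convert TOPS to GOPS
print(f"{utilization:.1%}")                 # prints 1.6%
```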

Thanks.

@developer0hye
Contributor

@glenn-jocher
Wow, FReLU is so powerful... Do you have a plan to use FReLU instead of the ReLU layer in a new yolov5?

@glenn-jocher
Member

glenn-jocher commented Jun 10, 2021

@developer0hye we have an activations study on YOLOv5s here:
https://wandb.ai/glenn-jocher/activations
(screenshot: W&B activations study comparison, 2021-06-10)

Yes, FReLU performed very well, but you have to be careful interpreting these results as it's introducing additional convolutions into the model, which will especially help small models, but will cause faster overfitting in large models. Also memory usage increased from 12G to 18G when moving from SiLU to FReLU (though training speed was unaffected).

All in all, it requires a lot more study into applying it to larger models, and perhaps applying it only to certain areas of the model rather than the entire model (we could use feedback from the FReLU authors here); we just don't have the time or manpower.

EDIT1: I think memory usage could be reduced by not applying it as much to the earlier layers. In general, the P0, P1, P2 layers in the backbone (the first 1/3 of the backbone) all have very small strides with high memory usage and inference times.

@glenn-jocher
Member

@AyushExel can I borrow you to pass a feature request to the W&B team? We really need some semi-transparency on these legend overlays. Maybe 20-30% transparency would allow you to see the data under the legend, which would be great in my above screenshot.

@AyushExel
Contributor

@glenn-jocher I've passed it on :)

@glenn-jocher
Member

@AyushExel awesome thanks :) !

@Alex-afka

Alex-afka commented Jun 10, 2021

@glenn-jocher
Did you replace all the activation functions with frelu?
How does frelu work in yolov5l?

@glenn-jocher
Member

@Alex-afka see #2891 for details

@github-actions
Contributor

github-actions bot commented Jul 11, 2021

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Access additional YOLOv5 🚀 resources:

Access additional Ultralytics ⚡ resources:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@PrashantDixit0

@glenn-jocher and @debapriyamaji, are these YOLOv5-lite pretrained models available for research or testing purposes?

@glenn-jocher
Member

@PrashantDixit0 Yes, the YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x models are available with pretrained weights, for various tasks like object detection, instance segmentation, and more. You can find more information on using these models for research and testing purposes in the Ultralytics documentation at https://docs.ultralytics.com/yolov5/.
