
yolov3 model compression and acceleration (quantization, sparsity), c++ version


ArtyZe/yolov3_lite


yolov3_lite

This repo has to run on industrial embedded devices with limited computing resources, so I have to compress and accelerate the model step by step until the inference time meets my boss's requirements :(

The backbone net of my project is yolov3-lite and an optimized version of it.

While building this project I referred to several git projects and CVPR papers. Thanks to those authors.

I will keep updating this repo, so please stay tuned.

All acceleration switches can be found in the Makefile.

[What tricks I used]

Multiple Threads

Set OPENMP := 1 in Makefile

If you have run multithreaded code on ARM or x86 chips, you probably know OpenMP.

The next picture shows how OpenMP runs. It uses many tricks to make sure the threads work well together.
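As a rough illustration of what the OPENMP switch buys you, here is a minimal sketch that parallelizes the outer loop of a naive matrix multiply, the kind of loop that dominates a conv layer once it is lowered to GEMM. The function name `gemm_rows_parallel` and the row-major layout are assumptions made for this example, not code from this repo. If the compiler is not given `-fopenmp`, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (M x N) += A (M x K) * B (K x N), all matrices row-major.
// Hypothetical helper for illustration; not taken from the repo.
void gemm_rows_parallel(int M, int N, int K,
                        const std::vector<float>& A,
                        const std::vector<float>& B,
                        std::vector<float>& C) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    assert(C.size() == static_cast<std::size_t>(M) * N);

    // Each row of C is written by exactly one thread, so no locking is needed.
    #pragma omp parallel for
    for (int i = 0; i < M; ++i) {
        for (int k = 0; k < K; ++k) {
            const float a = A[static_cast<std::size_t>(i) * K + k];
            for (int j = 0; j < N; ++j) {
                C[static_cast<std::size_t>(i) * N + j] +=
                    a * B[static_cast<std::size_t>(k) * N + j];
            }
        }
    }
}
```

Splitting by output rows is the simplest choice because threads never touch the same output element.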

The result of using OpenMP in this project is:


Kernel Mask (net sparsity)

Set MASK := 1 in Makefile

This is a common method to reduce the computation of conv layers. The key point is how to decide which kernels are important and which kernels should be deleted.

In this project, I followed the paper
Accelerating Convolutional Networks via Global & Dynamic Filter Pruning, a product of Tencent lab.
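To make the idea concrete, here is a simplified sketch of global filter masking: every filter in every layer gets an importance score, the scores are ranked across the whole net, and the lowest fraction is masked out. Using the L1 norm as the score is an assumption made for this example and is only a stand-in for the saliency used in the paper. The names `Layer`, `filter_score` and `global_filter_mask` are hypothetical and do not come from this repo. Because the mask is recomputed from the current weights on each call, a filter masked earlier can come back later, which is the "dynamic" part in spirit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One conv layer: layer[f] holds the flattened weights of filter f.
using Layer = std::vector<std::vector<float>>;

// L1 norm of one filter, used here as a stand-in importance score.
float filter_score(const std::vector<float>& w) {
    float s = 0.0f;
    for (float v : w) s += std::fabs(v);
    return s;
}

// Returns mask[layer][filter]: 1 = keep, 0 = masked out.
// prune_ratio is the fraction of all filters (across layers) to mask.
// Filters whose score ties with the cut-off are masked as well.
std::vector<std::vector<int>> global_filter_mask(const std::vector<Layer>& net,
                                                 float prune_ratio) {
    assert(prune_ratio >= 0.0f && prune_ratio < 1.0f);

    // Collect the scores of all filters in the net and sort them globally.
    std::vector<float> sorted;
    for (const Layer& layer : net)
        for (const auto& f : layer) sorted.push_back(filter_score(f));
    std::sort(sorted.begin(), sorted.end());

    const std::size_t n_prune =
        static_cast<std::size_t>(prune_ratio * static_cast<float>(sorted.size()));

    std::vector<std::vector<int>> mask;
    for (const Layer& layer : net) {
        std::vector<int> m;
        for (const auto& f : layer) {
            const bool pruned = n_prune > 0 && filter_score(f) <= sorted[n_prune - 1];
            m.push_back(pruned ? 0 : 1);
        }
        mask.push_back(m);
    }
    return mask;
}
```

Ranking globally instead of per layer lets the method remove more filters from layers that are redundant and fewer from layers that are not.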

The acceleration result of using the kernel mask in this project is:

Weights Prune

Set PRUNE := 1 in Makefile

This method is very simple: you just set every weight below a threshold to 0, so I won't describe it further.
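A minimal sketch of threshold pruning, assuming the comparison is made on the magnitude of each weight (the text above only says "weights < threshold", so this is my reading). The function name `prune_weights` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Zero every weight whose magnitude is below threshold.
// Returns how many non-zero weights were set to zero.
std::size_t prune_weights(std::vector<float>& weights, float threshold) {
    assert(threshold >= 0.0f);
    std::size_t zeroed = 0;
    for (float& w : weights) {
        if (w != 0.0f && std::fabs(w) < threshold) {
            w = 0.0f;
            ++zeroed;
        }
    }
    return zeroed;
}
```

Zeroed weights only save time if the inference code skips them or stores the weights in a sparse format.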

The acceleration result of using kernel mask and weights prune together in this project is:

L1 Regularization

L1 regularization can be regarded as another way to reduce the number of kernels. The principle is similar to pruning kernels with BN parameters in other papers.

Yolo uses L2 regularization by default, so you need to change it to L1 in the code. This method has a disadvantage: you need to change the cfg files after every epoch ends (after one epoch of training you know how many kernels to keep in every conv layer). If you want to know more about L2 and L1 regularization in yolo, you can go to my blog.
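The difference between the two penalties shows up in the weight update. With an L2 penalty the gradient is proportional to the weight itself, so small weights shrink slowly. With an L1 penalty the gradient has constant size, so small weights are pushed all the way toward zero, which is what makes whole kernels removable. The sketch below applies only the regularization part of an SGD step. The names `sign_of` and `apply_decay` are hypothetical, and this is not how the repo's update code is written.

```cpp
#include <cassert>
#include <vector>

// Sign of x as -1, 0 or +1.
float sign_of(float x) {
    return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// Apply only the regularization part of an SGD step.
// L2 penalty (decay / 2) * w^2 has gradient decay * w.
// L1 penalty decay * |w| has gradient decay * sign(w).
void apply_decay(std::vector<float>& weights, float lr, float decay, bool use_l1) {
    assert(lr >= 0.0f && decay >= 0.0f);
    for (float& w : weights) {
        const float grad = use_l1 ? decay * sign_of(w) : decay * w;
        w -= lr * grad;
    }
}
```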

The acceleration result of using L1 regularization in this project is:

Quantization

In network acceleration, quantization is always the most important trick. I have implemented two quantization types, which can be switched in the Makefile.

Set QUANTIZATION := 1 in Makefile

This module was imported from AlexeyAB's GitHub repo.

As he describes it, this quantization method is based on the theory behind NVIDIA's TensorRT.

But when I tested this module, it did not work well, so I recently added code for Google's quantization method.

Set QUANTIZATION_GOOGLE := 1 in Makefile

Paper: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [C]// CVPR, 2018

The most novel idea is to plug fake quantization into the training process. You can then get the input quantization scale directly after model training, instead of running a calibration pass on a calibration dataset.
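A minimal sketch of what fake quantization does to a single value, assuming 8-bit affine quantization with a scale and a zero point as described in the paper. The value is rounded to the 8-bit grid and immediately converted back to float, so the forward pass sees the quantization error while everything stays in floating point. The struct and function names here are hypothetical and are not the repo's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct QuantParams {
    float scale;     // real-valued step between two neighbouring 8-bit codes
    int zero_point;  // 8-bit code that represents the real value 0
};

// Choose 8-bit affine parameters for the real range [rmin, rmax].
QuantParams choose_params(float rmin, float rmax) {
    // Widen the range to contain 0 so that zero is exactly representable.
    rmin = std::min(rmin, 0.0f);
    rmax = std::max(rmax, 0.0f);
    assert(rmax > rmin);
    QuantParams p;
    p.scale = (rmax - rmin) / 255.0f;
    const int zp = static_cast<int>(std::lround(-rmin / p.scale));
    p.zero_point = std::min(255, std::max(0, zp));
    return p;
}

// Real value -> 8-bit code, clamped to [0, 255].
std::uint8_t quantize(float x, const QuantParams& p) {
    const long q = std::lround(x / p.scale) + p.zero_point;
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, q)));
}

// 8-bit code -> real value.
float dequantize(std::uint8_t q, const QuantParams& p) {
    return p.scale * static_cast<float>(static_cast<int>(q) - p.zero_point);
}

// Quantize and immediately dequantize: the value used in the training forward pass.
float fake_quantize(float x, const QuantParams& p) {
    return dequantize(quantize(x, p), p);
}
```

For example, with the range [-32, 95.5] the scale is 0.5 and the zero point is 64, so `fake_quantize(0.3f, p)` snaps to 0.5 and out-of-range inputs are clamped to the range ends.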

To deploy the project on embedded devices, I added Google's gemmlowp to darknet.
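The core of a low-precision GEMM can be sketched without the library: the inputs are 8-bit codes, each with a zero point, and the products are accumulated in 32-bit integers. This plain-loop version is only an illustration of the arithmetic. It does not use the gemmlowp API, and the function name `quantized_gemm` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// acc[i][j] = sum over k of (A[i][k] - za) * (B[k][j] - zb)
// A is M x K, B is K x N, both row-major 8-bit codes; za and zb are their zero points.
std::vector<std::int32_t> quantized_gemm(int M, int N, int K,
                                         const std::vector<std::uint8_t>& A, int za,
                                         const std::vector<std::uint8_t>& B, int zb) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    std::vector<std::int32_t> acc(static_cast<std::size_t>(M) * N, 0);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            std::int32_t sum = 0;
            for (int k = 0; k < K; ++k) {
                const std::int32_t a = static_cast<std::int32_t>(A[static_cast<std::size_t>(i) * K + k]) - za;
                const std::int32_t b = static_cast<std::int32_t>(B[static_cast<std::size_t>(k) * N + j]) - zb;
                sum += a * b;
            }
            acc[static_cast<std::size_t>(i) * N + j] = sum;
        }
    }
    return acc;
}
```

The real-valued result is approximately `scale_a * scale_b * acc`, so only one rescale per output element is needed after the integer loop.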

Depthwise Conv

Depthwise convolution is the key point of MobileNet. It has already been merged into yolov3 by the author. I optimized the code so that l.groups can be used in every module.
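To show what a groups parameter changes, here is a naive grouped convolution sketch (stride 1, no padding). With `groups == 1` it is a regular convolution. With `groups == c_in == c_out` every output channel reads exactly one input channel, which is depthwise convolution. The function name, argument order and memory layout are assumptions made for this example, not darknet's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// in:  c_in x h x w
// wt:  c_out x (c_in / groups) x k x k
// out: c_out x (h - k + 1) x (w - k + 1), stride 1, no padding
std::vector<float> grouped_conv2d(const std::vector<float>& in, int c_in, int h, int w,
                                  const std::vector<float>& wt, int c_out, int k,
                                  int groups) {
    assert(groups > 0 && c_in % groups == 0 && c_out % groups == 0);
    const int cin_g = c_in / groups;    // input channels seen by one filter
    const int cout_g = c_out / groups;  // output channels per group
    const int oh = h - k + 1, ow = w - k + 1;
    assert(oh > 0 && ow > 0);
    assert(in.size() == static_cast<std::size_t>(c_in) * h * w);
    assert(wt.size() == static_cast<std::size_t>(c_out) * cin_g * k * k);

    std::vector<float> out(static_cast<std::size_t>(c_out) * oh * ow, 0.0f);
    for (int oc = 0; oc < c_out; ++oc) {
        const int g = oc / cout_g;  // group this output channel belongs to
        for (int ic = 0; ic < cin_g; ++ic) {
            const int in_c = g * cin_g + ic;  // absolute input channel index
            for (int y = 0; y < oh; ++y) {
                for (int x = 0; x < ow; ++x) {
                    float sum = 0.0f;
                    for (int ky = 0; ky < k; ++ky)
                        for (int kx = 0; kx < k; ++kx)
                            sum += in[(static_cast<std::size_t>(in_c) * h + y + ky) * w + x + kx] *
                                   wt[((static_cast<std::size_t>(oc) * cin_g + ic) * k + ky) * k + kx];
                    out[(static_cast<std::size_t>(oc) * oh + y) * ow + x] += sum;
                }
            }
        }
    }
    return out;
}
```

The saving comes from the weight count: each filter has `c_in / groups` input slices instead of `c_in`, so depthwise filters do a fraction of the multiply-adds of a regular convolution.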

[How to train the repo]

  1. Analyze your original net and decide which modules you need to use.

  2. Edit the Makefile and enable those modules. For example, if you want to use the kernel mask, you just need to set
  `MASK=1`

  3. Start training:

    ./darknet detector train [data_file path] cfg/yolov3.cfg [pretrain weights file]

  4. Set `GPU=0`, then start testing:

    ./darknet detector test [data_file path] cfg/yolov3.cfg [weights file] [image file to detect]

[How to test the repo]

I have put a pretrained model in backup, so you can give it a try :)

  1. Analyze your original net and decide which modules you need to use.

  2. Edit the Makefile and enable those modules. For example, if you want to use the kernel mask, you just need to set
  `MASK=1`

  3. Normal test:
    ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg
  4. Test with NVIDIA quantization:
     1). Set QUANTIZATION := 1
     2). ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg -quantized
  5. Test with Google quantization:
     1). Set QUANTIZATION_GOOGLE := 1
     2). ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg

[Something more]

1. I added F1 score test code. The command is:

./darknet detector f1 [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup

2. I also have some other modules such as `Hash Compress` and `Huffman Compress`, but I can't share all of them for other reasons.

3. When I tested all these methods on the tiny net (not on VGG), they decreased inference time by 30%~50% with very little F1 decrease.
If you want it even faster, use quantization. It will surprise you!!!!!
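For reference, the F1 score mentioned in item 1 is the harmonic mean of precision and recall. The sketch below computes it from detection counts. How detections are matched to ground-truth boxes (for example the IoU threshold) is not stated here, so the function simply takes the counts as inputs. The name `f1_score` is hypothetical.

```cpp
#include <cassert>

// F1 = 2 * P * R / (P + R), with P = tp / (tp + fp) and R = tp / (tp + fn).
// This simplifies to 2 * tp / (2 * tp + fp + fn).
// Returns 0 when there are no detections and no ground-truth objects.
double f1_score(int tp, int fp, int fn) {
    assert(tp >= 0 && fp >= 0 && fn >= 0);
    const int denom = 2 * tp + fp + fn;
    return denom == 0 ? 0.0 : 2.0 * tp / denom;
}
```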

If you want to use my code, please let me know!!!!
