
MixNet (Mix_Conv) - 0.360 (0.5) BFlops - 77.0% (71.5%) Top1 #4203

Closed

CuongNguyen218 opened this issue Nov 1, 2019 · 51 comments

Comments

@CuongNguyen218 commented Nov 1, 2019

Hi @AlexeyAB,
Mix_conv: Mixed Depthwise Convolutional Kernels.
Arxiv
Github
Top1 Acc: 78.9% on ImageNet with 0.56 BFlops. I think this idea is good.

AlexeyAB added the "want enhancement" (Want to improve accuracy, speed or functionality) label Nov 1, 2019
@AlexeyAB (Owner) commented Nov 1, 2019

MixNet-L and -M have the same network architecture: we simply apply depth_multiplier 1.3 on MixNet-M to get MixNet-L, as shown in this code: https://github.com/tensorflow/tpu/blob/56e1058cba2b7b5ca233a4c9bfd7331a69082188/models/official/mnasnet/mixnet/mixnet_builder.py#L217
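
For illustration, here is a minimal sketch of how such a depth multiplier is typically applied to per-layer channel counts (plain Python; the rounding-to-a-multiple-of-8 rule and the example widths are assumptions for illustration, not values taken from the Darknet cfg files):

# Hypothetical round_filters helper: scale channels by the depth multiplier,
# then round to the nearest multiple of `divisor`, never dropping more than ~10%.
def round_filters(filters, multiplier=1.3, divisor=8):
    scaled = filters * multiplier
    new_filters = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * scaled:
        new_filters += divisor
    return int(new_filters)

for c in (16, 24, 40, 80, 120, 200):   # illustrative MixNet-M-like widths
    print(c, "->", round_filters(c))   # e.g. 16 -> 24, 24 -> 32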

These models are trained:

(attachments)

Explanation:

  • MixNet-M-GPU is a slightly optimized version of MixNet-M for GPU; it has higher BFlops but is also faster on GPU

  • MixNet-M achieves 77.0% Top1 and EfficientNet-B0 achieves 76.3% Top1 only when they are trained with a large mini_batch_size on a large cluster (DGX-2 ~400k$, or a GPU/TPU cluster ~1M$). Otherwise the official EfficientNet-B0 achieves only 70.0% Top1, which is lower than our EfficientNet-B0 at 71.3% Top1: https://github.com/WongKinYiu/CrossStagePartialNetworks#small-models (for example, GhostNet-1.0 should be trained with Batch-norm synchronization on 8 GPUs with mini_batch_size=1024)
    To achieve 77.0% Top1 with MixNet-M, use Darknet GPU-processing with CPU-RAM: Beta: Using CPU-RAM instead of GPU-VRAM for large Mini_batch=32 - 128 #4386

  • MixNet-M-GPU has 0.532 BFlops, while Darknet shows 1.065 BFlops, i.e. 2x more. In all papers BFlops is actually FMA_BFlops (1 FMA = 2 operations: MUL + ADD) https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation - a small counting sketch follows this list.

  • Why do models with a low amount of BFLOPS still have low speed? In these models the low BFLOPS count is achieved by using grouped/depthwise convolutions, which are very slow on GPU, TPU-Edge and other devices.
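
For reference, a minimal counting sketch of the FMA vs. BFlops difference mentioned in the list above (plain Python; the example layer shape is illustrative and not taken from any cfg file):

# MACs = multiply-accumulate ops (what the EfficientNet/MixNet papers report as "FLOPS")
# FLOPs = 2 * MACs (MUL and ADD counted separately, which is what Darknet prints as BFLOPS)
def conv_ops(h_out, w_out, c_in, c_out, k, groups=1):
    macs = h_out * w_out * c_out * (c_in // groups) * k * k
    return macs, 2 * macs

macs, flops = conv_ops(h_out=112, w_out=112, c_in=32, c_out=32, k=3, groups=32)  # depthwise 3x3
print(macs / 1e9, flops / 1e9)  # FMA-BFlops vs. Darknet-style BFlops (exactly 2x)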


We replace one of the 15 layers with either (1) vanilla DepthwiseConv9x9 with kernel size 9x9; or (2) MixConv3579 with 4 groups of kernels: {3x3, 5x5, 7x7, 9x9}.
As shown in the figure, large kernel size has different impact on different layers: for most of the layers, the accuracy doesn’t change much, but for certain layers with stride 2, a larger kernel can significantly improve the accuracy. Notably, although MixConv3579 uses only half the parameters and FLOPS of the vanilla DepthwiseConv9x9, our MixConv achieves similar or slightly better performance for most of the layers.

Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency.

(figure: mixnet-flops)


For comparison with EfficientNet

(comparison figures)

@CuongNguyen218 (Author)

@AlexeyAB
As I understand it, the input tensor is split across the filters in Mix_Conv. From the cfg above, I think you assume the input has 16 channels and split it by 4, getting 4 tensors with 4 input channels each, right? But I can't understand why you used route layers -2, -4, -6. Can you ensure that the input of each conv layer follows the order [0:3] for 3x3, [4:8] for 5x5 and so on?

@gnefihs commented Nov 5, 2019

@CuongNguyen218 thanks for sharing this.

And yeah, it seems like AlexeyAB's cfg will apply the filters to the entire input tensor (like InceptionNet).

@beHappy666

Maybe the slice implementation should be called, not split @AlexeyAB

@AlexeyAB (Owner) commented Nov 7, 2019

@beHappy666

[route]
layers = -1
group_id=0
groups=4

[convolutional]
batch_normalize=1
filters=4
groups=4
size=3
stride=2
pad=1
activation=leaky

[route]
layers = -3
group_id=1
groups=4

[convolutional]
batch_normalize=1
filters=4
groups=4
size=5
stride=2
pad=1
activation=leaky

[route]
layers = -5
group_id=2
groups=4

[convolutional]
batch_normalize=1
filters=4
groups=4
size=7
stride=2
pad=1
activation=leaky

[route]
layers = -7
group_id=3
groups=4

[convolutional]
batch_normalize=1
filters=4
groups=4
size=9
stride=2
pad=1
activation=leaky

[route]
layers = -1,-3,-5,-7
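
Here each [route] with groups=4 and group_id=N takes the N-th quarter of the channels from the referenced layer, a depthwise [convolutional] with a different kernel size (3/5/7/9) processes that quarter, and the final [route] concatenates the four outputs. A rough plain-NumPy sketch of the same idea, only for illustration (weights are random stand-ins, shapes are not taken from any cfg):

import numpy as np

def depthwise_conv(x, kernels, stride=1):
    # x: (C, H, W); kernels: (C, k, k) - one kernel per channel, "same"-style padding
    C, H, W = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    ho = (H + 2 * pad - k) // stride + 1
    wo = (W + 2 * pad - k) // stride + 1
    out = np.zeros((C, ho, wo))
    for c in range(C):
        for i in range(ho):
            for j in range(wo):
                patch = xp[c, i * stride:i * stride + k, j * stride:j * stride + k]
                out[c, i, j] = np.sum(patch * kernels[c])
    return out

def mixconv(x, kernel_sizes=(3, 5, 7, 9), stride=2):
    # Split channels into one group per kernel size, run a depthwise conv on each group
    # with its own kernel size, then concatenate the outputs (like the final [route]).
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    outs = []
    for g, ks in zip(groups, kernel_sizes):
        kernels = np.random.randn(g.shape[0], ks, ks) * 0.01  # stand-in weights
        outs.append(depthwise_conv(g, kernels, stride))
    return np.concatenate(outs, axis=0)

y = mixconv(np.random.randn(16, 32, 32))  # 16 channels -> 4 groups of 4; output shape (16, 16, 16)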

@AlexeyAB (Owner) commented Nov 7, 2019

I added groups= and group_id= params to the [route] layer, so you can try to implement MixNet by using such blocks: #4203 (comment)

But I didn't test it.

Commit: 0fa9c8f

@CuongNguyen218 (Author)

@AlexeyAB, how can I know that it's correct?

@dexception

@AlexeyAB
Since it is using depthwise convolutions, it is better to run it on CPU.
This must be converted to OpenVINO. We have to think about operator fusion.

@AlexeyAB (Owner)

@dexception

We have to think about operator fusion.

What is operator fusion?

@AlexeyAB (Owner) commented Nov 12, 2019

@CuongNguyen218 @dexception @beHappy666 @gnefihs @WongKinYiu @LukeAI

I implemented the MixNet-M classification network, so you can try to train it on ImageNet (see the example command after the list below).
It seems it can be fast only on CPU.

GPU nVidia RTX 2070

  • MixNet-M: mixnet_m.cfg.txt - 0.759 BFlops (0.379 FMA) - 4.6 sec per iteration training - 45ms inference

  • MixNet-M-XNOR (partially BIT-1 inference): mixnet_m_xnor.cfg.txt - 0.237 BFlops (0.118 FMA) - 5.3 sec per iteration training - 45ms inference (32 BIT-1 ops = 1 Flops)

  • MixNet-M-GPU (minor modification for GPU): mixnet_m_gpu.cfg.txt - 1.0 BFlops (0.500 FMA) - 2.7 sec per iteration training - 45 ms inference
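
If you want to try training one of these on ImageNet, the usual Darknet classifier command should work (the imagenet1k.data path is only an assumption here - point it at your own .data/.names files and the downloaded cfg):

./darknet classifier train cfg/imagenet1k.data mixnet_m_gpu.cfg

(the same applies to mixnet_m.cfg and mixnet_m_xnor.cfg; append a .weights file to resume training)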

@WongKinYiu (Collaborator) commented Nov 12, 2019

@AlexeyAB Hello,

#4203 (comment)

  • MixNet-S - 4.1M params - 0.256 BFlops - 75.8% Top1 - 92.8% Top5
  • MixNet-M - 5.0M params - 0.360 BFlops - 77.0% Top1 - 93.3% Top5

#4203 (comment)

  • MixNet-M - 0.256 BFlops - 4.6 sec per iteration training - 45ms inference
  • MixNet-M-GPU (minor modification for GPU) - 1.0 BFlops - 2.7 sec per iteration training - 45 ms inference

I'd like to know the differences between these two comments, thanks.

@AlexeyAB (Owner) commented Nov 12, 2019

@WongKinYiu

I'd like to know the differences between these two comments, thanks.

The 1st is taken from the paper.
The 2nd is the actual implementation.

Or what do you mean?

MixNet is just a more efficient (Top1/Flops) modification of EfficientNet.

@WongKinYiu (Collaborator)

Just to make sure I understand correctly:

the implemented MixNet-M is 0.256 BFLOPs, but the GPU version is 1.0 BFLOPs,
and the BFLOPs of the implemented MixNet-M is the same as MixNet-S in the paper.

I'll take a look at the cfg files after I finish my breakfast, thank you.

@AlexeyAB (Owner)

Yes, I just made some changes in MixNet-M (mixnet_m_gpu.cfg.txt) so it can be trained ~2x faster - 2.7 sec instead of 4.6 sec per training iteration with the same inference speed on GPU.
I just decreased groups= in depthwise-MixConv-layers, so it should be more accurate and faster on GPU.

Maybe we should look at Diagonalwise Refactorization (15x speedup of Depthwise Convolutions) to speed up EfficientNet and MixNet: #3908

@WongKinYiu (Collaborator) commented Nov 13, 2019

Now training mixnet_m.cfg.txt - 0.256 BFlops - 4.6 sec per iteration training - 45ms inference.
But it shows: Total BFLOPS 0.759.

(screenshot)

Update: I get cuDNN Error: CUDNN_STATUS_INTERNAL_ERROR

@AlexeyAB (Owner) commented Nov 13, 2019

@WongKinYiu Yes, I fixed it - BFLOPS 0.759 is 0.379 FMA (the EfficientNet and MixNet authors use FMA counting).

I successfully trained mixnet_m_gpu.cfg.txt for 10 000 iterations on Windows 7 x64.

@WongKinYiu (Collaborator)

@AlexeyAB thanks,

I do not know why, but on every one of my Windows computers, training models with grouped convolutions crashes.
On Ubuntu, everything works.

@AlexeyAB (Owner)

@WongKinYiu

  • How many iterations did you train before this error occurred?
  • Can you show a screenshot of this error?
  • Try to increase subdivisions.
  • What CUDA and cuDNN versions do you use?
  • Show output of
nvcc --version
nvidia-smi

@WongKinYiu (Collaborator)

@AlexeyAB

100~900 iterations.
CUDA 10
(screenshot)

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130

Windows does not have nvidia-smi.

@AlexeyAB (Owner)

@WongKinYiu

This is a very strange error: why is it trying to create another cuDNN handle when one has already been created?

Windows does not have nvidia-smi.

It should be in C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi
nvidia-smi.zip

Do you use the latest version of Darknet?
If you set subdivisions=8 does it help?

@WongKinYiu (Collaborator)

Yes, I use the latest version.

(screenshot)

@AlexeyAB (Owner)

@WongKinYiu

  • Your nvcc --version shows CUDA 10.0, while nvidia-smi shows CUDA 10.1 - maybe this is the reason.
  • Also, some users encountered errors when using CUDA 10.1.

@WongKinYiu (Collaborator)

Yes, I noticed that nvidia-smi shows CUDA version 10.1.
It is really strange.
When I installed CUDA, CUDA 10.1 had not been released yet.

@AlexeyAB (Owner)

Or just try to use a newer cuDNN version.

@CuongNguyen218 (Author)

@WongKinYiu,
Can you give me a link to the CIoU and DIoU papers?

@WongKinYiu (Collaborator)

@CuongNguyen218

Here you are: #4360

@CuongNguyen218 (Author)

@AlexeyAB,
Did you provide an EfficientNet model, or did you convert an EfficientNet model pretrained on ImageNet to Darknet?

@WongKinYiu (Collaborator)

@CuongNguyen218

ImageNet and COCO models of EfficientNet-B0: #3874 (comment)

@CuongNguyen218 (Author)

@AlexeyAB, what result did you get?

@WongKinYiu (Collaborator)

@AlexeyAB

mixnet-m-gpu, top-1 = 71.5%, top-5 = 90.5%.

@AlexeyAB (Owner)

@WongKinYiu Nice! Can you share the weights file?

@CuongNguyen218 (Author)

Why are your results so different from the paper?

@WongKinYiu (Collaborator)

Because mixnet-m-gpu is designed by @AlexeyAB; it does not appear in the paper.

@CuongNguyen218 (Author) commented Dec 11, 2019 via email

@WongKinYiu (Collaborator)

@AlexeyAB https://drive.google.com/open?id=1SOLd3eXHwcLkvwFgdiui6uL3-_rWWB1E

@AlexeyAB (Owner) commented Dec 11, 2019

@CuongNguyen218

Why are your results so different from the paper?

Because in the paper MixNet and EfficientNet are trained with a very large mini_batch_size on a DGX-2 / cluster (~400k$ - 1M$).
You can achieve the same accuracy, 77.0% Top1, by using Darknet with #4386

If we train with the same mini_batch_size, then EfficientNet-B0 (official) has even lower Top1/5 accuracy than my EfficientNet-B0: https://github.com/WongKinYiu/CrossStagePartialNetworks#small-models

Also, I slightly optimized MixNet on GPU so that it can be trained in 1 month instead of 2 months.

AlexeyAB removed the "want enhancement" (Want to improve accuracy, speed or functionality) label Dec 11, 2019
AlexeyAB changed the title from "Mix_Conv" to "Mix_Conv - 0.360 BFlops - 77.0% (71.5%) Top1" Dec 11, 2019
AlexeyAB changed the title from "Mix_Conv - 0.360 BFlops - 77.0% (71.5%) Top1" to "Mix_Conv - 0.360 (0.5) BFlops - 77.0% (71.5%) Top1" Dec 11, 2019
AlexeyAB changed the title from "Mix_Conv - 0.360 (0.5) BFlops - 77.0% (71.5%) Top1" to "MixNet (Mix_Conv) - 0.360 (0.5) BFlops - 77.0% (71.5%) Top1" Dec 11, 2019
@AlexeyAB (Owner)

@CuongNguyen218 If you want, you can train the original MixNet-M on ImageNet: #4203 (comment)

MixNet-M: mixnet_m.cfg.txt - 0.759 BFlops (0.379 FMA) - 4.6 sec per iteration training - 45ms inference

https://github.com/AlexeyAB/darknet/files/3838329/mixnet_m.cfg.txt

@glenn-jocher commented Mar 12, 2020

@AlexeyAB I just started looking into MixConvs. They seem very interesting! Do you know of anywhere they are applied to object detection, or are they only used for classification?

EfficientDet was published in November 2019, while MixConv was published in July 2019, so the EfficientDet authors must have been aware of this type of convolution but, I'm thinking, neglected to use it for some reason.

@AlexeyAB (Owner)

@glenn-jocher

The same authors wrote all three articles: MixNet, EfficientNet, EfficientDet.

  • EfficientNet uses Grouped-Conv
  • MixNet uses Grouped-Conv with different kernel_size

Neither EfficientNet nor MixNet is optimal for the current CPUs/GPUs/neuro-chips (MyriadX, Coral TPU-Edge).

So they make such networks as reference networks to help create new neurochips (a new version of TPU-Edge).

So maybe the reason they don't use MixNet for the detector is that creating a neurochip for EfficientNet (grouped conv) is much easier than for MixNet (grouped conv with different kernel_size).

Also, MixNet may have lower BFlops but still be slower.

@glenn-jocher

@AlexeyAB Ah I see, that's an interesting approach. Yes, it seems like all of these new grouped-convolution techniques run quite slowly on hardware, despite the lower parameter count.

@minhaj3 commented May 3, 2020

Hi @AlexeyAB, I am trying to run inference on the MixNet model using your config and the pretrained weights mentioned at the start of the thread, but I am getting the error: "Error: in the file data/coco.names number of names 80 that isn't equal to classes=0 in the file cfg/mixnet_m_gpu.cfg". The number of classes is not mentioned in the config file, but this error says so. And even if it implies that it was trained on a different number of classes, it still does not make sense to have 0 classes in a config file. Am I missing something here? Can someone help me out?

I tried running it on Ubuntu 18.04 with the command: "./darknet detector test cfg/coco.data cfg/mixnet_m_gpu.cfg mixnet_m_gpu_final.weights -ext_output data/dog.jpg"
