Error in mobilenet conversion from Tensorflow to Caffe: Different way of padding

Jiahao Yao edited this page May 10, 2018 · 2 revisions

Model: MobileNets v1 & MobileNets v2

Source: Tensorflow

Destination: Caffe

Author: Jiahao


How we found this problem

We tested the tensorflow parser and the caffe emitter, using the same weights in every layer.

Mobilenet v1 gives a low-SNR result:

error: 0.61917245
L1 error: 1431.8882
SNR: 5.100172946375377
PSNR: 19.59308572747237

Mobilenet v2 produces an output shape that differs from the original shape.

Take the first conv layer as an example. It takes a 224x224x3 input and outputs 112x112x32, with kernel size 3 and stride 2. In tensorflow the padding is SAME, which here means padding_left=0, padding_right=1, padding_top=0, padding_bottom=1. In caffe, however, the padding is symmetric. With p_h = 0, p_w = 0 (that is, padding_left=0, padding_right=0, padding_top=0, padding_bottom=0) the output shape becomes 111x111x32, which is what happens when mobilenet v2 is converted to caffe.
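The padding arithmetic above can be checked with a small sketch. It assumes tensorflow's documented SAME rule (output size is ceil(input / stride), with any odd padding pixel placed at the end), caffe's output-size formula floor((input + 2 * pad - kernel) / stride) + 1, and a dilation of 1. The function names are hypothetical and exist only for illustration.

```python
import math


def tf_same_padding(in_size, kernel, stride):
    """Return (pad_before, pad_after) along one axis for tensorflow SAME padding."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = total // 2          # tensorflow puts the extra pixel at the end
    return pad_before, total - pad_before


def caffe_output_size(in_size, kernel, stride, pad):
    """Output size along one axis for a caffe convolution with symmetric padding."""
    return (in_size + 2 * pad - kernel) // stride + 1


print(tf_same_padding(224, 3, 2))        # (0, 1)
print(caffe_output_size(224, 3, 2, 0))   # 111
print(caffe_output_size(224, 3, 2, 1))   # 112
```

For the 224-pixel input, SAME padding needs exactly one extra pixel per axis, and no symmetric pad value can express "zero before, one after".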

With p_h = 1, p_w = 1 (that is, padding_left=1, padding_right=1, padding_top=1, padding_bottom=1) the output does have shape 112x112x32, but its values can differ, because each convolution window is shifted by one pixel relative to tensorflow and therefore covers different inputs. That may be the reason for the low SNR of the mobilenet v1 conversion.
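A one-dimensional toy case makes the value mismatch concrete. This is only a sketch: `conv1d` is a hypothetical pure-Python helper, not code from either framework, and the signal and kernel are made-up numbers. Both calls return four outputs, yet the windows cover different inputs.

```python
def conv1d(signal, kernel, stride, pad_before, pad_after):
    """Strided 1-D cross-correlation after zero padding."""
    padded = [0] * pad_before + list(signal) + [0] * pad_after
    k = len(kernel)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(padded) - k + 1, stride)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 1, 1]

# tensorflow SAME for this case: nothing before, one zero after
tf_like = conv1d(signal, kernel, stride=2, pad_before=0, pad_after=1)
# symmetric padding of 1, as a caffe-style pad would apply it
caffe_like = conv1d(signal, kernel, stride=2, pad_before=1, pad_after=1)

print(tf_like)     # [6, 12, 18, 15]
print(caffe_like)  # [3, 9, 15, 21]
```

The output lengths agree, but every element differs, which is the same effect that symmetric padding of 1 has on the 224x224 input.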

This problem is solved when converting to mxnet

Mxnet also uses symmetric padding in its convolution layer, like caffe. However, the problem mentioned above can be solved by adding a padding layer before the convolution layer.

Following the usual conversion procedure described in the tutorial, one can find this trick in the converted code.

    import mxnet as mx
    input           = mx.sym.var('input')
    # pad_width holds (before, after) for each NCHW axis: one zero row at the bottom, one zero column on the right
    MobilenetV2_Conv_Conv2D_pad = mx.sym.pad(data = input, mode = 'constant', pad_width=(0, 0, 0, 0, 0, 1, 0, 1), constant_value = 0.0, name = 'MobilenetV2/Conv/Conv2D_pad')
    # the convolution then runs with no padding of its own
    MobilenetV2_Conv_Conv2D = mx.sym.Convolution(data=MobilenetV2_Conv_Conv2D_pad, kernel=(3, 3), stride=(2, 2), dilate = (1, 1), num_filter = 32, num_group = 1, no_bias = True, layout = 'NCHW', name = 'MobilenetV2/Conv/Conv2D')
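The pad_width tuple in the converted code can be reproduced from the layer parameters. The following is a minimal sketch, not the converter's actual implementation: `same_pad_width_nchw` is a hypothetical helper, and it assumes NCHW layout, dilation 1, and tensorflow's SAME rule of placing the extra pixel after the data.

```python
import math


def same_pad_width_nchw(height, width, kernel, stride):
    """Build an 8-value pad_width tuple (before/after for N, C, H, W) for SAME padding."""
    def pad_pair(in_size, k, s):
        out_size = math.ceil(in_size / s)
        total = max((out_size - 1) * s + k - in_size, 0)
        return total // 2, total - total // 2

    top, bottom = pad_pair(height, kernel[0], stride[0])
    left, right = pad_pair(width, kernel[1], stride[1])
    # batch and channel axes are never padded
    return (0, 0, 0, 0, top, bottom, left, right)


print(same_pad_width_nchw(224, 224, kernel=(3, 3), stride=(2, 2)))
# (0, 0, 0, 0, 0, 1, 0, 1)
```

Because the padding is applied as a separate step, the convolution that follows can run with zero padding and still see exactly the pixels tensorflow would.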

The inconsistent shapes are due to symmetric padding in caffe

Different ways of padding result in different shapes after convolution.

Since the paddings in caffe are symmetric, the shape after this convolution layer is 111x111, 111x112, 112x111, or 112x112 (the last with values that differ from tensorflow's).

A possible solution is to add a padding layer before the convolution whenever the padding is non-symmetric, once a padding layer is implemented in caffe.
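The decision of when a separate padding layer would be needed can be sketched as follows. `caffe_padding_plan` is a hypothetical helper written for illustration, assuming tensorflow's SAME rule and dilation 1; it is not part of caffe or of the converter.

```python
import math


def caffe_padding_plan(in_size, kernel, stride):
    """Say whether caffe's symmetric pad can express SAME padding along one axis."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before, after = total // 2, total - total // 2
    if before == after:
        # the convolution's own pad parameter is enough
        return ("symmetric", before)
    # an explicit padding layer would be needed before the convolution
    return ("explicit", (before, after))


print(caffe_padding_plan(224, 3, 2))  # ('explicit', (0, 1))
print(caffe_padding_plan(224, 3, 1))  # ('symmetric', 1)
```

Only the stride-2 case needs the extra layer here; the stride-1 layer with the same kernel maps directly onto a symmetric pad of 1.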