MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

arXiv: 1704.04861

MobileNet is a lightweight neural network proposed by a team at Google for mobile and embedded vision applications. In these settings, real-time latency requirements mean the model has to run on-device, so inference speed and model size both matter, and neither can come at the cost of too much accuracy. Previously, the usual approaches were to compress a trained network or to directly train a smaller one. MobileNet takes a different route: it uses depthwise separable convolutions to greatly reduce both the number of parameters and the amount of computation (Mult-Adds).

In the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, the authors explain in detail how depthwise separable convolutions work and introduce two hyperparameters for adjusting the model's size: the width multiplier and the resolution multiplier.

Depthwise Separable Convolution

Depthwise separable convolutions were first proposed by L. Sifre in Rigid-motion scattering for image classification, and were later used by Google in the Inception and Xception architectures. As is well known, a standard convolution filter operates on all input channels at once; its output can be seen as a weighted sum of 2D convolutions over all the channels. If the kernel size is D_K and the number of input channels is M, then computing one output value costs D_K · D_K · M multiply-accumulates. A depthwise separable convolution splits this into a depthwise convolution followed by a 1x1 convolution (also called a pointwise convolution). In the depthwise convolution, each channel is filtered with its own independent kernel. The figure below illustrates this process.

Here is a more intuitive illustration:

And a comparison with standard convolution:

As the figures show, depthwise separable convolution turns the multiplicative coupling of standard convolution (spatial filtering × number of channels) into a sum of two cheaper operations. For a D_F × D_F input feature map with M input channels and N output channels, its cost relative to a standard convolution is:

(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/D_K²

Since convolution kernels are generally small (usually no larger than 3x3), the cost of a depthwise separable convolution is roughly 1/D_K² that of a standard convolution; with 3x3 kernels, this works out to about 8 to 9 times less computation.
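To make the arithmetic concrete, here is a minimal sketch (not from the original post; the function names are mine) that computes the Mult-Adds of one standard convolution layer versus one depthwise separable layer:

def standard_conv_cost(d_k, d_f, m, n):
    # D_K * D_K * M * N * D_F * D_F multiply-accumulates
    return d_k * d_k * m * n * d_f * d_f

def depthwise_separable_cost(d_k, d_f, m, n):
    depthwise = d_k * d_k * m * d_f * d_f  # one d_k x d_k filter per input channel
    pointwise = m * n * d_f * d_f          # 1x1 convolution mixes the channels
    return depthwise + pointwise

# Example: 3x3 kernel, 14x14 feature map, 512 -> 512 channels
std = standard_conv_cost(3, 14, 512, 512)
sep = depthwise_separable_cost(3, 14, 512, 512)
print(std / sep)  # ~8.8, matching 1 / (1/N + 1/D_K^2) for N=512, D_K=3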

The full MobileNet network architecture is as follows:
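(The original figure is not reproduced here; the following summary of the layer stack is reconstructed from Table 1 of the paper and the appendix implementation below. Each depthwise separable block, DW-sep, is a 3x3 depthwise convolution followed by a 1x1 pointwise convolution, each with batch norm and ReLU.)

Conv 3x3, stride 2        -> 32
DW-sep block              -> 64
DW-sep block, stride 2    -> 128
DW-sep block              -> 128
DW-sep block, stride 2    -> 256
DW-sep block              -> 256
DW-sep block, stride 2    -> 512
DW-sep block (x5)         -> 512
DW-sep block, stride 2    -> 1024
DW-sep block              -> 1024
AvgPool 7x7, FC -> 1000, Softmax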

The vast majority of its parameters and computation sit in the 1x1 convolutions and the fully connected layer: in the paper's per-layer-type breakdown, the 1x1 convolutions account for roughly 95% of the Mult-Adds and about 75% of the parameters, and the fully connected layer holds most of the remaining parameters.

Width Multiplier and Resolution Multiplier

These two hyperparameters control the size of the model. The width multiplier α controls the number of channels: the channel count at each layer becomes α times the baseline (so M input channels become αM and N output channels become αN). The resolution multiplier ρ is applied implicitly by shrinking the input image, which scales every internal feature map from D_F × D_F to ρD_F × ρD_F. With both applied, the cost of a depthwise separable layer becomes:

D_K · D_K · αM · ρD_F · ρD_F + αM · αN · ρD_F · ρD_F

so computation drops roughly quadratically in both α and ρ, while the parameter count shrinks roughly as α² (the input resolution does not affect the number of parameters).
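Extending the cost helper above with α and ρ (again a sketch of mine, not code from the paper) shows the quadratic effect:

def mobilenet_layer_cost(d_k, d_f, m, n, alpha=1.0, rho=1.0):
    m, n = int(alpha * m), int(alpha * n)   # width multiplier thins the channels
    d_f = int(rho * d_f)                    # resolution multiplier shrinks the feature map
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f

base = mobilenet_layer_cost(3, 14, 512, 512)
print(base / mobilenet_layer_cost(3, 14, 512, 512, alpha=0.5))  # ~3.9x fewer Mult-Adds
print(base / mobilenet_layer_cost(3, 14, 512, 512, rho=0.5))    # exactly 4x fewer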

Appendix

A TensorFlow implementation (excerpted from GitHub: https://github.com/Zehaos/MobileNet):

import tensorflow as tf
import tensorflow.contrib.slim as slim


def mobilenet(inputs,
              is_training=True,
              width_multiplier=1,
              scope='MobileNet'):
    def _depthwise_separable_conv(inputs,
                                  num_pwc_filters,
                                  width_multiplier,
                                  sc,
                                  downsample=False):
        """Helper function to build the depthwise separable convolution layer."""
        num_pwc_filters = round(num_pwc_filters * width_multiplier)
        _stride = 2 if downsample else 1

        # Depthwise stage: skip the pointwise part by setting num_outputs=None
        depthwise_conv = slim.separable_convolution2d(inputs,
                                                      num_outputs=None,
                                                      stride=_stride,
                                                      depth_multiplier=1,
                                                      kernel_size=[3, 3],
                                                      scope=sc + '/depthwise_conv')
        bn = slim.batch_norm(depthwise_conv, scope=sc + '/dw_batch_norm')

        # Pointwise stage: a 1x1 convolution that mixes the channels
        pointwise_conv = slim.convolution2d(bn,
                                            num_pwc_filters,
                                            kernel_size=[1, 1],
                                            scope=sc + '/pointwise_conv')
        bn = slim.batch_norm(pointwise_conv, scope=sc + '/pw_batch_norm')
        return bn

    with tf.variable_scope(scope) as sc:
        end_points_collection = sc.name + '_end_points'
        with slim.arg_scope([slim.convolution2d, slim.separable_convolution2d],
                            activation_fn=None,
                            outputs_collections=[end_points_collection]):
            with slim.arg_scope([slim.batch_norm],
                                is_training=is_training,
                                activation_fn=tf.nn.relu):
                net = slim.convolution2d(inputs, round(32 * width_multiplier),
                                         [3, 3], stride=2, padding='SAME',
                                         scope='conv_1')
                net = slim.batch_norm(net, scope='conv_1/batch_norm')
                net = _depthwise_separable_conv(net, 64, width_multiplier, sc='conv_ds_2')
                net = _depthwise_separable_conv(net, 128, width_multiplier, downsample=True, sc='conv_ds_3')
                net = _depthwise_separable_conv(net, 128, width_multiplier, sc='conv_ds_4')
                net = _depthwise_separable_conv(net, 256, width_multiplier, downsample=True, sc='conv_ds_5')
                net = _depthwise_separable_conv(net, 256, width_multiplier, sc='conv_ds_6')
                net = _depthwise_separable_conv(net, 512, width_multiplier, downsample=True, sc='conv_ds_7')

                # Five depthwise separable blocks at 512 channels
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_8')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_9')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_10')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_11')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_12')

                net = _depthwise_separable_conv(net, 1024, width_multiplier, downsample=True, sc='conv_ds_13')
                net = _depthwise_separable_conv(net, 1024, width_multiplier, sc='conv_ds_14')

        # Note: the classification head (global average pooling + fully
        # connected layer) of the full model is not part of this excerpt.
        end_points = slim.utils.convert_collection_to_dict(end_points_collection)

    return end_points
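A minimal usage sketch (assuming TensorFlow 1.x, where tf.contrib.slim is available; the input shape and multiplier value are illustrative):

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
end_points = mobilenet(inputs, is_training=False, width_multiplier=0.75)
for name, tensor in end_points.items():
    print(name, tensor.get_shape())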