MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

arXiv: 1704.04861

MobileNet is a lightweight neural network proposed by a team at Google for mobile and embedded vision applications. In these settings, real-time latency requirements mean the model has to run on-device, so inference speed and model size both matter, and neither can come at the cost of too much accuracy. Previously, the usual approaches were to compress a trained network or to directly train a smaller one. MobileNet takes a different route: it uses depthwise separable convolutions to greatly reduce both the number of parameters and the amount of computation (Mult-Adds).

In the paper MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, the authors explain in detail how depthwise separable convolutions work and introduce two hyperparameters for adjusting the model's size: the width multiplier and the resolution multiplier.

Depthwise Separable Convolution

Depthwise separable convolutions were first proposed by L. Sifre in Rigid-motion scattering for image classification, and were later used by Google in the Inception and Xception architectures. As is well known, a standard convolution filter operates on all input channels at once; its output can be seen as a weighted sum of 2D convolutions over all the channels. If the kernel size is D_K and the number of input channels is M, then computing one output value costs D_K · D_K · M multiply-accumulates. A depthwise separable convolution splits this into a depthwise convolution followed by a 1x1 convolution (also called a pointwise convolution). In the depthwise convolution, each channel is filtered with its own independent kernel. The figure below illustrates this process.

Here is a more intuitive illustration:

And a comparison with standard convolution:

As the figures show, depthwise separable convolution turns the multiplicative coupling of standard convolution (spatial filtering × number of channels) into a sum of two cheaper operations. For a D_F × D_F input feature map with M input channels and N output channels, its cost relative to a standard convolution is:

(D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/D_K²

Since convolution kernels are generally small (usually no larger than 3x3), the cost of a depthwise separable convolution is roughly 1/D_K² that of a standard convolution; with 3x3 kernels, this works out to about 8 to 9 times less computation.
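To make the arithmetic concrete, here is a minimal sketch (not from the original post; the function names are mine) that computes the Mult-Adds of one standard convolution layer versus one depthwise separable layer:

def standard_conv_cost(d_k, d_f, m, n):
    # D_K * D_K * M * N * D_F * D_F multiply-accumulates
    return d_k * d_k * m * n * d_f * d_f

def depthwise_separable_cost(d_k, d_f, m, n):
    depthwise = d_k * d_k * m * d_f * d_f  # one d_k x d_k filter per input channel
    pointwise = m * n * d_f * d_f          # 1x1 convolution mixes the channels
    return depthwise + pointwise

# Example: 3x3 kernel, 14x14 feature map, 512 -> 512 channels
std = standard_conv_cost(3, 14, 512, 512)
sep = depthwise_separable_cost(3, 14, 512, 512)
print(std / sep)  # ~8.8, matching 1 / (1/N + 1/D_K^2) for N=512, D_K=3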

The full MobileNet network architecture is as follows:
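(The original figure is not reproduced here; the following summary of the layer stack is reconstructed from Table 1 of the paper and the appendix implementation below. Each depthwise separable block, DW-sep, is a 3x3 depthwise convolution followed by a 1x1 pointwise convolution, each with batch norm and ReLU.)

Conv 3x3, stride 2        -> 32
DW-sep block              -> 64
DW-sep block, stride 2    -> 128
DW-sep block              -> 128
DW-sep block, stride 2    -> 256
DW-sep block              -> 256
DW-sep block, stride 2    -> 512
DW-sep block (x5)         -> 512
DW-sep block, stride 2    -> 1024
DW-sep block              -> 1024
AvgPool 7x7, FC -> 1000, Softmax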

The vast majority of its parameters and computation sit in the 1x1 convolutions and the fully connected layer: in the paper's per-layer-type breakdown, the 1x1 convolutions account for roughly 95% of the Mult-Adds and about 75% of the parameters, and the fully connected layer holds most of the remaining parameters.

Width Multiplier and Resolution Multiplier

These two hyperparameters control the size of the model. The width multiplier α controls the number of channels: the channel count at each layer becomes α times the baseline (so M input channels become αM and N output channels become αN). The resolution multiplier ρ is applied implicitly by shrinking the input image, which scales every internal feature map from D_F × D_F to ρD_F × ρD_F. With both applied, the cost of a depthwise separable layer becomes:

D_K · D_K · αM · ρD_F · ρD_F + αM · αN · ρD_F · ρD_F

so computation drops roughly quadratically in both α and ρ, while the parameter count shrinks roughly as α² (the input resolution does not affect the number of parameters).
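Extending the cost helper above with α and ρ (again a sketch of mine, not code from the paper) shows the quadratic effect:

def mobilenet_layer_cost(d_k, d_f, m, n, alpha=1.0, rho=1.0):
    m, n = int(alpha * m), int(alpha * n)   # width multiplier thins the channels
    d_f = int(rho * d_f)                    # resolution multiplier shrinks the feature map
    return d_k * d_k * m * d_f * d_f + m * n * d_f * d_f

base = mobilenet_layer_cost(3, 14, 512, 512)
print(base / mobilenet_layer_cost(3, 14, 512, 512, alpha=0.5))  # ~3.9x fewer Mult-Adds
print(base / mobilenet_layer_cost(3, 14, 512, 512, rho=0.5))    # exactly 4x fewer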

Appendix

A TensorFlow implementation (excerpted from GitHub: https://github.com/Zehaos/MobileNet):

import tensorflow as tf
import tensorflow.contrib.slim as slim


def mobilenet(inputs,
              is_training=True,
              width_multiplier=1,
              scope='MobileNet'):
    def _depthwise_separable_conv(inputs,
                                  num_pwc_filters,
                                  width_multiplier,
                                  sc,
                                  downsample=False):
        """Helper function to build the depthwise separable convolution layer."""
        num_pwc_filters = round(num_pwc_filters * width_multiplier)
        _stride = 2 if downsample else 1

        # Depthwise stage: skip the pointwise part by setting num_outputs=None
        depthwise_conv = slim.separable_convolution2d(inputs,
                                                      num_outputs=None,
                                                      stride=_stride,
                                                      depth_multiplier=1,
                                                      kernel_size=[3, 3],
                                                      scope=sc + '/depthwise_conv')
        bn = slim.batch_norm(depthwise_conv, scope=sc + '/dw_batch_norm')

        # Pointwise stage: a 1x1 convolution that mixes the channels
        pointwise_conv = slim.convolution2d(bn,
                                            num_pwc_filters,
                                            kernel_size=[1, 1],
                                            scope=sc + '/pointwise_conv')
        bn = slim.batch_norm(pointwise_conv, scope=sc + '/pw_batch_norm')
        return bn

    with tf.variable_scope(scope) as sc:
        end_points_collection = sc.name + '_end_points'
        with slim.arg_scope([slim.convolution2d, slim.separable_convolution2d],
                            activation_fn=None,
                            outputs_collections=[end_points_collection]):
            with slim.arg_scope([slim.batch_norm],
                                is_training=is_training,
                                activation_fn=tf.nn.relu):
                net = slim.convolution2d(inputs, round(32 * width_multiplier),
                                         [3, 3], stride=2, padding='SAME',
                                         scope='conv_1')
                net = slim.batch_norm(net, scope='conv_1/batch_norm')
                net = _depthwise_separable_conv(net, 64, width_multiplier, sc='conv_ds_2')
                net = _depthwise_separable_conv(net, 128, width_multiplier, downsample=True, sc='conv_ds_3')
                net = _depthwise_separable_conv(net, 128, width_multiplier, sc='conv_ds_4')
                net = _depthwise_separable_conv(net, 256, width_multiplier, downsample=True, sc='conv_ds_5')
                net = _depthwise_separable_conv(net, 256, width_multiplier, sc='conv_ds_6')
                net = _depthwise_separable_conv(net, 512, width_multiplier, downsample=True, sc='conv_ds_7')

                # Five depthwise separable blocks at 512 channels
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_8')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_9')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_10')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_11')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_12')

                net = _depthwise_separable_conv(net, 1024, width_multiplier, downsample=True, sc='conv_ds_13')
                net = _depthwise_separable_conv(net, 1024, width_multiplier, sc='conv_ds_14')

        # Note: the classification head (global average pooling + fully
        # connected layer) of the full model is not part of this excerpt.
        end_points = slim.utils.convert_collection_to_dict(end_points_collection)

    return end_points
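A minimal usage sketch (assuming TensorFlow 1.x, where tf.contrib.slim is available; the input shape and multiplier value are illustrative):

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
end_points = mobilenet(inputs, is_training=False, width_multiplier=0.75)
for name, tensor in end_points.items():
    print(name, tensor.get_shape())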