Classic CNN Models (9): MobileNetV3 (PyTorch, with Detailed Comments)

Tags: kernel nn MobileNetV3 self stride PyTorch CNN True

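I. Introduction to MobileNetV3

MobileNetV3 is the third generation of the MobileNet family, proposed by Google in 2019. It aims to further improve efficiency and accuracy, particularly on mobile and edge devices. Compared with its predecessor, it introduces several changes: an architecture found with neural architecture search (NAS), new activation functions (hard-swish), and a channel-attention mechanism (squeeze-and-excitation blocks). Its key features are:

Updated block (bneck)
Architecture parameters found with NAS (neural architecture search)
Redesigned expensive layers

According to the paper, MobileNetV3-Large is 3.2% more accurate than MobileNetV2 on ImageNet classification while reducing latency by 20%.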

II. Background Concepts

1. Activation Functions

The sigmoid function is a common activation function used to introduce non-linearity into neural networks. It is defined as

$$f(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$$

It maps any real input into the interval (0, 1): the output approaches 0 as x goes to negative infinity, approaches 1 as x goes to positive infinity, and is exactly 0.5 at x = 0.

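Because sigmoid involves an exponential, it is relatively expensive to evaluate and differentiate and is unfriendly to quantization. MobileNetV3 therefore relies on two cheaper activation functions, ReLU6 and Hard-Swish, which are used in different blocks. (The code implementation later in this article uses plain `nn.ReLU` for the blocks marked "RE".)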

ReLU6: a clipped ReLU whose output is limited to the range [0, 6]. It is used in some blocks, mostly in the early stages of the network.
Hard-Swish: a "hard" version of Swish that replaces the sigmoid with a piecewise-linear one and is therefore cheaper to compute. It is defined as

$$\text{Hard-Swish}(x) = x \cdot \text{HardSigmoid}(x)$$

where

$$\text{HardSigmoid}(x) = \max\left(0, \min\left(1, \frac{x + 3}{6}\right)\right)$$

Their curves compare as follows:
[Figure: curves of the activation functions discussed above]
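
The two definitions above can be checked numerically. The following is a minimal sketch, not part of the original project: the helper names `hard_sigmoid` and `hard_swish` are chosen here for illustration, and the results are compared against PyTorch's built-in `F.hardsigmoid` and `F.hardswish`.

```python
import torch
import torch.nn.functional as F


def hard_sigmoid(x: torch.Tensor) -> torch.Tensor:
    # max(0, min(1, (x + 3) / 6)) is the same as ReLU6(x + 3) / 6
    return F.relu6(x + 3.0) / 6.0


def hard_swish(x: torch.Tensor) -> torch.Tensor:
    # Hard-Swish(x) = x * HardSigmoid(x)
    return x * hard_sigmoid(x)


x = torch.linspace(-6.0, 6.0, steps=13)  # -6, -5, ..., 6
print(hard_swish(x))

# Both hand-written versions should agree with the built-in functions.
print(torch.allclose(hard_sigmoid(x), F.hardsigmoid(x)))
print(torch.allclose(hard_swish(x), F.hardswish(x)))
```

Written this way, it is easy to see that Hard-Swish is exactly 0 for x <= -3 and exactly x for x >= 3; only the interval in between is curved.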

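2. SE Module (Squeeze-and-Excitation)

[Figure: flow diagram of a simplified SE module]
The figure above shows the flow of a simplified squeeze-and-excitation (SE) module, a key component that MobileNetV3 uses to strengthen its feature representation. The steps are:

Global Average Pooling (AvgPooling): the input feature map first goes through global average pooling, which condenses each channel into a single value. This discards spatial information and keeps only the average response of each channel.
Fully Connected Layers (FC1 and FC2): the pooled channel descriptors are then fed through two fully connected layers. FC1 reduces the number of channels by a fixed ratio (MobileNetV3 reduces it to 1/4 of the original) and is followed by ReLU to introduce non-linearity. FC2 restores the original number of channels and produces one weight per channel, representing that channel's importance.
H-sig (Hard-Sigmoid): the weights then pass through a hard-sigmoid, a piecewise-linear version of sigmoid whose output lies in [0, 1]. Each channel's weight is squashed into [0, 1] independently; unlike softmax, this does not produce a probability distribution, so the weights do not have to sum to 1.
Feature Map Scaling: finally, each weight is multiplied into its channel of the original feature map (broadcast over every spatial position), which rescales the relative importance of the channels. This step is called feature map scaling.

The goal of the SE module is to let the network learn how to reweight the channels of a feature map so that it focuses on the informative ones. This improves the model's representational power and generalization, which is particularly helpful for complex image classification tasks.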
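
The four steps can be traced with tensor shapes. This is a minimal sketch under assumptions: the channel count (16), the batch and spatial sizes, and the use of `nn.Linear` for the two FC layers are choices made here for illustration (the `model.py` listing later implements the same layers as 1x1 convolutions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(2, 16, 7, 7)      # feature map, [N, C, H, W]
fc1 = nn.Linear(16, 4)            # FC1: reduce C to C / 4
fc2 = nn.Linear(4, 16)            # FC2: restore C / 4 to C

s = x.mean(dim=(2, 3))            # global average pooling -> [N, C]
s = F.relu(fc1(s))                # -> [N, C / 4]
s = F.hardsigmoid(fc2(s))         # one gate per channel, each in [0, 1] -> [N, C]
out = x * s[:, :, None, None]     # broadcast the gates over H and W

print(out.shape)                  # same shape as the input
```

The output has the same shape as the input; only the per-channel scale changes.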

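Note that although the diagram shows ReLU and hard-sigmoid, other implementations may choose different activations. The `model.py` listing later in this article follows the diagram: it applies `F.relu` after FC1 and `F.hardsigmoid` after FC2.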

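The design idea is simple: pool each channel, pass the resulting channel descriptor through two fully connected layers, and apply the learned weights to the original feature map. The following example walks through the steps:
[Figure: worked SE example with a two-channel feature map]

Channel pooling: global average pooling condenses each channel into a single value. The example has two channels, so it yields two values: 0.25 and 0.3.
Fully connected layers (FC1 and FC2): the two channel descriptors are fed through two fully connected layers. FC1 normally has a quarter as many nodes as there are channels (the toy example uses 2 nodes) and is followed by ReLU. FC2 has as many nodes as there are channels, here 2, and is followed by hard-sigmoid, which maps each weight into [0, 1] independently (the weights are not normalized to sum to 1).
Feature map scaling: finally, each weight is multiplied element-wise into its channel of the original feature map, which rescales the relative importance of the channels. In the example the resulting weights are 0.5 and 0.6, one for each of the two original channels.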
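
The numbers in this example can be reproduced in a few lines. This is a sketch under assumptions: the 2x2 pixel values are made up here so that the channel means come out as 0.25 and 0.3, and the gates 0.5 and 0.6 are taken directly from the example rather than computed by trained FC layers.

```python
import torch

# Two 2x2 channels whose means are 0.25 and 0.3 (pixel values are illustrative).
x = torch.tensor([[[[0.1, 0.2], [0.3, 0.4]],
                   [[0.2, 0.3], [0.3, 0.4]]]])   # [N=1, C=2, H=2, W=2]

pooled = x.mean(dim=(2, 3))                      # channel pooling -> [1, 2]
gates = torch.tensor([[0.5, 0.6]])               # weights from the example
scaled = x * gates[:, :, None, None]             # feature map scaling

print(pooled)        # channel means
print(gates.sum())   # the gates add up to 1.1, so they are not a distribution
print(scaled)
```

Every pixel of the first channel is halved and every pixel of the second channel is multiplied by 0.6.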

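3. Updated Block (bneck)

[Figure: structure of the MobileNetV3 bneck block]

MobileNetV3 makes the following changes on top of MobileNetV2:

SE module added:
- A squeeze-and-excitation (SE) module is added to the MobileNetV3 block. SE is an attention mechanism that dynamically reweights the channels of a feature map to improve the model's representational power. It applies global average pooling to the feature map, learns the importance of each channel with two fully connected (FC) layers, and multiplies the resulting weights back into the original feature map, emphasizing important features and suppressing less important ones.
Updated activation functions:
- MobileNetV3 uses ReLU6 and Hard-Swish. ReLU6 is a clipped ReLU whose output is limited to [0, 6]. Hard-Swish is a cheap approximation of Swish built from ReLU6: Hard-Swish(x) = x · ReLU6(x + 3) / 6, which is 0 for x <= -3, equals x for x >= 3, and is quadratic in between. Both are cheap to compute and quantization-friendly, which matters on low-power devices.
Fixed SE module size:
- In MobileNetV3 the SE bottleneck is fixed to 1/4 of the number of channels in the expansion layer, instead of scaling with the size of the convolutional bottleneck as in MnasNet (MobileNetV2 itself has no SE module). According to the paper, this increases accuracy with only a modest increase in parameters and no noticeable latency cost.
Block structure:
- As in MobileNetV2, the shortcut connection exists only when stride=1 and input_c=output_c; all other blocks have no shortcut. The 1x1 projection convolution at the end of each block is linear, that is, it has no activation function.
Large squeeze-and-excite:
- The paper calls this design "Large squeeze-and-excite": every SE bottleneck is fixed to 1/4 of the expansion-layer channels, as described above.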

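4. Redesigned Expensive Layers

[Figure: "Original Last Stage" compared with "Efficient Last Stage"]

When designing MobileNetV3, the authors focused on reducing latency and the number of operations while maintaining accuracy. They made the following changes to the expensive layers: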

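Fewer filters in the first convolution:
- The first convolution originally had 32 filters; MobileNetV3 halves this to 16. With hard-swish (H-Swish) as the non-linearity, the model keeps the same accuracy as before. According to the paper, this saves an additional 2 ms and 10 million multiply-accumulate operations (MAdds).
Streamlined last stage:
- The original last stage consisted of a series of convolutions, batch normalization (BN), and H-Swish activations. The authors simplified it by removing some convolution layers, which reduces computation while preserving accuracy.
- In the figure, "Original Last Stage" shows the original structure with its multiple convolution layers and activations, and "Efficient Last Stage" shows the optimized version, which has fewer convolution layers and therefore lower latency.
- According to the paper, the efficient last stage reduces latency by 7 ms, about 11% of the total running time, and removes roughly 30 million MAdds with almost no loss of accuracy. Detailed experimental results are in Section 6 of the paper.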
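
To get a feel for why the resolution at which a layer runs matters, the multiply-accumulate count of a convolution can be estimated by hand. The sketch below is an illustration only: `conv_macs` is a helper defined here, a 7x7 final feature map is assumed (224x224 input), and just one 1x1 layer (960 -> 1280) is counted, so the result is not meant to reproduce the 30 million figure quoted above.

```python
def conv_macs(out_h, out_w, kernel, in_c, out_c, groups=1):
    """Multiply-accumulates of one convolution layer (bias ignored)."""
    return out_h * out_w * kernel * kernel * (in_c // groups) * out_c


# The same 1x1 layer evaluated on a 7x7 map versus on the pooled 1x1 map.
on_7x7 = conv_macs(7, 7, 1, 960, 1280)
on_1x1 = conv_macs(1, 1, 1, 960, 1280)

print(on_7x7)            # 60211200
print(on_1x1)            # 1228800
print(on_7x7 // on_1x1)  # 49
```

Running a layer after global average pooling instead of before it divides its cost by the number of spatial positions, here 49.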

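III. MobileNetV3 Network Structure

MobileNetV3 is a lightweight deep learning model designed for mobile and other resource-constrained devices. It comes in two variants, Large and Small, which target different application scenarios and compute budgets.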

MobileNetV3 Large

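This variant is typically used where higher accuracy is required and the device has relatively ample resources. Its structure is:
[Figure: MobileNetV3-Large architecture table]

Input layer: takes an input image, typically 224x224.
First layer: a standard convolution with 16 output channels.
Next comes a stack of inverted residual blocks built on depthwise separable convolutions, each containing:
- a 1x1 expansion (pointwise) convolution
- a depthwise convolution
- a 1x1 projection (pointwise) convolution
- a Squeeze-and-Excitation (SE) module for feature recalibration (in some blocks)
The final layers are global average pooling (GAP) followed by a fully connected classifier.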
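
The saving that comes from the depthwise and pointwise split is easy to see by counting parameters. This is a small sketch; the channel counts (72 and 24) are picked here to match one of the Large blocks, and `n_params` is a helper defined only for this example.

```python
import torch.nn as nn


def n_params(module: nn.Module) -> int:
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in module.parameters())


# A standard 3x3 convolution mixes all 72 input channels into every output channel.
standard = nn.Conv2d(72, 72, kernel_size=3, padding=1, bias=False)
# A depthwise 3x3 convolution (groups == channels) filters each channel on its own.
depthwise = nn.Conv2d(72, 72, kernel_size=3, padding=1, groups=72, bias=False)
# A pointwise 1x1 convolution then mixes channels (here it also projects 72 -> 24).
pointwise = nn.Conv2d(72, 24, kernel_size=1, bias=False)

print(n_params(standard))    # 46656
print(n_params(depthwise))   # 648
print(n_params(pointwise))   # 1728
```

The depthwise layer needs 72 times fewer weights than the standard 3x3 convolution, which is why the blocks can afford a wide expansion layer.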

MobileNetV3 Small

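This variant targets settings with limited compute, such as low-power devices. MobileNetV3 Small is more compact, with fewer operations and parameters, trading some accuracy for higher efficiency. Its structure is:
[Figure: MobileNetV3-Small architecture table]

Input layer: also takes a 224x224 input image.
First layer: a standard convolution with few output channels, typically 16.
This is again followed by a stack of inverted residual blocks, but the channel counts, expansion sizes, and use of the SE module differ from block to block to suit the smaller model.
The network also ends with global average pooling and a classifier, with channel counts that differ from the Large version.

Below is a simplified overview of the MobileNetV3 Large and MobileNetV3 Small structures: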

MobileNetV3 Large:

Conv: 3 -> 16 (kernel=3, stride=2)
InvRes: 16 -> 16 (expansion=1, kernel=3, stride=1)
InvRes: 16 -> 24 (expansion=4, kernel=3, stride=2)
InvRes: 24 -> 24 (expansion=3, kernel=3, stride=1)
InvRes: 24 -> 40 (expansion=3, kernel=5, stride=2, SE=True)
InvRes: 40 -> 40 (expansion=3, kernel=5, stride=1, SE=True)
InvRes: 40 -> 40 (expansion=3, kernel=5, stride=1, SE=True)
InvRes: 40 -> 80 (expansion=6, kernel=3, stride=2)
InvRes: 80 -> 80 (expansion=2.5, kernel=3, stride=1)
InvRes: 80 -> 80 (expansion=2.3, kernel=3, stride=1)
InvRes: 80 -> 80 (expansion=2.3, kernel=3, stride=1)
InvRes: 80 -> 112 (expansion=6, kernel=3, stride=1, SE=True)
InvRes: 112 -> 112 (expansion=6, kernel=3, stride=1, SE=True)
InvRes: 112 -> 160 (expansion=6, kernel=5, stride=2, SE=True)
InvRes: 160 -> 160 (expansion=6, kernel=5, stride=1, SE=True)
InvRes: 160 -> 160 (expansion=6, kernel=5, stride=1, SE=True)
Conv: 160 -> 960 (expansion=6, kernel=1, stride=1)
Conv: 960 -> 1280 (kernel=1, stride=1)

MobileNetV3 Small:

Conv: 3 -> 16 (kernel=3, stride=2)
InvRes: 16 -> 16 (expanded_c=16, kernel=3, stride=2, SE=True)
InvRes: 16 -> 24 (expanded_c=72, kernel=3, stride=2)
InvRes: 24 -> 24 (expanded_c=88, kernel=3, stride=1)
InvRes: 24 -> 40 (expanded_c=96, kernel=5, stride=2, SE=True)
InvRes: 40 -> 40 (expanded_c=240, kernel=5, stride=1, SE=True)
InvRes: 40 -> 40 (expanded_c=240, kernel=5, stride=1, SE=True)
InvRes: 40 -> 48 (expanded_c=120, kernel=5, stride=1, SE=True)
InvRes: 48 -> 48 (expanded_c=144, kernel=5, stride=1, SE=True)
InvRes: 48 -> 96 (expanded_c=288, kernel=5, stride=2, SE=True)
InvRes: 96 -> 96 (expanded_c=576, kernel=5, stride=1, SE=True)
InvRes: 96 -> 96 (expanded_c=576, kernel=5, stride=1, SE=True)
Conv: 96 -> 576 (kernel=1, stride=1)
Conv: 576 -> 1024 (kernel=1, stride=1)

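Note that the parameters listed above, such as expansion size, kernel size, stride, and output channels, may differ slightly between implementations and pretrained weights.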
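
The strides in the Large configuration determine how the spatial resolution shrinks through the network. The following sketch traces it for a 224x224 input; the `(kernel, stride)` pairs are copied from the Large configuration in the `model.py` listing, while `conv_out` is a helper written for this example that assumes the padding rule `(kernel - 1) // 2` used there.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    # Output size of a convolution with padding = (kernel - 1) // 2
    padding = (kernel - 1) // 2
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride) of the stem convolution followed by the 15 bneck blocks of Large
layers = [(3, 2),
          (3, 1), (3, 2), (3, 1), (5, 2), (5, 1), (5, 1),
          (3, 2), (3, 1), (3, 1), (3, 1), (3, 1), (3, 1),
          (5, 2), (5, 1), (5, 1)]

size = 224
sizes = [size]
for kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    sizes.append(size)

print(sizes)  # 224 -> 112 -> ... -> 7
```

Five stride-2 layers halve the resolution five times, so the last feature map is 7x7 before global average pooling.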

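IV. Highlights of MobileNetV3

MobileNetV3 is the third iteration of the MobileNet family, proposed by Google in 2019. It refines its predecessors in several ways to provide a more efficient and more accurate lightweight model. Its main highlights are:

Neural architecture search (NAS):
The design of MobileNetV3 relies heavily on neural architecture search. Automated search lets the authors find architectures that balance computational cost against accuracy, so the network can be tuned for different devices and requirements.
Hard-Swish activation:
MobileNetV3 introduces Hard-Swish, a "hard" version of Swish that is cheaper to compute while keeping a similar non-linear shape, which benefits both training and accuracy.
Squeeze-and-Excitation (SE) module:
The SE module is a channel-attention mechanism that adaptively reweights the channels of a feature map so that the model captures important features better. MobileNetV3 integrates it into many of its blocks, improving representational power and generalization.
Large and Small versions:
MobileNetV3 comes in two versions: Large for environments with more compute, and Small for resource-constrained settings. This lets the model fit different hardware and application requirements.
Scalability:
The size and cost of the model can be controlled with a width multiplier and the input resolution, so it can be adjusted to different platforms and tasks.
Efficient last stage:
To reduce latency further, MobileNetV3 streamlines the last stage of the network by removing unnecessary operations such as some convolution layers, which lowers computation and running time while preserving accuracy.
Broad applicability:
Besides image classification, MobileNetV3 can be adapted with simple structural changes to other computer vision tasks such as object detection and semantic segmentation.

Together, these properties make MobileNetV3 an efficient and practical model, especially for real-time applications on mobile and edge devices. For both research and industrial use it is a strong baseline that can be customized and optimized further.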
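
As an illustration of the width multiplier mentioned above, the sketch below scales a few channel counts and rounds them to multiples of 8. The rounding helper is a copy of `_make_divisible` from the `model.py` listing later in this article, renamed `make_divisible` here; the list of base channel counts and the multiplier 0.75 are example values chosen for this sketch.

```python
def make_divisible(ch, divisor=8, min_ch=None):
    # Round ch to the nearest multiple of divisor, never below min_ch
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    # Do not let rounding down remove more than 10% of the channels
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch


base_channels = [16, 24, 40, 80, 112, 160]   # block output widths of Large

scaled = {alpha: [make_divisible(c * alpha) for c in base_channels]
          for alpha in (1.0, 0.75)}

print(scaled[1.0])    # [16, 24, 40, 80, 112, 160]
print(scaled[0.75])   # [16, 24, 32, 64, 88, 120]
```

Note that 24 stays at 24 for a multiplier of 0.75: rounding 18 down to 16 would remove more than 10% of the channels, so the helper rounds up instead.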

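V. MobileNetV3 Code Implementation

Development environment: this project was built with Python 3.6.13 and PyTorch 1.10.2 and runs on CPU.

model.py: defines the network model
train.py: loads the dataset and trains the model, computes loss and accuracy, and saves the trained weights
predict.py: runs classification on your own images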

model.py

import torch
from torch import nn, Tensor
from torch.nn import functional as F
from functools import partial
from typing import Optional, List, Callable


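def _make_divisible(ch, divisor=8, min_ch=None):
    """
    Round a channel count to the nearest multiple of `divisor`.

    :param ch: number of channels to adjust
    :param divisor: the result is a multiple of this value
    :param min_ch: minimum number of channels (defaults to `divisor`)
    """
    if min_ch is None:
        min_ch = divisor
    #   Round ch to the nearest multiple of divisor:
    #   adding divisor / 2 before the floor division rounds to nearest, not up
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    #   Make sure rounding down does not reduce ch by more than 10%
    if new_ch < 0.9 * ch:
        new_ch += divisor
    return new_ch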


#   Convolution -> BN -> activation, bundled as one module
class ConvBNActivation(nn.Sequential):
    def __init__(self,
                 in_planes: int,
                 out_planes: int,
                 kernel_size: int = 3,
                 stride: int = 1,
                 groups: int = 1,
                 #  normalization layer (BatchNorm2d by default)
                 norm_layer: Optional[Callable[..., nn.Module]] = None,
                 #  activation function (ReLU6 by default)
                 activation_layer: Optional[Callable[..., nn.Module]] = None):
        padding = (kernel_size - 1) // 2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                                                         out_channels=out_planes,
                                                         kernel_size=kernel_size,
                                                         stride=stride,
                                                         padding=padding,
                                                         groups=groups,
                                                         bias=False),
                                                         norm_layer(out_planes),
                                                         activation_layer(inplace=True))


#   SE module (the two FC layers are implemented as 1x1 convolutions)
class SqueezeExcitaion(nn.Module):
    def __init__(self, input_c: int, squeeze_factor: int = 4):
        super(SqueezeExcitaion, self).__init__()
        squeeze_c = _make_divisible(input_c // squeeze_factor, 8)
        self.fc1 = nn.Conv2d(input_c, squeeze_c, 1)
        self.fc2 = nn.Conv2d(squeeze_c, input_c, 1)

    def forward(self, x: Tensor)-> Tensor:
        scale = F.adaptive_avg_pool2d(x, output_size=(1, 1))
        scale = self.fc1(scale)
        scale = F.relu(scale, inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale, inplace=True)
        return scale * x


#   Configuration of a single MobileNetV3 block
class InvertedResidualConfig:
    def __init__(self,
                 input_c: int,
                 kernel: int,
                 expanded_c: int,
                 out_c: int,
                 use_se: bool,
                 activation: str,
                 stride: int,
                 #  width multiplier (alpha)
                 width_multi: float):
        self.input_c = self.adjust_channels(input_c, width_multi)
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c, width_multi)
        self.out_c = self.adjust_channels(out_c, width_multi)
        self.use_se = use_se
        self.use_hs = activation == "HS"
        self.stride = stride

    @staticmethod
    def adjust_channels(channels: int, width_multi: float):
        return _make_divisible(channels * width_multi, 8)


#   MobileNetV3 inverted residual block
class InvertedResidual(nn.Module):
    def __init__(self,
                 cnf: InvertedResidualConfig,
                 norm_layer: Optional[Callable[..., nn.Module]]):
        super(InvertedResidual, self).__init__()

        #   Validate the stride
        if cnf.stride not in [1, 2]:
            raise ValueError("illegal stride value.")

        #   Use the shortcut only when spatial size and channel count are unchanged
        self.use_res_connect = (cnf.stride == 1 and cnf.input_c == cnf.out_c)

        layers: List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

        #   expand: 1x1 convolution, skipped when the block does not widen its input
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))

        #   depthwise: runs in every block, so it must sit outside the `if` above
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))

        #   optional SE module on the expanded features
        if cnf.use_se:
            layers.append(SqueezeExcitaion(cnf.expanded_c))

        #   project: linear 1x1 convolution, so no non-linearity (nn.Identity)
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channel = cnf.out_c

    def forward(self, x):
        result = self.block(x)
        if self.use_res_connect:
            result += x
        return result


class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting: List[InvertedResidualConfig],
                 last_channel: int,
                 num_classes: int = 1000,
                 block: Optional[Callable[..., nn.Module]] = None,
                 norm_layer: Optional[Callable[..., nn.Module]] = None):
        super(MobileNetV3, self).__init__()

        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting, List) and
                  all([isinstance(s, InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig]")

        if block is None:
            block = InvertedResidual

        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)

        layers: List[nn.Module] = []

        # building first layer
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,
                                       firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # building inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf, norm_layer))

        # building last several layers
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6 * lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(nn.Linear(lastconv_output_c, last_channel),
                                        nn.Hardswish(inplace=True),
                                        nn.Dropout(p=0.2, inplace=True),
                                        nn.Linear(last_channel, num_classes))

        # initial weights
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out")
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def _forward_impl(self, x: Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)


def mobilenet_v3_large(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a large MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_large-8738ca79.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, False, "RE", 1),
        bneck_conf(16, 3, 64, 24, False, "RE", 2),  # C1
        bneck_conf(24, 3, 72, 24, False, "RE", 1),
        bneck_conf(24, 5, 72, 40, True, "RE", 2),  # C2
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 5, 120, 40, True, "RE", 1),
        bneck_conf(40, 3, 240, 80, False, "HS", 2),  # C3
        bneck_conf(80, 3, 200, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 184, 80, False, "HS", 1),
        bneck_conf(80, 3, 480, 112, True, "HS", 1),
        bneck_conf(112, 3, 672, 112, True, "HS", 1),
        bneck_conf(112, 5, 672, 160 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
        bneck_conf(160 // reduce_divider, 5, 960 // reduce_divider, 160 // reduce_divider, True, "HS", 1),
    ]
    last_channel = adjust_channels(1280 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)


def mobilenet_v3_small(num_classes: int = 1000,
                       reduced_tail: bool = False) -> MobileNetV3:
    """
    Constructs a small MobileNetV3 architecture from
    "Searching for MobileNetV3" <https://arxiv.org/abs/1905.02244>.

    weights_link:
    https://download.pytorch.org/models/mobilenet_v3_small-047dcff4.pth

    Args:
        num_classes (int): number of classes
        reduced_tail (bool): If True, reduces the channel counts of all feature layers
            between C4 and C5 by 2. It is used to reduce the channel redundancy in the
            backbone for Detection and Segmentation.
    """
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig, width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels, width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16, 3, 16, 16, True, "RE", 2),  # C1
        bneck_conf(16, 3, 72, 24, False, "RE", 2),  # C2
        bneck_conf(24, 3, 88, 24, False, "RE", 1),
        bneck_conf(24, 5, 96, 40, True, "HS", 2),  # C3
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 240, 40, True, "HS", 1),
        bneck_conf(40, 5, 120, 48, True, "HS", 1),
        bneck_conf(48, 5, 144, 48, True, "HS", 1),
        bneck_conf(48, 5, 288, 96 // reduce_divider, True, "HS", 2),  # C4
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1),
        bneck_conf(96 // reduce_divider, 5, 576 // reduce_divider, 96 // reduce_divider, True, "HS", 1)
    ]
    last_channel = adjust_channels(1024 // reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)

train.py

import torch
import torch.nn as nn
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.optim as optim
from model import mobilenet_v3_large
import os
import json


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# print(device)

data_transform = {
    "train" : transforms.Compose([transforms.RandomResizedCrop(224),   # random crop, resized to 224x224
                                  transforms.RandomHorizontalFlip(),   # random horizontal flip
                                  transforms.ToTensor(),
                                  transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
    "val" : transforms.Compose([transforms.Resize(256),      # keep the aspect ratio, scale the shorter side to 256
                                transforms.CenterCrop(224),  # center-crop to 224x224
                                transforms.ToTensor(),
                                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

#   Root directory that contains the dataset:
#   os.getcwd() returns the current directory; joining it with ".." gives its parent
data_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

#   Path of the flower dataset
image_path = os.path.join(data_root, "data_set", "flower_data")

#   Load the training set
train_dataset = datasets.ImageFolder(root=image_path + "/train",
                                     transform=data_transform["train"])

#   Number of training images
train_num = len(train_dataset)

#   Mapping from class name to index
#   {'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}
flower_list = train_dataset.class_to_idx

#   Swap keys and values to get a mapping from index to class name
cla_dict = dict((val, key) for key, val in flower_list.items())

#   Serialize cla_dict as JSON so that predict.py can read it
json_str = json.dumps(cla_dict, indent=4)
with open("class_indices.json", "w") as json_file:
    json_file.write(json_str)

batch_size = 16
train_loader = DataLoader(train_dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)

validate_dataset = datasets.ImageFolder(root=image_path + "/val",
                                        transform=data_transform["val"])
val_num = len(validate_dataset)
validate_loader = DataLoader(validate_dataset,
                             batch_size=batch_size,
                             shuffle=False,   # no need to shuffle validation data
                             num_workers=0)

#   Build the model
net = mobilenet_v3_large(num_classes=5)   # instantiate the model
net.to(device)
model_weight_path = "./mobilenet_v3_large_pre.pth"
#   Load the pretrained weights
pre_weights = torch.load(model_weight_path, map_location=device)
#   Drop the classifier weights (their shapes depend on the number of classes)
pre_dict = {k: v for k, v in pre_weights.items() if "classifier" not in k}
missing_keys, unexpected_keys = net.load_state_dict(pre_dict, strict=False)
#   Freeze all weights except the final fully connected layers
for param in net.features.parameters():
    param.requires_grad = False

loss_function = nn.CrossEntropyLoss()   # loss function
#   Pass only the trainable parameters to the optimizer
params = [p for p in net.parameters() if p.requires_grad]
optimizer = optim.Adam(params, lr=0.0001)  # optimizer

#   Path for saving the weights
save_path = './mobilenetV3.pth'
best_acc = 0.0
for epoch in range(1):
    # train
    net.train()  # controls Dropout: active during training, disabled during validation
    running_loss = 0.0  # accumulates the training loss over the epoch
    for step, data in enumerate(train_loader, start=0):
        #   Unpack the images and labels of the batch
        images, labels = data

        #   Clear the gradients left over from the previous step
        optimizer.zero_grad()

        #   Forward pass, backward pass, parameter update
        outputs = net(images.to(device))                   # network output
        loss = loss_function(outputs, labels.to(device))   # compute the loss
        loss.backward()                                    # backpropagate the error
        optimizer.step()                                   # update the parameters

        #   Accumulate the loss
        running_loss += loss.item()
        #   Print a progress bar
        rate = (step + 1) / len(train_loader)
        a = "*" * int(rate * 50)
        b = "." * int((1 - rate) * 50)
        print("\rtrain loss: {:^3.0f}%[{}->{}]{:.3f}".format(int(rate * 100), a, b, loss.item()), end="")
    print()

    # validate
    net.eval()  # disables Dropout
    acc = 0.0
    #   No gradients are computed during validation
    with torch.no_grad():
        for data_test in validate_loader:
            test_images, test_labels = data_test
            outputs = net(test_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            #   acc accumulates the number of correct predictions on the validation set:
            #   compare predictions with labels, sum() counts the matches, item() extracts the value
            acc += (predict_y == test_labels.to(device)).sum().item()
        accurate_test = acc / val_num
        #   If the current accuracy beats the best so far
        if accurate_test > best_acc:
            #   update the best accuracy
            best_acc = accurate_test
            #   and save the current weights
            torch.save(net.state_dict(), save_path)
        #   Print the epoch summary (loss averaged over the number of batches)
        print("[epoch %d] train_loss: %.3f  test_accuracy: %.3f"%
              (epoch + 1, running_loss / len(train_loader), accurate_test))

print("Finished Training")
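
The weight-loading pattern used in train.py (drop the classifier weights, load with `strict=False`, freeze the backbone) can be tried on a toy model without downloading anything. This is a sketch only: `TinyNet` is a hypothetical stand-in invented for this example, not MobileNetV3, and its "pretrained" weights are just a randomly initialized 1000-class copy.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x).mean(dim=(2, 3))   # global average pooling
        return self.classifier(x)


pretrained = TinyNet(num_classes=1000).state_dict()   # plays the role of the .pth file
net = TinyNet(num_classes=5)

# Keep only the backbone weights; the 1000-class classifier would not fit.
backbone = {k: v for k, v in pretrained.items() if "classifier" not in k}
missing, unexpected = net.load_state_dict(backbone, strict=False)

# Freeze the backbone so that only the classifier is trained.
for p in net.features.parameters():
    p.requires_grad = False

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(missing)      # keys the checkpoint did not provide
print(unexpected)   # keys the model did not expect
print(trainable)    # parameters that will still be updated
```

Inspecting `missing` and `unexpected` after loading is a cheap sanity check: with this pattern only the classifier keys should be missing, and nothing should be unexpected.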

predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import mobilenet_v3_large


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

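    # load image
    img_path = "./郁金香.png"  # sample image (a tulip); replace with your own file
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path).convert("RGB")  # PNG files may carry an alpha channel
    plt.imshow(img)
    # [C, H, W] after the transform
    img = data_transform(img)
    # add the batch dimension: [N, C, H, W]
    img = torch.unsqueeze(img, dim=0)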

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as f:
        class_indict = json.load(f)

    # create model
    model = mobilenet_v3_large(num_classes=5).to(device)

    # load model weights
    weights_path = "./mobilenetV3.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    # prediction
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()


From： https://blog.csdn.net/qq_51872445/article/details/140599341