A Classic Convolutional Neural Network Model - VGGNet


flyfish

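The VGG network takes its name from the team that developed it, the Visual Geometry Group at the University of Oxford.
In 2014, researchers from the Visual Geometry Group and Google DeepMind introduced a network called VGG. One of its main contributions was to show how much the depth of a network (its number of layers) matters for image recognition: they demonstrated that adding layers can significantly improve model performance.
The design is simple but effective: every convolutional layer uses the same small 3x3 kernel, which keeps the architecture uniform and easy to follow.
A VGG network is a stack of convolutional and pooling layers. The convolutional layers extract image features, and the pooling layers shrink the feature maps.
After these layers come several fully connected layers, which classify the extracted features.
Specifically, the VGG16 model has 16 weight layers: 13 convolutional layers and 3 fully connected layers.
VGG comes in several variants, such as VGG11, VGG13, VGG16, and VGG19. The number in each name is the total count of weight layers (convolutional plus fully connected). For example, VGG16 has 16 weight layers and VGG19 has 19.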

import torchvision.models as models

# Build the VGG16 architecture with randomly initialized weights and print its layers
vgg16 = models.vgg16()
print(vgg16)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Implementing VGGNet by hand

import torch
import torch.nn as nn

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 1000),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

vgg16 = VGG()
print(vgg16)
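The same architecture can be written more compactly by driving the layer construction from a configuration list. The version below is only a sketch of that idea: the names `VGG16_CFG`, `make_features`, and `CompactVGG16` are invented for this example, and the final lines run a random 224x224 input through the network to confirm the output shape.

```python
import torch
import torch.nn as nn

# Integers are conv output channels; "M" marks a 2x2 max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def make_features(cfg, in_channels=3):
    """Build the conv/ReLU/pool stack described by cfg."""
    layers = []
    for item in cfg:
        if item == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, item, kernel_size=3, stride=1, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = item
    return nn.Sequential(*layers)


class CompactVGG16(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = make_features(VGG16_CFG)
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)


model = CompactVGG16().eval()  # eval() disables dropout for the check below
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
num_params = sum(p.numel() for p in model.parameters())
print(logits.shape)  # one score per class
print(num_params)
```

The configuration list makes the five stages easy to see at a glance, and switching to another variant would only require a different list.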

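Characteristics

Feature-map size decreases with depth: starting from the 224x224 input, every MaxPooling layer halves the height and width: 224 -> 112 -> 56 -> 28 -> 14 -> 7. The convolutional layers in between leave the size unchanged.
Number of feature maps grows with depth: starting from the 3 input channels, the channel count rises as 3 -> 64 -> 128 -> 256 -> 512 and then stays at 512 in the fifth stage.

Assume the input image is $224 \times 224$ (an RGB image with 3 channels).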

Stage 1 convolution and pooling layers:

Input: $224 \times 224 \times 3$
Conv2d(3, 64, kernel_size=3, stride=1, padding=1) -> Output: $224 \times 224 \times 64$
ReLU(inplace=True)
Input: $224 \times 224 \times 64$
Conv2d(64, 64, kernel_size=3, stride=1, padding=1) -> Output: $224 \times 224 \times 64$
ReLU(inplace=True)
Input: $224 \times 224 \times 64$
MaxPool2d(kernel_size=2, stride=2) -> Output: $112 \times 112 \times 64$
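The sizes in this stage follow from the standard output-size formula for convolution and pooling layers without dilation. As a worked example, take input size $n$, kernel size $k$, stride $s$, and padding $p$ (symbols introduced here for illustration only):

```latex
n_{\text{out}} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1

% 3x3 convolution, stride 1, padding 1: the size is preserved
\left\lfloor \frac{224 + 2 \cdot 1 - 3}{1} \right\rfloor + 1 = 224

% 2x2 max pooling, stride 2, padding 0: the size is halved
\left\lfloor \frac{224 + 2 \cdot 0 - 2}{2} \right\rfloor + 1 = 112
```

The same two calculations repeat in every later stage, only with a smaller $n$.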

Stage 2 convolution and pooling layers:

Input: $112 \times 112 \times 64$
Conv2d(64, 128, kernel_size=3, stride=1, padding=1) -> Output: $112 \times 112 \times 128$
ReLU(inplace=True)
Input: $112 \times 112 \times 128$
Conv2d(128, 128, kernel_size=3, stride=1, padding=1) -> Output: $112 \times 112 \times 128$
ReLU(inplace=True)
Input: $112 \times 112 \times 128$
MaxPool2d(kernel_size=2, stride=2) -> Output: $56 \times 56 \times 128$

Stage 3 convolution and pooling layers:

Input: $56 \times 56 \times 128$
Conv2d(128, 256, kernel_size=3, stride=1, padding=1) -> Output: $56 \times 56 \times 256$
ReLU(inplace=True)
Input: $56 \times 56 \times 256$
Conv2d(256, 256, kernel_size=3, stride=1, padding=1) -> Output: $56 \times 56 \times 256$
ReLU(inplace=True)
Input: $56 \times 56 \times 256$
Conv2d(256, 256, kernel_size=3, stride=1, padding=1) -> Output: $56 \times 56 \times 256$
ReLU(inplace=True)
Input: $56 \times 56 \times 256$
MaxPool2d(kernel_size=2, stride=2) -> Output: $28 \times 28 \times 256$

Stage 4 convolution and pooling layers:

Input: $28 \times 28 \times 256$
Conv2d(256, 512, kernel_size=3, stride=1, padding=1) -> Output: $28 \times 28 \times 512$
ReLU(inplace=True)
Input: $28 \times 28 \times 512$
Conv2d(512, 512, kernel_size=3, stride=1, padding=1) -> Output: $28 \times 28 \times 512$
ReLU(inplace=True)
Input: $28 \times 28 \times 512$
Conv2d(512, 512, kernel_size=3, stride=1, padding=1) -> Output: $28 \times 28 \times 512$
ReLU(inplace=True)
Input: $28 \times 28 \times 512$
MaxPool2d(kernel_size=2, stride=2) -> Output: $14 \times 14 \times 512$

Stage 5 convolution and pooling layers:

Input: $14 \times 14 \times 512$
Conv2d(512, 512, kernel_size=3, stride=1, padding=1) -> Output: $14 \times 14 \times 512$
ReLU(inplace=True)
Input: $14 \times 14 \times 512$
Conv2d(512, 512, kernel_size=3, stride=1, padding=1) -> Output: $14 \times 14 \times 512$
ReLU(inplace=True)
Input: $14 \times 14 \times 512$
Conv2d(512, 512, kernel_size=3, stride=1, padding=1) -> Output: $14 \times 14 \times 512$
ReLU(inplace=True)
Input: $14 \times 14 \times 512$
MaxPool2d(kernel_size=2, stride=2) -> Output: $7 \times 7 \times 512$

Average pooling layer:

Input: $7 \times 7 \times 512$
AdaptiveAvgPool2d(output_size=(7, 7)) -> Output: $7 \times 7 \times 512$
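For a 224x224 image the adaptive average pooling layer leaves the feature map as it is. It matters when the input has a different size, because it always produces the 7x7 map that the classifier expects. A small sketch using only `torch` follows; the 8x8 case is an assumption chosen for illustration and corresponds to a 256x256 input image after five poolings.

```python
import torch
import torch.nn as nn

avgpool = nn.AdaptiveAvgPool2d((7, 7))

# Case 1: the feature map is already 7x7, so every output bin covers one element.
x7 = torch.randn(1, 512, 7, 7)
same_size_out = avgpool(x7)

# Case 2: a larger 8x8 feature map is averaged down to 7x7.
x8 = torch.randn(1, 512, 8, 8)
larger_out = avgpool(x8)

print(torch.allclose(same_size_out, x7))  # whether the 7x7 input passed through unchanged
print(larger_out.shape)
```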

Fully connected layers:

Input: $7 \times 7 \times 512 = 25088$ (flattened)
Linear(25088, 4096) -> Output: 4096
ReLU(inplace=True)
Dropout(p=0.5)
Input: 4096
Linear(4096, 4096) -> Output: 4096
ReLU(inplace=True)
Dropout(p=0.5)
Input: 4096
Linear(4096, 1000) -> Output: 1000
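The flattening step is what turns the $7 \times 7 \times 512$ feature map into the 25088-element vector expected by the first `Linear` layer. A minimal sketch, assuming a batch of two feature maps (the batch size is arbitrary):

```python
import torch

# A batch of 2 feature maps in PyTorch's (batch, channels, height, width) layout.
feature_maps = torch.randn(2, 512, 7, 7)

# start_dim=1 keeps the batch dimension and merges channels, height, and width.
flat = torch.flatten(feature_maps, 1)

print(flat.shape)
```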

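To recap, the feature-map size shrinks at every pooling layer: 224 -> 112 -> 56 -> 28 -> 14 -> 7.
The number of feature maps grows and then levels off at 512: 3 -> 64 -> 128 -> 256 -> 512.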

How to count the layers (weight layers)

Convolutional layers

1. Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
2. Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
3. Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
4. Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
5. Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
6. Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
7. Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
8. Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
9. Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
10. Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
11. Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
12. Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
13. Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

Fully connected layers

1. Linear(in_features=25088, out_features=4096, bias=True)
2. Linear(in_features=4096, out_features=4096, bias=True)
3. Linear(in_features=4096, out_features=1000, bias=True)

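Counting the weight layers

The 13 convolutional layers and 3 fully connected layers add up to 13 + 3 = 16 layers, hence the name VGG16.

Note: ReLU and MaxPooling layers are not counted as weight layers because they contain no trainable parameters.
Only trainable layers count toward the depth of the network, that is, layers whose weights are adjusted during training, such as convolutional layers (Conv2d) and fully connected layers (Linear).

ReLU only applies a nonlinear transformation to its input.
MaxPooling only downsamples its input.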

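ReLU layer

ReLU (Rectified Linear Unit) is an activation function defined as $f(x) = \max(0, x)$.
It applies an element-wise nonlinear transformation to its input: negative values are set to 0 and positive values are left unchanged.
A ReLU layer is just a mathematical operation with no parameters, so there are no weights to update during training.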
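A minimal illustration of this behavior on a handful of made-up values:

```python
import torch
import torch.nn as nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
y = relu(x)

print(y)  # negatives become 0, positives pass through
print(len(list(relu.parameters())))  # number of learnable tensors in the layer
```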

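MaxPooling layer

MaxPooling is a pooling operation that reduces the size of the feature map while keeping the most important features. The most common form uses a 2x2 window and takes the maximum value inside each window. Because a MaxPooling layer applies a fixed rule (select the maximum in each window) to the input feature map, it has no parameters to learn.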
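The effect of a 2x2 window with stride 2 is easy to see on a small example. The 4x4 values below are made up for illustration:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

# One 4x4 single-channel feature map, shaped (batch, channels, height, width).
x = torch.tensor([[1.0, 2.0, 5.0, 6.0],
                  [3.0, 4.0, 7.0, 8.0],
                  [9.0, 1.0, 2.0, 3.0],
                  [0.0, 5.0, 4.0, 1.0]]).reshape(1, 1, 4, 4)
y = pool(x)

print(y.squeeze())  # the maximum of each non-overlapping 2x2 window
```

The 4x4 input becomes a 2x2 output, the same halving that takes VGG16 from 224 down to 7 over five pooling layers.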

From: https://blog.csdn.net/flyfish1986/article/details/140107883
