
"Dive into Deep Learning (PyTorch Edition)" 7.7 Densely Connected Networks (DenseNet)


7.7.1 From ResNet to DenseNet

DenseNet can be viewed as a logical extension of ResNet.

ResNet expands the function as \(f(\boldsymbol{x})=\boldsymbol{x}+g(\boldsymbol{x})\), that is, a simple linear term plus a more complex nonlinear term.

If we expand \(f\) into more than two terms, DenseNet is one such scheme. This is the main difference between DenseNet and ResNet.

[Figure: cross-layer connections in ResNet (addition) versus DenseNet (concatenation)]

DenseNet gets its name from the "dense connections" between variables. It is composed of two main parts:

  • Dense blocks: define how inputs and outputs are connected.

  • Transition layers: control the number of channels so that the model does not become too complex.

What does "densely connected" mean? The last layer is connected to all of the layers before it; the DenseNet output is a concatenation, applying the mapping from \(\boldsymbol{x}\) to its expansion:

\[\boldsymbol{x}\to \left[\boldsymbol{x},\,f_1(\boldsymbol{x}),\,f_2([\boldsymbol{x},f_1(\boldsymbol{x})]),\,f_3([\boldsymbol{x},f_1(\boldsymbol{x}),f_2([\boldsymbol{x},f_1(\boldsymbol{x})])]),\,\dots\right] \]
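
To make the contrast concrete, here is a minimal sketch (not part of the original post) of the two ways of combining an input with a layer's output: ResNet adds them element-wise, so the channel count stays fixed, while DenseNet concatenates them along the channel dimension, so the channel count grows:

import torch
from torch import nn

X = torch.randn(4, 3, 8, 8)
g = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stands in for the nonlinear term g(x)

residual = X + g(X)                      # ResNet-style addition: still 3 channels
dense = torch.cat((X, g(X)), dim=1)      # DenseNet-style concatenation: 3 + 3 = 6 channels
print(residual.shape, dense.shape)       # torch.Size([4, 3, 8, 8]) torch.Size([4, 6, 8, 8])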

[Figure: dense connections in DenseNet]

7.7.2 Dense Blocks

import torch
from torch import nn
from d2l import torch as d2l

def conv_block(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1))

class DenseBlock(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(  # input channels grow with each dense connection
                num_channels * i + input_channels, num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = torch.cat((X, Y), dim=1)  # concatenate each block's input and output along the channel dimension
        return X

blk = DenseBlock(2, 3, 10)  # yields an output with 3 + 2 * 10 = 23 channels
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape
torch.Size([4, 23, 8, 8])

7.7.3 Transition Layers

Since every dense block increases the number of channels, the model would otherwise become too complex. A transition layer is used to control model complexity: it reduces the number of channels with a \(1\times 1\) convolution and halves the height and width with an average pooling layer of stride 2.

def transition_block(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))

blk = transition_block(23, 10)  # reduce the channel count from 23 to 10
blk(Y).shape
torch.Size([4, 10, 4, 4])

7.7.4 The DenseNet Model

b1 = nn.Sequential(  # b1 is the same stem as in the previous networks
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

num_convs_in_dense_blocks = [4, 4, 4, 4]  # 4 dense blocks, each with 4 convolutional layers
num_channels, growth_rate = 64, 32  # with a growth rate of 32, each dense block adds 4 * 32 = 128 channels

blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate  # the output channels of this dense block become the input channels of the next block
    if i != len(num_convs_in_dense_blocks) - 1:  # add a transition layer between dense blocks to halve the number of channels
        blks.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2

net = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),  # finish with global average pooling and a fully connected layer
    nn.Flatten(),
    nn.Linear(num_channels, 10))

7.7.5 Training the Model

lr, num_epochs, batch_size = 0.1, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())  # takes roughly fifteen minutes; run with care
loss 0.140, train acc 0.948, test acc 0.914
865.0 examples/sec on cuda:0

[Figure: training loss and accuracy curves]

Exercises

(1) Why do we use average pooling rather than max pooling in the transition layer?

I think of average pooling as taking all features into account, while max pooling keeps only the most prominent one.

If the transition layer kept only the most prominent features, some information could be lost.

In practice, though, the difference seems small.
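
To test this, only the pooling layer needs swapping; this max-pooling variant (a sketch, not part of the book's code) can be dropped in wherever transition_block is used and trained for comparison:

from torch import nn

def transition_block_max(input_channels, num_channels):
    # identical to transition_block except that max pooling replaces average pooling
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, num_channels, kernel_size=1),
        nn.MaxPool2d(kernel_size=2, stride=2))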


(2) One advantage of DenseNet is that it has fewer model parameters than ResNet. Why is that?

X = torch.rand(size=(1, 1, 224, 224), device=d2l.try_gpu())
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)
Sequential output shape:	 torch.Size([1, 64, 56, 56])
DenseBlock output shape:	 torch.Size([1, 192, 56, 56])
Sequential output shape:	 torch.Size([1, 96, 28, 28])
DenseBlock output shape:	 torch.Size([1, 224, 28, 28])
Sequential output shape:	 torch.Size([1, 112, 14, 14])
DenseBlock output shape:	 torch.Size([1, 240, 14, 14])
Sequential output shape:	 torch.Size([1, 120, 7, 7])
DenseBlock output shape:	 torch.Size([1, 248, 7, 7])
BatchNorm2d output shape:	 torch.Size([1, 248, 7, 7])
ReLU output shape:	 torch.Size([1, 248, 7, 7])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 248, 1, 1])
Flatten output shape:	 torch.Size([1, 248])
Linear output shape:	 torch.Size([1, 10])

As the shapes show, the transition layers keep the number of output channels well under control: with the same number of convolutional layers, DenseNet's channel count never exceeds 256.
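
To attach a number to the answer, a one-liner (not in the original post) counts the learnable parameters; building the Section 7.6 ResNet the same way and counting its parameters gives the comparison the exercise asks about:

num_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(f'{num_params:,} trainable parameters')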


(3) One problem DenseNet is often criticized for is its high memory (or GPU memory) consumption.

a. Is this really the case? Try changing the input shape to \(224\times 224\) and look at the actual GPU memory consumption.

b. Are there other ways to reduce the memory consumption? Would you need to change the framework? (A checkpointing sketch follows the memory log below.)
net2 = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),  # finish with global average pooling and a fully connected layer
    nn.Flatten(),
    nn.Linear(num_channels, 10))

lr, num_epochs, batch_size = 0.1, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)
# d2l.train_ch6(net2, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
# don't run this; it blows up GPU memory immediately
# CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 4.00 GiB total capacity; 2.48 GiB already allocated; 109.80 MiB free; 2.61 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
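
For part (b): one standard way to cut activation memory without changing frameworks is gradient checkpointing, which recomputes a dense block's intermediate activations during the backward pass instead of storing them. Below is a minimal sketch (not from the original post) that wraps the conv_block defined above with torch.utils.checkpoint; the trade-off is extra compute per step (and, with BatchNorm, the running statistics can be updated an extra time during recomputation):

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class CheckpointedDenseBlock(nn.Module):
    # same concatenation logic as DenseBlock, but each conv block's activations
    # are recomputed in the backward pass instead of being kept in GPU memory
    def __init__(self, num_convs, input_channels, num_channels):
        super().__init__()
        self.net = nn.Sequential(*[
            conv_block(num_channels * i + input_channels, num_channels)
            for i in range(num_convs)])

    def forward(self, X):
        for blk in self.net:
            if self.training and X.requires_grad:
                Y = checkpoint(blk, X)  # trade compute for memory during training
            else:
                Y = blk(X)
            X = torch.cat((X, Y), dim=1)
        return X

Other options that stay within PyTorch include a smaller growth rate, a smaller batch size, or mixed-precision training.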

(4) Implement the different DenseNet versions shown in Table 1 of the DenseNet paper.

[Table 1 of the DenseNet paper: DenseNet architectures]

def conv_block_121(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        # bottleneck as in the original paper; note, however, that the paper's
        # 1x1 convolution outputs 4 * growth_rate channels, not 4 * input_channels
        nn.Conv2d(input_channels, 4 * input_channels, kernel_size=1),
        nn.BatchNorm2d(4 * input_channels), nn.ReLU(),
        nn.Conv2d(4 * input_channels, num_channels, kernel_size=3, padding=1))

class DenseBlock_121(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock_121, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block_121(
                num_channels * i + input_channels, num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = torch.cat((X, Y), dim=1)
        return X

b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

num_convs_in_dense_blocks_121 = [6, 12, 23, 16]  # the paper's DenseNet-121 uses [6, 12, 24, 16]
num_channels, growth_rate = 64, 32
blks_121 = []
for i, num_convs in enumerate(num_convs_in_dense_blocks_121):
    blks_121.append(DenseBlock_121(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate
    if i != len(num_convs_in_dense_blocks_121) - 1:
        # the paper inserts a transition layer (1x1 conv + 2x2 average pooling) here;
        # using conv_block_121 instead keeps the spatial resolution at 56x56 throughout
        blks_121.append(conv_block_121(num_channels, num_channels // 2))
        num_channels = num_channels // 2

net3 = nn.Sequential(
    b1, *blks_121,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(num_channels, 10))

lr, num_epochs, batch_size = 0.1, 10, 64
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
# d2l.train_ch6(net3, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
# this won't run at all; even with batch_size reduced to 64 it still runs out of GPU memory, so just inspect the shapes
# CUDA out of memory. Tried to allocate 90.00 MiB (GPU 0; 4.00 GiB total capacity; 2.49 GiB already allocated; 19.80 MiB free; 2.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

X = torch.rand(size=(1, 1, 224, 224))  # even just checking the shapes takes about 6.5 seconds
for layer in net3:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)
Sequential output shape:	 torch.Size([1, 64, 56, 56])
DenseBlock_121 output shape:	 torch.Size([1, 256, 56, 56])
Sequential output shape:	 torch.Size([1, 128, 56, 56])
DenseBlock_121 output shape:	 torch.Size([1, 512, 56, 56])
Sequential output shape:	 torch.Size([1, 256, 56, 56])
DenseBlock_121 output shape:	 torch.Size([1, 992, 56, 56])
Sequential output shape:	 torch.Size([1, 496, 56, 56])
DenseBlock_121 output shape:	 torch.Size([1, 1008, 56, 56])
BatchNorm2d output shape:	 torch.Size([1, 1008, 56, 56])
ReLU output shape:	 torch.Size([1, 1008, 56, 56])
AdaptiveAvgPool2d output shape:	 torch.Size([1, 1008, 1, 1])
Flatten output shape:	 torch.Size([1, 1008])
Linear output shape:	 torch.Size([1, 10])
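
For reference, here is a sketch closer to the paper's Table 1 (my reading of the paper, not the post's code): the 1x1 bottleneck convolution outputs 4 * growth_rate channels, the four dense blocks contain 6, 12, 24 and 16 conv blocks, and the transition_block from Section 7.7.3 sits between the dense blocks so that both the channel count and the spatial resolution are halved:

import torch
from torch import nn

def bottleneck_block(input_channels, growth_rate):
    # BN-ReLU-1x1 conv to 4 * growth_rate channels, then BN-ReLU-3x3 conv to growth_rate channels
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, 4 * growth_rate, kernel_size=1),
        nn.BatchNorm2d(4 * growth_rate), nn.ReLU(),
        nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1))

class BottleneckDenseBlock(nn.Module):
    # same concatenation logic as DenseBlock, built from bottleneck_block
    def __init__(self, num_convs, input_channels, growth_rate):
        super().__init__()
        self.net = nn.Sequential(*[
            bottleneck_block(input_channels + i * growth_rate, growth_rate)
            for i in range(num_convs)])

    def forward(self, X):
        for blk in self.net:
            X = torch.cat((X, blk(X)), dim=1)
        return X

num_channels, growth_rate = 64, 32
blocks_121 = []
for i, num_convs in enumerate([6, 12, 24, 16]):  # DenseNet-121 layout from Table 1
    blocks_121.append(BottleneckDenseBlock(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate
    if i != 3:
        # transition_block (Section 7.7.3) halves both the channels and the height/width
        blocks_121.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2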

From: https://www.cnblogs.com/AncilunKiang/p/17725574.html
