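3. Main Components of PyTorch (1)

Tags: loss, nn, self, torch, init, PyTorch, modules, components, size

3.1 Basic configuration of a PyTorch project

Import the commonly used libraries:

import os
import numpy as np
import torch
import torch.nn as nn # commonly used neural network library
from torch.utils.data import Dataset, DataLoader # datasets and data loaders
import torch.optim as optim # optimizer library

Common hyperparameters:

# batch size
batch_size = 16
# learning rate of the optimizer
lr = 1e-4
# number of training epochs
max_epochs = 100

Ways to set up GPU usage:

# Option 1: os.environ; the visible GPUs are selected here, so no GPU index has to be set elsewhere in the code
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1' # use GPUs 0 and 1
# Option 2: use a device object, then call .to(device) on every variable that should live on the GPU
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")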
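As a minimal sketch of option 2, the snippet below moves a small layer and an input batch to the selected device. The layer sizes and the choice of "cuda:0" are illustrative assumptions, not part of the original setup.

```python
import torch
import torch.nn as nn

# Pick the first GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)   # moves the layer's parameters
x = torch.rand(3, 4).to(device)      # moves the input batch
y = model(x)                         # the computation runs on the chosen device
print(y.shape, y.device.type)
```

Both the model and its inputs have to be on the same device; mixing CPU and GPU tensors in one operation raises an error.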

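3.2 Reading data

PyTorch reads data through the combination Dataset + DataLoader:

Dataset defines the format of the data and the transformations applied to it; DataLoader reads the data iteratively, one batch at a time.

You can define your own Dataset class for flexible data loading. The class must inherit from PyTorch's own Dataset class (the parent class), and its definition mainly consists of three methods:

__init__: receives external arguments and defines the sample set
__getitem__: reads the samples one at a time, optionally applies transformations, and returns the data needed for training/validation
__len__: returns the number of samples in the dataset

Example 1: building a Dataset, using the CIFAR-10 dataset as an example:

import torch
from torchvision import datasets
# train_path, val_path and data_transform are assumed to be defined beforehand
train_data = datasets.ImageFolder(train_path, transform=data_transform)
val_data = datasets.ImageFolder(val_path, transform=data_transform)

This uses the ImageFolder class that ships with torchvision, which reads image data stored in a fixed directory structure.

transform specifies the operations applied to each image, such as flipping and cropping.

Example 2: the images are stored in one folder, and a separate csv file gives the label for each image name. In this case you have to define the Dataset class yourself: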

import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data_dir, info_csv, image_list, transform=None):
        """
        Args:
            data_dir: path to the image directory.
            info_csv: path to the csv file containing image indexes with their labels.
            image_list: path to the txt file containing the image names of the training/validation set.
            transform: optional transform applied to a sample.
        """
        label_info = pd.read_csv(info_csv)
        image_file = open(image_list).readlines()
        self.data_dir = data_dir
        self.image_file = image_file
        self.label_info = label_info
        self.transform = transform

    def __getitem__(self, index):
        """
        Args:
            index: index of the item
        Returns:
            the image and its label
        """
        image_name = self.image_file[index].strip('\n')
        # select the csv row(s) whose Image_index matches this image name
        raw_label = self.label_info.loc[self.label_info['Image_index'] == image_name]
        label = raw_label.iloc[:,0]
        image_name = os.path.join(self.data_dir, image_name)
        image = Image.open(image_name).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image, label

    def __len__(self):
        return len(self.image_file)

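Note: .iloc and .loc

Similarities:

(1) Same object: both operate on a DataFrame;

(2) Same purpose: both select elements from the rows or columns of a DataFrame.

Differences:

loc and iloc index rows and columns with different kinds of labels.

iloc selects by integer position and cannot use string labels to index data;

loc selects by label, index value, boolean mask, or condition, and can use string labels to index data.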
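The difference is easiest to see on a small table. The sketch below uses a hypothetical three-row label table; the column names and row labels are made up for illustration.

```python
import pandas as pd

# Hypothetical label table; the column names and row labels are illustrative only.
df = pd.DataFrame(
    {"Image_index": ["a.png", "b.png", "c.png"], "label": [0, 1, 0]},
    index=["r0", "r1", "r2"],
)

by_label = df.loc["r1", "label"]                         # row label + column label
by_position = df.iloc[1, 1]                              # integer positions (row 1, column 1)
by_mask = df.loc[df["Image_index"] == "c.png", "label"]  # boolean mask on the rows

print(by_label, by_position, by_mask.tolist())
```

The boolean-mask form is the one used in `__getitem__` above to find the row that belongs to an image name.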

Use DataLoader to read the data in batches:

from torch.utils.data import DataLoader
train_loader = DataLoader(train_data, batch_size=batch_size, num_workers=4, shuffle=True, drop_last=True)
val_loader = DataLoader(val_data, batch_size=batch_size, num_workers=4, shuffle=False)

where:

batch_size: samples are read in batches; batch_size is the number of samples read each time
num_workers: the number of worker processes used to read data
shuffle: whether to shuffle the data that is read
drop_last: drops the final group of samples when it does not fill a complete batch, so that it does not take part in training
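To make the effect of batch_size and drop_last concrete, here is a small sketch on a toy TensorDataset. The dataset contents and sizes are assumptions chosen so that the last batch is incomplete.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: 10 samples with 3 features each, and one integer label per sample.
features = torch.arange(30, dtype=torch.float).reshape(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# num_workers=0 loads in the main process, which keeps the sketch portable.
keep_last = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
drop_last = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0, drop_last=True)

sizes_keep = [x.shape[0] for x, _ in keep_last]   # batches of 4, 4 and 2 samples
sizes_drop = [x.shape[0] for x, _ in drop_last]   # the incomplete batch is dropped
print(sizes_keep, sizes_drop)
```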

Inspect the loaded data. A PyTorch DataLoader can be read with next and iter:

import matplotlib.pyplot as plt
images, labels = next(iter(val_loader))
print(images.shape)
# permute reorders (C, H, W) to (H, W, C), the layout imshow expects
plt.imshow(images[0].permute(1, 2, 0))
plt.show()

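3.3 Building the model

3.3.1 Constructing a neural network

Module is a model-construction class provided by the nn module and is the base class of all neural networks.

Example: constructing a multilayer perceptron

import torch
from torch import nn
class MLP(nn.Module):
    # Declare the layers that hold model parameters; here, two fully connected layers
    def __init__(self, **kwargs): # **kwargs: keyword arguments
        # Call the constructor of nn.Module, the parent class of MLP, for the necessary initialization,
        # so that other arguments can also be passed when creating an instance
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Linear(784, 256)
        self.act = nn.ReLU()
        self.output = nn.Linear(256, 10)

    # Define the forward pass: how to compute the model output from the input x
    def forward(self, x):
        o = self.act(self.hidden(x))
        return self.output(o)

The MLP class above does not need to define a backward function. The system generates the backward function required for backpropagation automatically through automatic differentiation.

Instantiating the MLP class gives the model variable net. The code below initializes net and passes in the input data X for one forward pass. Here net(X) calls the __call__ function that MLP inherits from the Module class, and this function calls the forward function defined by the MLP class to complete the forward computation.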

X = torch.rand(2, 784)
net = MLP()
print(net)
net(X)

Output:

MLP(
  (hidden): Linear(in_features=784, out_features=256, bias=True)
  (act): ReLU()
  (output): Linear(in_features=256, out_features=10, bias=True)
)
tensor([[ 0.0149, -0.2641, -0.0040,  0.0945, -0.1277, -0.0092,  0.0343,  0.0627,
         -0.1742,  0.1866],
        [ 0.0738, -0.1409,  0.0790,  0.0597, -0.1572,  0.0479, -0.0519,  0.0211,
         -0.1435,  0.1958]], grad_fn=<AddmmBackward>)

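Note: the Module class is not named Layer or Model, because it is a freely composable building block.

A subclass of it can be a layer (such as the Linear class provided by PyTorch), a model (such as the MLP class defined here), or a part of a model.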
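The two claims made about Module, that calling an instance dispatches to forward and that no backward function has to be written, can be checked with a small sketch. TinyMLP is a hypothetical, scaled-down stand-in for the MLP class, used only to keep the example short.

```python
import torch
from torch import nn

class TinyMLP(nn.Module):  # hypothetical name, mirrors the MLP above at a smaller size
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(8, 4)
        self.act = nn.ReLU()
        self.output = nn.Linear(4, 2)

    def forward(self, x):
        return self.output(self.act(self.hidden(x)))

net = TinyMLP()
X = torch.rand(3, 8)

out_call = net(X)              # goes through Module.__call__
out_forward = net.forward(X)   # calls forward directly
same = torch.allclose(out_call, out_forward)

# No backward method was written; autograd derives the gradients.
out_call.sum().backward()
grad_shapes = [tuple(p.grad.shape) for p in net.parameters()]
print(same, grad_shapes)
```

In practice the instance should be called as net(X) rather than net.forward(X), since `__call__` also runs any registered hooks.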

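3.3.2 Common layers in neural networks

Defining a layer without model parameters

The MyLayer class below inherits from Module to define a custom layer that subtracts the mean from its input and returns the result; the layer's computation is defined in the forward function. This layer has no model parameters.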

import torch
from torch import nn

class MyLayer(nn.Module):
    def __init__(self, **kwargs):
        super(MyLayer, self).__init__(**kwargs)
    def forward(self, x):
        return x - x.mean()

Test it: instantiate the layer and run a forward pass

layer = MyLayer()
layer(torch.tensor([1, 2, 3, 4, 5], dtype=torch.float))

Output:

tensor([-2., -1.,  0.,  1.,  2.])

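Defining a layer with model parameters

Such model parameters can be learned through training.

The Parameter class is a subclass of Tensor. When a Parameter is assigned as an attribute of a Module, it is automatically added to the model's parameter list.

So when defining a custom layer with model parameters, the parameters should be defined as Parameter. Besides defining a Parameter directly, you can also use ParameterList and ParameterDict to define a list and a dictionary of parameters, respectively.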

class MyListDense(nn.Module):
    def __init__(self):
        super(MyListDense, self).__init__()
        self.params = nn.ParameterList([nn.Parameter(torch.randn(4,4)) for i in range(3)])
        self.params.append(nn.Parameter(torch.randn(4, 1)))
    
    def forward(self, x):
        for i in range(len(self.params)):
            x = torch.mm(x, self.params[i])
        return x
net = MyListDense()
print(net)

Output:

MyListDense(
  (params): ParameterList(
      (0): Parameter containing: [torch.FloatTensor of size 4x4]
      (1): Parameter containing: [torch.FloatTensor of size 4x4]
      (2): Parameter containing: [torch.FloatTensor of size 4x4]
      (3): Parameter containing: [torch.FloatTensor of size 4x1]
  )
)

class MyDictDense(nn.Module):
    def __init__(self):
        super(MyDictDense, self).__init__()   
        self.params = nn.ParameterDict({
            'linear1': nn.Parameter(torch.randn(4, 4)),
            'linear2': nn.Parameter(torch.randn(4, 4))})
        self.params.update({'linear3':nn.Parameter(torch.randn(4, 2))})
    
    def forward(self, x, choice = 'linear1'):
        return torch.mm(x, self.params[choice])

net = MyDictDense()
print(net)

Output:

MyDictDense(
  (params): ParameterDict(
      (linear1): Parameter containing: [torch.FloatTensor of size 4x4]
      (linear2): Parameter containing: [torch.FloatTensor of size 4x4]
      (linear3): Parameter containing: [torch.FloatTensor of size 4x2]
  )
)
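Why the parameters must be wrapped in Parameter becomes visible when a plain tensor is stored next to one. The Scale layer below is a hypothetical example written for this comparison.

```python
import torch
from torch import nn

class Scale(nn.Module):  # hypothetical layer, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)               # plain tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset

layer = Scale()
names = [name for name, _ in layer.named_parameters()]
print(names)
```

Only weight appears in the list, so an optimizer built from layer.parameters() would never update offset.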

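Defining a 2D convolutional layer

A 2D convolutional layer applies the cross-correlation operation to the input and the kernel, then adds a scalar bias to produce the output.

The model parameters of a convolutional layer are the kernel and the scalar bias.

When training the model, we usually initialize the kernel randomly and then update the kernel and the bias iteratively.

# Convolution operation (2D cross-correlation)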
def corr2d(X, K):
    h, w = K.shape
    X, K = X.float(), K.float()
    Y = torch.zeros((X.shape[0] - h +1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i+h, j: j+w] * K).sum()
    return Y

# 2D convolutional layer
class Conv2D(nn.Module):
    def __init__(self, kernel_size):
        super(Conv2D, self).__init__()
        self.weight = nn.Parameter(torch.randn(kernel_size))
        self.bias = nn.Parameter(torch.randn(1))

    def forward(self, x):
        return corr2d(x, self.weight) + self.bias

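A convolutional layer whose window has shape p×q is called a p×q convolutional layer. Likewise, a p×q convolution or p×q kernel means that the kernel has height p and width q.

Padding means adding elements (usually zeros) on both sides of the input's height and width.

In the example below we create a 2D convolutional layer with height and width 3 and set the padding on both sides of the input's height and width to 1. Given an input with height and width 8, the output's height and width are also 8.

# Helper that applies a convolutional layer; it adds and removes dimensions on the input and output
def comp_conv2d(conv2d, X):
    # (1, 1) stands for the batch size and the number of channels
    X = X.view((1, 1) + X.shape)
    Y = conv2d(X)
    return Y.view(Y.shape[2:])  # drop the first two dimensions we do not care about: batch and channel
# Note that 1 row or column is padded on each side, so 2 rows or columns are added in total
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)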

X = torch.rand(8, 8)
comp_conv2d(conv2d, X).shape

Output:

torch.Size([8, 8])

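When the kernel's height and width differ, we can still make the output the same height and width as the input by setting different padding for the height and the width.

# Use a kernel with height 5 and width 3; the padding on each side of the height and width is 2 and 1, respectively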
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(5, 3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

Output:

torch.Size([8, 8])

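In the 2D cross-correlation operation, the convolution window starts at the top-left corner of the input array and slides across it from left to right and from top to bottom. The number of rows and columns moved per slide is called the stride.

conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))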
comp_conv2d(conv2d, X).shape

Output:

torch.Size([2, 2])

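Padding can increase the height and width of the output. It is often used to give the output the same height and width as the input.

Stride can reduce the height and width of the output, for example to only 1/n of the input's height and width (n being an integer greater than 1).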
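The output shapes of the three examples above follow from the usual output-size formula, floor((n - k + 2p) / s) + 1 along each axis, where p counts the padding added on one side. The helper below is a small sketch of that arithmetic; the function name is made up here.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output length along one axis: floor((n - k + 2p) / s) + 1, with p elements padded on each side."""
    return (n - k + 2 * p) // s + 1

# The three configurations used above, all on an 8x8 input.
same_square = (conv_output_size(8, 3, p=1), conv_output_size(8, 3, p=1))
same_rect = (conv_output_size(8, 5, p=2), conv_output_size(8, 3, p=1))
strided = (conv_output_size(8, 3, p=0, s=3), conv_output_size(8, 5, p=1, s=4))
print(same_square, same_rect, strided)
# prints: (8, 8) (8, 8) (2, 2)
```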

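Pooling layers

A pooling layer computes its output from the elements in a fixed-shape window (also called the pooling window) of the input, one window at a time.

Unlike a convolutional layer, which computes the cross-correlation of the input and the kernel, a pooling layer directly computes the maximum (max pooling) or the average (average pooling) of the elements in the pooling window.

In 2D max pooling, the pooling window starts at the top-left corner of the input array and slides across it from left to right and from top to bottom. At each position, the maximum of the input sub-array inside the window becomes the element at the corresponding position of the output array.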

def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = torch.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = X[i: i + p_h, j: j + p_w].max()
            elif mode == 'avg':
                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
    return Y

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=torch.float)
pool2d(X, (2, 2))

Output:

tensor([[4., 5.],
        [7., 8.]])

pool2d(X, (2, 2), 'avg')

Output:

tensor([[2., 3.],
        [5., 6.]])
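The same results can be obtained with the built-in pooling layers. This is a sketch for comparison, assuming stride 1 to match the hand-written pool2d, which moves the window one element at a time.

```python
import torch
from torch import nn

X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=torch.float)

# The built-in layers expect (batch, channel, height, width), so two leading dimensions are added.
# stride=1 matches the sliding step of the hand-written pool2d; by default the stride equals the window size.
max_pool = nn.MaxPool2d(kernel_size=2, stride=1)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=1)

max_out = max_pool(X.view(1, 1, 3, 3)).view(2, 2)
avg_out = avg_pool(X.view(1, 1, 3, 3)).view(2, 2)
print(max_out, avg_out)
```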

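We can use the torch.nn package to build neural networks. The nn package relies on the autograd package to define models and differentiate them. An nn.Module contains layers and a forward(input) method that returns the output.

3.3.3 Model examples

LeNet

This is a simple feed-forward network (LeNet). It takes an input, passes it through one layer after another, and finally produces the output.

A typical training procedure for a neural network is as follows:

Define a neural network that has some learnable parameters (or weights)
Iterate over a dataset of inputs
Process the input through the network
Compute the loss (how far the output is from the correct answer)
Propagate the gradients back to the network's parameters
Update the network's weights, typically with a simple rule: weight = weight - learning_rate * gradient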
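The steps above can be sketched on a toy problem with the update rule written out by hand. The single linear layer, the random inputs, and the learning rate are assumptions picked to keep the example small; a real project would normally use an optimizer from torch.optim for the last step.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Linear(3, 1)                       # a network with learnable parameters
inputs = torch.rand(8, 3)                   # toy inputs
targets = inputs.sum(dim=1, keepdim=True)   # toy targets
learning_rate = 0.1

loss_before = ((net(inputs) - targets) ** 2).mean()   # forward pass and loss
net.zero_grad()
loss_before.backward()                      # gradients flow back to the parameters

with torch.no_grad():                       # the update itself must not be tracked
    for p in net.parameters():
        p -= learning_rate * p.grad         # weight = weight - learning_rate * gradient

loss_after = ((net(inputs) - targets) ** 2).mean()
print(loss_before.item(), loss_after.item())
```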

The LeNet code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # input image channels: 1; output channels: 6; 5x5 kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # 2x2 Max pooling
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # for a square window a single number is enough
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Output:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Note: any operation on tensors may be used in the forward function.

A model's learnable parameters are returned by net.parameters()

params = list(net.parameters())
print(len(params))
print(params[0].size())  # weight of conv1

Output:
10
torch.Size([6, 1, 5, 5])

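# Note: this network (LeNet) expects a 32x32 input tensor.
input = torch.randn(1, 1, 32, 32) # input
out = net(input) # output
# Zero the gradient buffers of all parameters, then backpropagate with random gradients
net.zero_grad() # zero the gradients
out.backward(torch.randn(1, 10))

Note: torch.nn only supports mini-batches. The entire torch.nn package accepts only inputs that are a mini-batch of samples, not a single sample.
For example, nn.Conv2d takes a 4D tensor of nSamples x nChannels x Height x Width. If you have a single sample, just use input.unsqueeze(0) to add a "fake" batch dimension.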
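A short sketch of the unsqueeze(0) trick, using a convolution with the same shape as conv1 in the LeNet above and a random tensor standing in for one image:

```python
import torch
from torch import nn

conv = nn.Conv2d(1, 6, 5)             # same shape as conv1 in the LeNet above
single = torch.randn(1, 32, 32)       # one sample: channels x height x width

batched = single.unsqueeze(0)         # adds a batch dimension of size 1 in front
out = conv(batched)
print(batched.shape, out.shape)
```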

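torch.Tensor - a multi-dimensional array that supports autograd operations such as backward() and also holds the gradient with respect to the tensor.
nn.Module - neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, and so on.
nn.Parameter - a kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a Module.
autograd.Function - implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node that connects to the functions that created the Tensor and encodes its history.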

AlexNet

class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
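            nn.Conv2d(1, 96, 11, 4), # in_channels, out_channels, kernel_size, stride
            nn.ReLU(), # note where the activation function is defined, in contrast to LeNet
            nn.MaxPool2d(3, 2), # kernel_size, stride
            # Shrink the convolution window, use padding 2 so that input and output have the same height and width, and increase the number of output channels
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            # Three consecutive convolutional layers with an even smaller window. Except for the last one, the number of output channels is increased further.
            # No pooling layer follows the first two of them to reduce the height and width of the input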
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )
        # The fully connected layers here have several times more outputs than in LeNet; dropout layers mitigate overfitting
        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            # Output layer. Fashion-MNIST is used here, so the number of classes is 10 rather than the 1000 of the paper
            nn.Linear(4096, 10),
        )

    def forward(self, img):
        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output
net = AlexNet()
print(net)

Output:

AlexNet(
  (conv): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU()
    (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU()
    (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=6400, out_features=4096, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=10, bias=True)
  )
)

3.4 Model initialization

Contents of torch.nn.init

torch.nn.init provides the following initialization methods:

1 . torch.nn.init.uniform_(tensor, a=0.0, b=1.0)

2 . torch.nn.init.normal_(tensor, mean=0.0, std=1.0)

3 . torch.nn.init.constant_(tensor, val)

4 . torch.nn.init.ones_(tensor)

5 . torch.nn.init.zeros_(tensor)

6 . torch.nn.init.eye_(tensor)

7 . torch.nn.init.dirac_(tensor, groups=1)

8 . torch.nn.init.xavier_uniform_(tensor, gain=1.0)

9 . torch.nn.init.xavier_normal_(tensor, gain=1.0)

10 . torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

11 . torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')

12 . torch.nn.init.orthogonal_(tensor, gain=1)

13 . torch.nn.init.sparse_(tensor, sparsity, std=0.01)

14 . torch.nn.init.calculate_gain(nonlinearity, param=None)

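Apart from calculate_gain, all of these function names end with an underscore, which means they modify the input tensor in place.

The gains are given in the table below:

nonlinearity	gain
Linear/Identity	1
Conv{1,2,3}D	1
Sigmoid	1
Tanh	5/3
ReLU	sqrt(2)
Leaky ReLU	sqrt(2 / (1 + negative_slope^2))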
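The table can be cross-checked against calculate_gain itself. In this sketch the negative slope of 0.2 is an arbitrary illustrative value.

```python
import math
import torch.nn as nn

negative_slope = 0.2   # illustrative value
gains = {
    "linear": nn.init.calculate_gain("linear"),
    "sigmoid": nn.init.calculate_gain("sigmoid"),
    "tanh": nn.init.calculate_gain("tanh"),
    "relu": nn.init.calculate_gain("relu"),
    "leaky_relu": nn.init.calculate_gain("leaky_relu", negative_slope),
}
# The Leaky ReLU row of the table, evaluated by hand.
expected_leaky = math.sqrt(2.0 / (1 + negative_slope ** 2))
print(gains, expected_leaky)
```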

Using torch.nn.init

Use torch.nn.init according to the actual model. Usually isinstance is used to determine which type a module (see 3.3 Building the model) belongs to.

import torch
import torch.nn as nn

conv = nn.Conv2d(1,3,3)
linear = nn.Linear(10,1)

isinstance(conv, nn.Conv2d) # True; isinstance takes parent/child class relationships into account, whereas type() does not
isinstance(linear,nn.Conv2d) # False

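Different weight initialization methods can be set for different types of layers.

# inspect the randomly initialized parameters of conv
conv.weight.data
# inspect the parameters of linear
linear.weight.data

Output: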

tensor([[[[ 0.1174,  0.1071,  0.2977],
          [-0.2634, -0.0583, -0.2465],
          [ 0.1726, -0.0452, -0.2354]]],
        [[[ 0.1382,  0.1853, -0.1515],
          [ 0.0561,  0.2798, -0.2488],
          [-0.1288,  0.0031,  0.2826]]],
        [[[ 0.2655,  0.2566, -0.1276],
          [ 0.1905, -0.1308,  0.2933],
          [ 0.0557, -0.1880,  0.0669]]]])

tensor([[-0.0089,  0.1186,  0.1213, -0.2569,  0.1381,  0.3125,  0.1118, -0.0063, -0.2330,  0.1956]])

# Kaiming initialization for conv
torch.nn.init.kaiming_normal_(conv.weight.data)
# constant initialization for linear
torch.nn.init.constant_(linear.weight.data, 0.3)
linear.weight.data

Output:

tensor([[[[ 0.3249, -0.0500,  0.6703],
          [-0.3561,  0.0946,  0.4380],
          [-0.9426,  0.9116,  0.4374]]],
        [[[ 0.6727,  0.9885,  0.1635],
          [ 0.7218, -1.2841, -0.2970],
          [-0.9128, -0.1134, -0.3846]]],
        [[[ 0.2018,  0.4668, -0.0937],
          [-0.2701, -0.3073,  0.6686],
          [-0.3269, -0.0094,  0.3246]]]])
tensor([[0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000, 0.3000,0.3000]])

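Wrapping the initialization in a function

Define the various initialization methods in a function initialize_weights() and call it after the model has been created.

def initialize_weights(self):
    for m in self.modules():
        # check whether the module is a Conv2d
        if isinstance(m, nn.Conv2d):
            torch.nn.init.xavier_normal_(m.weight.data)
            if m.bias is not None:
                torch.nn.init.constant_(m.bias.data, 0.3)
        elif isinstance(m, nn.Linear):
            torch.nn.init.normal_(m.weight.data, 0.1)  # the second positional argument is the mean
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias.data)
        elif isinstance(m, nn.BatchNorm2d):
            m.weight.data.fill_(1)
            m.bias.data.zero_()

The code above iterates over every layer of the model, checks which type each layer belongs to, and applies a different weight initialization method depending on the type.

Example: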

# Model definition
class MLP(nn.Module):
    # Declare the layers that hold model parameters; here, a convolutional layer and a fully connected layer
    def __init__(self, **kwargs):
        # Call the constructor of nn.Module, the parent class of MLP, for the necessary initialization,
        # so that other arguments can also be passed when creating an instance
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Conv2d(1, 1, 3)
        self.act = nn.ReLU()
        self.output = nn.Linear(10, 1)
    # Define the forward computation, i.e. how to compute the model output from the input x
    def forward(self, x):
        o = self.act(self.hidden(x))
        return self.output(o)

mlp = MLP()
print(list(mlp.parameters()))
print("-------initialization-------")

initialize_weights(mlp)
print(list(mlp.parameters()))

Output:

[Parameter containing:
tensor([[[[ 0.2103, -0.1679,  0.1757],
          [-0.0647, -0.0136, -0.0410],
          [ 0.1371, -0.1738, -0.0850]]]], requires_grad=True), Parameter containing:
tensor([0.2507], requires_grad=True), Parameter containing:
tensor([[ 0.2790, -0.1247,  0.2762,  0.1149, -0.2121, -0.3022, -0.1859,  0.2983,
         -0.0757, -0.2868]], requires_grad=True), Parameter containing:
tensor([-0.0905], requires_grad=True)]
-------initialization-------
[Parameter containing:
 tensor([[[[-0.3196, -0.0204, -0.5784],
           [ 0.2660,  0.2242, -0.4198],
           [-0.0952,  0.6033, -0.8108]]]], requires_grad=True),
 Parameter containing:
 tensor([0.3000], requires_grad=True),
 Parameter containing:
 tensor([[ 0.7542,  0.5796,  2.2963, -0.1814, -0.9627,  1.9044,  0.4763,  1.2077,
           0.8583,  1.9494]], requires_grad=True),
 Parameter containing:
 tensor([0.], requires_grad=True)]

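3.5 Loss functions

A loss function, also described as the model's negative feedback, measures how the result produced by feeding data into the model compares with the true labels; the model can then be improved toward the objective the loss function defines.

3.5.1 Binary cross-entropy loss

torch.nn.BCELoss(weight = None, size_average = None, reduce = None, reduction = 'mean')

Purpose: computes the cross entropy for binary classification tasks. In binary classification the label is in {0, 1}. The input passed to the loss must be in the form of probabilities; typically it is the output of a sigmoid activation layer or of a softmax.

Main parameters:

weight: a rescaling weight applied to the loss terms

size_average: bool (deprecated in favor of reduction); when True the returned loss is the mean, when False it is the sum of the per-sample losses.

reduce: bool (deprecated in favor of reduction); when True the loss is returned as a scalar.

The formula, with x_n the predicted probability, y_n the label, and w_n the weight, is:

$$\ell_n = -w_n \left[ y_n \log x_n + (1 - y_n) \log (1 - x_n) \right]$$

Code: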

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, requires_grad = True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
print('BCELoss result:', output)

Output:

BCELoss result: tensor(0.5732, grad_fn=<BinaryCrossEntropyBackward>)
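As a sanity check, the built-in loss can be compared with the binary cross-entropy written out by hand. The probabilities and labels below are illustrative values, and the default reduction='mean' is assumed.

```python
import torch
from torch import nn

probs = torch.tensor([0.9, 0.2, 0.6])     # predicted probabilities (illustrative values)
target = torch.tensor([1.0, 0.0, 1.0])    # binary labels

builtin = nn.BCELoss()(probs, target)

# The same quantity written out: mean of -[y*log(x) + (1-y)*log(1-x)]
manual = -(target * probs.log() + (1 - target) * (1 - probs).log()).mean()
print(builtin.item(), manual.item())
```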

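3.5.2 Cross-entropy loss

torch.nn.CrossEntropyLoss(weight = None, size_average=None, ignore_index = -100, reduce = None, reduction='mean')

Purpose: computes the cross-entropy loss

Main parameters:

ignore_index: a target value that is ignored and does not contribute to the loss

Formula:

$$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right)$$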

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target) # note the difference from BCELoss(): the input does not need to be the output of sigmoid or softmax
output.backward()
print(output)

Output:

tensor(2.0115, grad_fn=<NllLossBackward>)
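Because CrossEntropyLoss takes raw scores, it can be reproduced either from the log-sum-exp form of the cross-entropy or by combining log_softmax with nll_loss. The sketch below compares the three on random scores; the shapes and target indices are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(3, 5)              # raw scores, no softmax applied
target = torch.tensor([1, 0, 4])        # class indices

builtin = nn.CrossEntropyLoss()(logits, target)

# Written out: -x[class] + log(sum_j exp(x[j])), averaged over the batch
picked = logits[torch.arange(3), target]
manual = (-picked + torch.logsumexp(logits, dim=1)).mean()

# Equivalent composition: log-softmax followed by the negative log-likelihood loss
via_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(builtin.item(), manual.item(), via_nll.item())
```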

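3.5.3 L1 loss

torch.nn.L1Loss(size_average=None, reduce = None, reduction='mean')

Purpose: computes the absolute value of the difference between the output y and the true label target.

The reduction parameter selects the computation mode:

none: computed element by element
sum: sums all elements and returns a scalar.
mean: averages over all elements and returns a scalar. (default)
With none, the result has the same shape as the input.

Formula:

$$L_n = |x_n - y_n|$$

Applicable to:

Regression tasks
Simple models
Rarely used, because neural networks usually solve complex problems.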

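3.5.4 MSE loss

torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

Purpose: computes the square of the difference between the output y and the true label target.

Formula:

$$l_n = (x_n - y_n)^2$$

Applicable to:

Regression tasks
Numerical features that are not large
Problems of low dimensionality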

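3.5.5 Smooth L1 loss

torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction = 'mean', beta=1.0)

Purpose: a smoothed version of L1 that reduces the influence of outliers

The formula is:

$$\text{loss}(x, y) = \frac{1}{n} \sum_{i=1}^{n} z_i, \qquad z_i = \begin{cases} 0.5\,(x_i - y_i)^2 / \beta, & |x_i - y_i| < \beta \\ |x_i - y_i| - 0.5\,\beta, & \text{otherwise} \end{cases}$$

How it works:

SmoothL1Loss is in effect a combination of the L2 loss and the L1 loss, and has some of the advantages of both.

When the output y and the true label target differ only slightly (absolute difference below 1), the gradient does not become too large. (The loss is smoother than the L1 loss and behaves like the L2 loss.)
When the difference is large, the gradient magnitude stays bounded (more stable, less prone to exploding gradients); in this range the loss is a shifted L1 loss.

Smooth L1 compared with L1:

With smooth L1 the transition at the kink at 0 is smoother.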
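The comparison can be tabulated with two small scalar functions. This is a plain-Python sketch of the per-element losses with beta = 1, not the PyTorch implementation, and the sample differences are arbitrary.

```python
def l1(diff):
    """L1 loss for a single difference x - y."""
    return abs(diff)

def smooth_l1(diff, beta=1.0):
    """Smooth L1 loss for a single difference: quadratic inside |diff| < beta, linear outside."""
    if abs(diff) < beta:
        return 0.5 * diff * diff / beta
    return abs(diff) - 0.5 * beta

diffs = [-2.0, -0.5, 0.0, 0.5, 2.0]
table = [(d, l1(d), smooth_l1(d)) for d in diffs]
for row in table:
    print(row)
```

Near 0 the smooth version grows quadratically (0.125 at a difference of 0.5, against 0.5 for L1), while for large differences it runs parallel to L1, shifted down by 0.5.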

Applicable to:

Regression
Features that contain large values
Suitable for most problems

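3.5.6 Negative log-likelihood loss with a Poisson-distributed target

torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-8, reduce=None, reduction='mean')

Purpose: negative log-likelihood loss for a Poisson distribution

Main parameters:

log_input: whether the input is in log form; this determines which formula is used.

full: whether to compute the full loss, i.e. to add the Stirling approximation term; defaults to False.

eps: a small correction term that avoids log(input) becoming nan when input is 0.

Formulas:

When log_input=True:

$$\text{loss}(x_n, y_n) = e^{x_n} - x_n \cdot y_n$$

When log_input=False:

$$\text{loss}(x_n, y_n) = x_n - y_n \cdot \log(x_n + \text{eps})$$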

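3.5.7 KL divergence

torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean',log_target=False)

Purpose: computes the KL divergence, i.e. the relative entropy. It is a distance measure for continuous distributions and is often useful when performing regression over the space of (discretely sampled) continuous output distributions.

Main parameters:

reduction: the computation mode; one of none/sum/mean/batchmean.

none: computed element by element.
sum: sums all elements and returns a scalar.
mean: averages over all elements and returns a scalar.
batchmean: the sum divided by the batch size.

Formula, where the input x is expected to contain log-probabilities and y is the target (with the default log_target=False):

$$l_n = y_n \cdot (\log y_n - x_n)$$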

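3.5.8 MarginRankingLoss

torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

Purpose: computes the similarity between two vectors for ranking tasks; it measures the difference between two groups of data.

Main parameters:

margin: the margin, i.e. the required difference between x1 and x2

Formula:

$$\text{loss}(x_1, x_2, y) = \max(0, -y \cdot (x_1 - x_2) + \text{margin})$$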

loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()
output = loss(input1, input2, target)  # the loss takes three arguments
output.backward()

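3.5.9 Multi-label margin loss

torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')

Purpose: computes the loss for multi-label classification problems.

Formula:

$$\text{loss}(x, y) = \sum_{ij} \frac{\max(0, 1 - (x[y[j]] - x[i]))}{x.\text{size}(0)}$$

where i ranges over 0 to x.size(0) - 1, j ranges over the target entries before the first -1, and i ≠ y[j] for all i and j.

loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.9, 0.2, 0.4, 0.8]])
# for target y, only labels 3 and 0 are considered; everything after the -1 is ignored
y = torch.LongTensor([[3, 0, -1, 1]])  # the true classes are class 3 and class 0
output = loss(x, y)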

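3.5.10 Binary classification loss (SoftMarginLoss)

torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')

Purpose: computes the logistic loss for binary classification.

Formula:

$$\text{loss}(x, y) = \sum_i \frac{\log(1 + \exp(-y[i] \cdot x[i]))}{x.\text{nelement}()}$$

inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])  # two samples, two neurons
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)  # this loss is computed per neuron, so each neuron needs its own label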

loss_f = nn.SoftMarginLoss()
output = loss_f(inputs, target)

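3.5.11 Multi-class hinge loss

torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')

Purpose: computes the hinge loss for multi-class classification

Main parameters:

p: 1 or 2.

Formula:

$$\text{loss}(x, y) = \frac{\sum_{i \ne y} \max(0, \text{margin} - x[y] + x[i])^p}{x.\text{size}(0)}$$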

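3.5.12 Triplet loss

torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-6, swap=False, size_average=None, reduce=None, reduction='mean')

Purpose: computes the triplet loss.

Triplet: a format for storing or using data, <entity 1, relation, entity 2>. In a project it can also be written as <anchor, positive examples, negative examples>.

With this loss we want the anchor to be closer to the positive examples and farther from the negative examples.

Formula:

$$L(a, p, n) = \max\{ d(a_i, p_i) - d(a_i, n_i) + \text{margin}, 0 \}, \qquad d(x_i, y_i) = \lVert x_i - y_i \rVert_p$$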

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(anchor, positive, negative)
output.backward()

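3.5.13 HingeEmbeddingLoss

torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')

Purpose: computes the hinge loss on the output embedding

Formula, where Δ is the margin:

$$l_n = \begin{cases} x_n, & y_n = 1 \\ \max\{0, \Delta - x_n\}, & y_n = -1 \end{cases}$$

Note: the input x should be the absolute value of the difference between two inputs.

Reading the formula: for a positive example (y_n = 1) the loss is x_n itself; for a negative example (y_n = -1) the loss is the larger of 0 and margin - x_n.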

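3.5.14 Cosine similarity

torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')

Purpose: computes the cosine similarity of two vectors and uses it as a distance measure: if the two vectors are close, the loss is small, and vice versa.

Main parameters:

margin: may take values in [-1, 1]; values in [0, 0.5] are recommended.

Formula:

$$\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & y = -1 \end{cases}$$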

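3.5.15 CTC loss

torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)

Purpose: classification of sequential data. Computes the loss between a continuous time series and a target sequence.

CTCLoss sums over the probability of the possible alignments of input to target, producing a loss value that is differentiable with respect to each input node. The alignment of input to target is assumed to be "many-to-one", which limits the length of the target sequence to ≤ the input length.

Main parameters:

blank: the label value of the blank label, 0 by default; set it according to the actual label definition;

zero_infinity: whether to zero infinite losses and the associated gradients

# Targets are padded
T = 50      # input sequence length
C = 20      # number of classes (including blank)
N = 16      # batch size
S = 30      # target sequence length of the longest target in the batch (padding length)
S_min = 10  # minimum target length, for demonstration purposes

# Initialize a random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Initialize a random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)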

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()


# Targets are un-padded
T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size

# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)

# Initialize random batch of targets (0 = blank, 1:C = classes)
target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long)
target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()

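3.7 Training and evaluation

Before training a model, first set the model's state: training or testing.

Training: the model's parameters can be modified through backpropagation

Testing: the model's parameters should not be modified

model.train() # training mode
model.eval()  # evaluation mode

## During training, a for loop reads all the data in the DataLoader.
for data, label in train_loader:
    # When using a GPU, move the data to the GPU for computation
    data, label = data.cuda(), label.cuda()
    # Before training on the current batch, zero the gradients held by the optimizer
    optimizer.zero_grad()
    # Feed the data to the model
    output = model(data)
    # Compute the loss
    loss = criterion(output, label)
    # Backpropagate
    loss.backward()
    # Let the optimizer update the parameters
    optimizer.step()

Validation/testing follows essentially the same flow as training, with these differences:

torch.no_grad must be set beforehand, and the model switched to eval mode
The optimizer's gradients do not need to be zeroed
The loss does not need to be backpropagated through the network
The optimizer does not need to be stepped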
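What eval mode and torch.no_grad change can be observed on a toy model. The sketch below assumes a small Sequential with a dropout layer, chosen because dropout is one of the layers whose behavior depends on the mode.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))   # toy model with a dropout layer
x = torch.rand(2, 4)

model.eval()                       # dropout becomes the identity in eval mode
with torch.no_grad():              # no computation graph is built inside this block
    first = model(x)
    second = model(x)

deterministic = torch.equal(first, second)
tracked = first.requires_grad      # False: the output carries no gradient history

model.train()                      # switch back before continuing training
print(deterministic, tracked, model.training)
```

The two settings are independent: eval() changes how layers such as dropout behave, while no_grad() only turns off gradient tracking.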

A complete training procedure for image classification looks like this:

def train(epoch):
    model.train()  # training mode
    train_loss = 0
    for data, label in train_loader:
        data, label = data.cuda(), label.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()*data.size(0)
    train_loss = train_loss/len(train_loader.dataset)
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))

A complete validation procedure for image classification looks like this:

def val(epoch):
    model.eval()  # evaluation mode
    val_loss = 0
    running_accu = 0  # number of correct predictions so far
    with torch.no_grad():  # validation needs no backpropagation, so no computation graph is built inside this block
        for data, label in val_loader:
            data, label = data.cuda(), label.cuda()
            output = model(data)
            preds = torch.argmax(output, 1)
            loss = criterion(output, label)
            val_loss += loss.item()*data.size(0)
            running_accu += torch.sum(preds == label.data)
    val_loss = val_loss/len(val_loader.dataset)
    print('Epoch: {} \tValidation Loss: {:.6f}'.format(epoch, val_loss))

From： https://www.cnblogs.com/5466a/p/16597592.html
