nn

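Computational graphs and autograd are a very powerful paradigm for defining complex operators and taking derivatives automatically; for large neural networks, however, raw autograd can be a bit too low-level.

When building a neural network, we usually want to arrange the computation into layers, which makes the network easier to compute with and to reason about. Some of these layers have learnable parameters that are optimized during training.

In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs, which makes building neural networks more convenient.

PyTorch has a package for the same purpose: nn. It defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input tensors and computes output tensors, and it can also hold internal state such as tensors containing learnable parameters. The nn package also defines a set of loss functions that are commonly used when training models.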
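To make the idea of "a Module holds internal state" concrete, here is a minimal sketch that inspects a single `torch.nn.Linear` layer. The variable names (`layer`, `batch`, `param_shapes`) and the batch size of 5 are illustrative assumptions, not part of the original example.

```python
import torch

# A Linear module mapping 3 input features to 1 output feature.
layer = torch.nn.Linear(3, 1)

# The module keeps its learnable state as Parameter tensors.
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
print(param_shapes)

# Calling the module on a batch of 5 samples gives one output per sample.
batch = torch.randn(5, 3)
out = layer(batch)
print(tuple(out.shape))
```

The weight has shape (1, 3) and the bias has shape (1,); both are created with `requires_grad=True`, which is what lets autograd compute gradients for them later.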

In this example we use nn to implement our polynomial model again, fitting $y=a+bx+cx^2+dx^3$ to $\sin(x)$:

import torch
import math

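# Create the input and output data: x spans [-π, π] and y holds sin(x) at those points
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# In this example, the output y is a linear function of (x, x^2, x^3),
# so we can treat the model as a linear-layer neural network.
# First we prepare the tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# In the code above, x.unsqueeze(-1) has shape (2000, 1) and p has shape (3,),
# so PyTorch's broadcasting applies and yields a tensor of shape (2000, 3).

# Use the nn package to define our model as a sequence of layers.
# nn.Sequential is a Module that contains other Modules and applies them in order.
# The Linear Module computes its output from the input using a linear function,
# and holds internal tensors for its weight and bias.
# The Flatten layer flattens the output of the linear layer into a 1D tensor to match the shape of y.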

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of commonly used loss functions.
# Here we use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):

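    # Forward pass: pass x to the model so that it computes the predicted y.
    # Module objects override the __call__ operator, so you can call them like functions.
    # When you do so, you pass a tensor of input data to the Module and it produces a tensor of output data.
    y_pred = model(xx)

    # Compute and print the loss.
    # We pass the predicted and true values of y to the loss function, which returns a tensor containing the loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass so they do not accumulate across iterations
    model.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to all learnable parameters of the model.
    # Internally, the parameters of each Module are stored in tensors with requires_grad=True,
    # so this call computes gradients for all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a tensor,
    # so we can access its gradient and update it the same way as before.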
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

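# You can access the layers of the model by index, just like the items of a Python list
linear_layer = model[0]

# For a linear layer, the parameters are stored in its weight and bias attributes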
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

Output:

99 1359.427978515625
199 903.5037841796875
299 601.5579223632812
399 401.5660705566406
499 269.087890625
599 181.32131958007812
699 123.16854858398438
799 84.63236999511719
899 59.091712951660156
999 42.16147994995117
1099 30.937110900878906
1199 23.49416160583496
1299 18.55787467956543
1399 15.28339958190918
1499 13.110761642456055
1599 11.668910026550293
1699 10.711766242980957
1799 10.076279640197754
1899 9.654217720031738
1999 9.373825073242188
Result: y = 0.007346955593675375 + 0.8348207473754883 x + -0.0012674720492213964 x^2 + -0.09021244943141937 x^3
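The tensor shapes mentioned in the comments of the listing above can be checked directly. The sketch below uses only 5 sample points instead of 2000 (an assumption made to keep the example small) and verifies what broadcasting and `Flatten(0, 1)` do to the shapes.

```python
import torch

x = torch.linspace(-1.0, 1.0, 5)   # shape (5,)
p = torch.tensor([1, 2, 3])        # shape (3,)

# (5, 1) raised to (3,) broadcasts to (5, 3): one column per power of x.
xx = x.unsqueeze(-1).pow(p)

# Linear(3, 1) returns shape (5, 1); Flatten(0, 1) merges dims 0 and 1 into (5,).
model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))
y_pred = model(xx)

print(tuple(xx.shape), tuple(y_pred.shape))
```

Without the Flatten layer, the prediction would have shape (N, 1) while y has shape (N,), and the loss function would broadcast the two against each other instead of comparing them element by element.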

Optimization

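So far we have updated the weights of our model by manually mutating the tensors that hold the learnable parameters inside torch.no_grad(). This is not a big burden for simple optimization algorithms such as stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam.

The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of the commonly used ones, which we can call directly.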
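As a rough illustration of what the optimizer abstraction replaces, the following sketch compares one hand-written gradient descent step with one step of `torch.optim.SGD` (no momentum) on the same starting weights. The toy loss `loss_of` and the names `w_manual` and `w_optim` are assumptions introduced for this comparison only.

```python
import torch

torch.manual_seed(0)
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
lr = 0.1

def loss_of(w):
    # A toy quadratic loss, used only for this comparison.
    return (w ** 2).sum()

# Hand-written update, as in the earlier listing.
loss_of(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same update delegated to an optimizer.
optimizer = torch.optim.SGD([w_optim], lr=lr)
optimizer.zero_grad()
loss_of(w_optim).backward()
optimizer.step()

same = torch.allclose(w_manual, w_optim)
print(same)
```

For plain SGD the two paths produce the same weights; the benefit of the optimizer object shows up once the update rule carries extra state, as RMSprop or Adam do.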

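In the next example we again define our model with nn, but instead of writing the update ourselves we optimize the model with the RMSprop algorithm provided by the optim package: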

import torch
import math

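# Create the input and output data: x spans [-π, π] and y holds sin(x) at those points
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# In this example, the output y is a linear function of (x, x^2, x^3),
# so we can treat the model as a linear-layer neural network.
# First we prepare the tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model as a sequence of layers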
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

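# Use the MSE loss provided by the nn package
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an optimizer that will update the model's parameters for us.
# Here we use RMSprop; the optim package contains many other optimization algorithms (see its documentation).
# The first argument to the RMSprop constructor tells the optimizer which tensors it should update.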
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(2000):
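    # Forward pass: compute the predicted y by passing x to the model
    y_pred = model(xx)

    # Compute and print the loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer to zero the gradients of all the variables it will update,
    # which are the learnable parameters of the model.
    # This is needed because, by default, gradients are accumulated (not overwritten) whenever .backward() is called.
    # See the documentation of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to the model parameters
    loss.backward()

    # Calling the step method on the optimizer makes it update its parameters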
    optimizer.step()

linear_layer = model[0]
print(
    f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

Output:

99 5328.826171875
199 1865.436279296875
299 1073.1275634765625
399 827.1493530273438
499 666.9290161132812
599 517.2598876953125
699 384.5140380859375
799 275.61785888671875
899 190.3560028076172
999 125.3379898071289
1099 77.24494171142578
1199 43.95640563964844
1299 23.847721099853516
1399 13.23300552368164
1499 9.694506645202637
1599 8.976227760314941
1699 8.994707107543945
1799 8.907635688781738
1899 8.882875442504883
1999 8.901941299438477
Result: y = -6.768441807025738e-09 + 0.8572093844413757 x + -4.319689494991508e-09 x^2 + -0.09283572435379028 x^3
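The reason for zeroing gradients on every iteration of the listing above can be seen on a single scalar. This is a minimal sketch with a made-up one-parameter "model" (`w`), not the training code itself.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# d(w^2)/dw = 2w = 4 at w = 2.
(w ** 2).backward()
first = w.grad.item()

# A second backward call adds to the stored gradient instead of replacing it.
(w ** 2).backward()
accumulated = w.grad.item()

# Zeroing the gradient restores the expected value on the next backward call.
w.grad.zero_()
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 4.0 8.0 4.0
```

Calling `optimizer.zero_grad()` or `model.zero_grad()` performs this reset for every parameter at once.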

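Custom nn Modules

As the code above shows, we can build our own model by stacking the existing layers that PyTorch provides. Sometimes, though, you need a model that is more complex than a simple sequence of existing Modules. In that case you can define your own model by subclassing nn.Module and defining a forward method that receives input tensors, processes them, and produces output tensors, using other Modules or other autograd operations along the way.

In this example we implement our third-order polynomial as a custom Module subclass: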

import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        # In the constructor we instantiate four parameters and assign them as members
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
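        # In the forward function we accept a tensor of input data and must return a tensor of output data.
        # We can use Modules defined in the constructor as well as arbitrary operations on tensors.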
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        # Just like any other Python class, a PyTorch Module can define custom methods
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'

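# Create the input and output data: x spans [-π, π] and y holds sin(x) at those points
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = Polynomial3()

# Construct the loss function and the optimizer.
# The call to model.parameters() in the SGD constructor returns the learnable parameters of the model.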
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
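    # Forward pass: compute the predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print the loss
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients, perform a backward pass, and update the weights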
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')

Output:

99 719.6984252929688
199 484.8442077636719
299 327.8070068359375
399 222.7333984375
499 152.38116455078125
599 105.24345397949219
699 73.6366195678711
799 52.42745590209961
899 38.184112548828125
999 28.61081886291504
1099 22.170948028564453
1199 17.835002899169922
1299 14.912996292114258
1399 12.941986083984375
1499 11.611226081848145
1599 10.711792945861816
1699 10.103262901306152
1799 9.691162109375
1899 9.411775588989258
1999 9.222151756286621
Result: y = 0.01420027855783701 + 0.8421593308448792 x + -0.002449784893542528 x^2 + -0.09125629812479019 x^3
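The custom Module above relies on `torch.nn.Parameter` for its coefficients. The sketch below shows why that wrapper matters: only attributes wrapped in `nn.Parameter` are reported by `parameters()`, so only those reach the optimizer. The class `Tiny` and its attribute names are hypothetical and exist only for this illustration.

```python
import torch

class Tiny(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered as a learnable parameter.
        self.scale = torch.nn.Parameter(torch.ones(()))
        # A plain tensor attribute: NOT registered, even with requires_grad=True.
        self.offset = torch.zeros((), requires_grad=True)

    def forward(self, x):
        return self.scale * x + self.offset

model = Tiny()
names = [name for name, _ in model.named_parameters()]
print(names)  # ['scale']
```

An optimizer built from `model.parameters()` would therefore update `scale` but never touch `offset`.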

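Control Flow and Weight Sharing

As an example of dynamic graphs and weight sharing, we implement a rather strange model: a polynomial of order 3 to 5. On each forward pass it picks a random order between 3 and 5, and it reuses the same weight to compute the fourth-order and fifth-order terms.

For this model we can use ordinary Python control flow to implement the loop, and we can implement weight sharing simply by reusing the same parameter multiple times when defining the forward pass.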
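What "reusing the same parameter multiple times" means for the gradient can be checked on a tiny case. In this sketch, a single scalar `e` (a stand-in assumed for illustration) multiplies both a fourth-order and a fifth-order term, and autograd sums the contributions from both uses.

```python
import torch

e = torch.tensor(0.5, requires_grad=True)
x = torch.tensor(2.0)

# The same parameter e is used for both the x^4 and the x^5 term.
y = e * x ** 4 + e * x ** 5
y.backward()

# dy/de = x^4 + x^5 = 16 + 32 = 48
shared_grad = e.grad.item()
print(shared_grad)  # 48.0
```

Because the gradient is the sum over all uses, sharing a weight needs no special handling in the forward pass.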

We can implement this model as a Module subclass:

import random
import torch
import math


class DynamicNet(torch.nn.Module):
    def __init__(self):
        # In the constructor we instantiate five parameters and assign them as members
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
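        # In the forward pass we randomly add no extra term, the fourth-order term, or both the
        # fourth- and fifth-order terms, reusing the e parameter to compute the contribution of these orders.
        # Since each forward pass builds a dynamic computation graph, we can use ordinary Python
        # control-flow operators such as loops or conditional statements when defining the forward pass.
        # Here we also see that it is perfectly safe to reuse the same parameter many times when defining a computational graph.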
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
        # Just like any other Python class, a PyTorch Module can define custom methods
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'

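# Create the input and output data: x spans [-π, π] and y holds sin(x) at those points
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = DynamicNet()

# Construct our loss function and optimizer.
# Training this strange model with vanilla gradient descent is difficult, so we use SGD with momentum.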
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
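    # Forward pass: compute the predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print the loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    # Zero the gradients, perform a backward pass, and update the weights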
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')

Output:

1999 899.025146484375
3999 414.2640075683594
5999 199.9871826171875
7999 98.45460510253906
9999 51.89286422729492
11999 29.33873748779297
13999 18.758146286010742
15999 13.488505363464355
17999 11.053787231445312
19999 9.995283126831055
21999 9.340238571166992
23999 9.19015884399414
25999 9.015266418457031
27999 8.884875297546387
29999 8.613398551940918
Result: y = 0.006221930030733347 + 0.8557982444763184 x + -0.0016655048821121454 x^2 + -0.09348824620246887 x^3 + 0.00011010735033778474 x^4 ? + 0.00011010735033778474 x^5 ?
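The loop bound `range(4, random.randint(4, 6))` used in the forward pass is easy to misread, because `random.randint` includes both endpoints while `range` excludes its upper bound. The following sketch enumerates which exponents the loop can visit; the seed and the sample count of 200 are arbitrary assumptions.

```python
import random

random.seed(0)
seen = set()
for _ in range(200):
    upper = random.randint(4, 6)       # 4, 5, or 6 (both ends inclusive)
    seen.add(tuple(range(4, upper)))   # exponents visited by the loop

print(sorted(seen))
```

The three possible outcomes are no extra term, the x^4 term only, or both x^4 and x^5, which corresponds to a polynomial of order 3, 4, or 5. This is also why the printed result marks the last two terms with a question mark.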

From： https://blog.51cto.com/Lolitann/5967910

Four examples to help you get a handle on PyTorch's nn module
