
Implementing a single-hidden-layer MLP from scratch in PyTorch


My code is as follows:

import torch
from torchvision import transforms
from torch.utils import data
import torchvision


#==============load dataset
def get_dataloader_workers():
    return 4


def load_data_fashion_mnist(batch_size,resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0,transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data",train=True,transform=trans,download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train,batch_size,shuffle=True,num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test,batch_size,shuffle=False,num_workers=get_dataloader_workers()))

batch_size = 256
train_iter,test_iter = load_data_fashion_mnist(batch_size)

#=============model parameters
num_inputs, num_outputs, num_hiddens = 784,10,256

# parameter initialization
W1 = torch.randn((num_inputs,num_hiddens),requires_grad=True)*0.01
b1 = torch.zeros(num_hiddens,requires_grad=True)

W2 = torch.randn((num_hiddens,num_outputs),requires_grad=True)*0.01
b2 = torch.zeros(num_outputs,requires_grad=True)


#===========activation func
def relu(X):
    a = torch.zeros_like(X)
    return torch.max(X,a)

def softmax(X):
    Exp_x = torch.exp(X- X.max(dim=1, keepdim=True)[0])
    partition = Exp_x.sum(axis=1,keepdim=True)
    return Exp_x / partition

#===========model
def mlp(X):
    """接收输入数据""";
    X = X.reshape((-1,num_inputs))#展平图像为 784 的长度
    H = relu(torch.matmul(X,W1) + b1) #隐藏层
    O = softmax(torch.matmul(H,W2) + b2)

    返回 O

def cross_entropy_loss(y_hat,y):
    return -torch.log(y_hat[range(len(y_hat)),y])


def sgd(params,lr,batch_size):
    """minibatch sgd""";
    使用 torch.no_grad():
        for params 中的 param
            param -= lr*param.grad/batch_size
            参数.grad.zero_()


lr = 0.03
num_epochs = 3
net = mlp
loss = cross_entropy_loss

def train():
    for epoch in range(num_epochs):
        for X,y in train_iter:

            y_hat = net(X)
            print(y_hat[0])
            print(y_hat.shape)
            print(y)

            l = loss(y_hat,y)
            print("The shape of loss: ",l.shape)

            l.sum().backward()
            assert b2.grad is not None, "b2.grad is None"
            assert W2.grad is not None, "W2.grad is None"

            sgd([W1,b1,W2,b2],lr,batch_size)

        with torch.no_grad():
            train_l = loss(net(X),y).mean()
            print(f' epoch {epoch+1}, loss {float(train_l.mean()):f}')

if __name__=="__main__":
    train()
When I run the code, an exception is raised (error screenshot omitted).
I have checked the forward pass and the backward pass, but it still puzzles me.

Can anyone help fix this code? Please help!

I tried printing b2.grad, and it works fine;

but when I print W2.grad, I get the "W2.grad is none" assertion error.


The error message "W2.grad is none" arises because W1 and W2 are not leaf tensors. The expression torch.randn(..., requires_grad=True) * 0.01 first creates a leaf tensor and then multiplies it by 0.01; the result of that multiplication is a new, non-leaf tensor, and that is what gets stored in W1 and W2. After backward(), PyTorch only populates the .grad attribute of leaf tensors, so W2.grad stays None (PyTorch also emits a UserWarning about accessing .grad of a non-leaf tensor). b1 and b2 are created directly with torch.zeros(..., requires_grad=True) and never transformed, so they remain leaf tensors and their gradients are filled in, which matches exactly what you observed.

Here's a breakdown of the issue and how to fix it:

Problem:

  1. Non-leaf weights: W1 = torch.randn((num_inputs, num_hiddens), requires_grad=True) * 0.01 binds the name W1 to the product, not to the original leaf tensor, so autograd never writes anything into W1.grad (the same applies to W2).
  2. Failing assertion: assert W2.grad is not None therefore fails on the very first batch, before the sgd update even runs.

Solution:

  1. Create the weights as leaf tensors: scale first and then mark the result as requiring gradients, e.g. W1 = (torch.randn(num_inputs, num_hiddens) * 0.01).requires_grad_() (see the short sketch below).
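
A minimal sketch of the leaf vs. non-leaf difference (the variable names here are illustrative, not taken from the question):

import torch

# non-leaf: the multiplication creates a new tensor in the graph,
# so autograd will not fill in its .grad attribute
w_bad = torch.randn(3, 2, requires_grad=True) * 0.01
print(w_bad.is_leaf)          # False

# leaf: scale first, then flag the tensor as requiring gradients
w_good = (torch.randn(3, 2) * 0.01).requires_grad_()
print(w_good.is_leaf)         # True

w_good.sum().backward()
print(w_good.grad is None)    # False -- the gradient was populated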

Here's the corrected code:

import torch
from torchvision import transforms
from torch.utils import data
import torchvision

# ==============load dataset
def get_dataloader_workers():
    return 4


def load_data_fashion_mnist(batch_size, resize=None):
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True
    )
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True
    )
    return (
        data.DataLoader(
            mnist_train, batch_size, shuffle=True, num_workers=get_dataloader_workers()
        ),
        data.DataLoader(
            mnist_test, batch_size, shuffle=False, num_workers=get_dataloader_workers()
        ),
    )


batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)

# =============model parameters
num_inputs, num_outputs, num_hiddens = 784, 10, 256

# parameter initialization: scale first, then mark the result as requiring
# gradients so W1 and W2 are leaf tensors and receive .grad during backward()
W1 = (torch.randn(num_inputs, num_hiddens) * 0.01).requires_grad_()
b1 = torch.zeros(num_hiddens, requires_grad=True)

W2 = (torch.randn(num_hiddens, num_outputs) * 0.01).requires_grad_()
b2 = torch.zeros(num_outputs, requires_grad=True)


# ===========activation func
def relu(X):
    a = torch.zeros_like(X)
    return torch.max(X, a)


def softmax(X):
    Exp_x = torch.exp(X - X.max(dim=1, keepdim=True)[0])
    partition = Exp_x.sum(axis=1, keepdim=True)
    return Exp_x / partition


# ===========model
def mlp(X):
    """Receive the input data"""
    X = X.reshape((-1, num_inputs))  # flatten each image to a length-784 vector
    H = relu(torch.matmul(X, W1) + b1)  # hidden layer
    O = softmax(torch.matmul(H, W2) + b2)

    return O


def cross_entropy_loss(y_hat, y):
    return -torch.log(y_hat[range(len(y_hat)), y])


def sgd(params, lr, batch_size):
    """minibatch sgd"""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()


lr = 0.03
num_epochs = 3
net = mlp
loss = cross_entropy_loss


def train():
    for epoch in range(num_epochs):
        for X, y in train_iter:
            # forward pass and per-example loss for this batch
            y_hat = net(X)
            l = loss(y_hat, y)

            # backward pass: W1.grad and W2.grad are now populated because
            # the weights are leaf tensors, so the sgd update can use them
            l.sum().backward()
            sgd([W1, b1, W2, b2], lr, batch_size)

        with torch.no_grad():
            # report the loss on the last batch of the epoch
            train_l = loss(net(X), y).mean()
            print(f" epoch {epoch+1}, loss {float(train_l):f}")


if __name__ == "__main__":
    train()

Explanation of Changes:

  • Leaf-tensor parameters: W1 and W2 are now created by scaling the random values first and then calling .requires_grad_(), so both are leaf tensors and autograd writes their gradients into W1.grad and W2.grad during backward().
  • Training loop left as-is: your original loop already ran the forward pass, loss, l.sum().backward(), and sgd(...) once per batch; only the debugging print statements were removed.

With this change the "W2.grad is none" assertion no longer fires and the model trains correctly, because gradients are computed for, and applied to, all four parameters on every batch.
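
As a quick sanity check (a hypothetical snippet reusing the names defined in the corrected code above), you can verify after a single backward pass that every parameter now receives a gradient:

X, y = next(iter(train_iter))
l = loss(net(X), y)
l.sum().backward()
for name, p in zip(["W1", "b1", "W2", "b2"], [W1, b1, W2, b2]):
    # every parameter is a leaf tensor, so .grad is populated after backward()
    print(name, p.is_leaf, p.grad is not None)

An equivalent fix is to wrap each weight in torch.nn.Parameter (e.g. W1 = torch.nn.Parameter(torch.randn(num_inputs, num_hiddens) * 0.01)), which is likewise a leaf tensor with requires_grad=True by default.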

