
Deep Learning - DNN (Deep Neural Network) - Backpropagation 02 - Implementing an NN in Python - 41

Posted: 2024-02-04 23:56:50

Contents

1. Example

2. Python implementation

import numpy as np
from sklearn.datasets import fetch_openml  # fetch_mldata was removed from newer scikit-learn versions
from sklearn.utils.extmath import safe_sparse_dot


def train_y(y_true):
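    # Convert a scalar label into a length-10 one-hot vector, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]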
    y_ohe = np.zeros(10)
    y_ohe[int(y_true)] = 1
    return y_ohe


# 'MNIST original' via fetch_mldata is no longer available; mnist_784 on OpenML is the same dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False, data_home='./for_my_own_nn_data/')
X, y = mnist['data'], mnist['target']

print('X shape:', X.shape)
print('y shape:', y.shape)

y = np.array([train_y(y[i]) for i in range(len(y))])

hidden_layer_size = [300, 100]
max_iter = 20
alpha = 0.0001  # L2 regularization coefficient
learning_rate = 0.001


def log_loss(y_true, y_prob):
    # Cross-entropy loss
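    # Per-sample loss: -sum_k y_k * log(p_k); the mean over the batch is returned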
    y_prob = np.clip(y_prob, 1e-10, 1 - 1e-10)
    if y_prob.shape[1] == 1:
        y_prob = np.append(1 - y_prob, y_prob, axis=1)
    if y_true.shape[1] == 1:
        y_true = np.append(1 - y_true, y_true, axis=1)
    return -np.sum(y_true * np.log(y_prob)) / y_prob.shape[0]


def soft_max(x):
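    # Numerically stable softmax: subtract the row-wise max before exponentiating; x is modified in place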
    tmp = x - x.max(axis=1)[:, np.newaxis]
    np.exp(tmp, out=x)
    x /= x.sum(axis=1)[:, np.newaxis]
    return x


def relu(x):
    np.clip(x, 0, np.finfo(x.dtype).max, out=x)  # max(0, x)
    return x


def relu_derivation(z, delta):
    # The derivative of ReLU is either 0 or 1; where it is 1, delta is kept unchanged
    delta[z == 0] = 0
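    # delta is masked in place (no return value): positions where the ReLU output is 0 get a zero gradient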


def gen_batch(n, bs):
    start = 0
    for _ in range(n // bs):
        end = start + bs
        yield slice(start, end)
        start = end
    # After the full batches, yield whatever samples remain as one final, smaller slice
    if start < n:
        yield slice(start, n)
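# Example: gen_batch(5, 2) yields slice(0, 2), slice(2, 4) and finally slice(4, 5)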


n_samples, n_features = X.shape
n_outputs = y.shape[1]

batch_size = min(200, n_samples)
layer_units = ([n_features] + hidden_layer_size + [n_outputs])
print("====>layer_units:", layer_units)
n_layers = len(layer_units)
print("====>n_layers: ", n_layers)

# Initialize the weights W and biases b
coefs_ = []
intercepts_ = []
for i in range(n_layers - 1):
    fan_in = layer_units[i]
    fan_out = layer_units[i + 1]
    factor = 6.
    ini_bound = np.sqrt(factor / (fan_in + fan_out))
    coef_init = np.random.uniform(-ini_bound, ini_bound, (fan_in, fan_out))
    coefs_.append(coef_init)

    intercept_init = np.random.uniform(-ini_bound, ini_bound, fan_out)
    intercepts_.append(intercept_init)
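# The bound sqrt(6 / (fan_in + fan_out)) is the Glorot (Xavier) uniform initialization,
# which keeps the variance of activations roughly constant across layers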

# Initialize the buffers that hold the forward-pass activations (one per layer)
activations = [X]
activations.extend(np.empty((batch_size, n_fan_out)) for n_fan_out in layer_units[1:])

# deltas: backpropagated error terms for each layer, used to build the weight updates
deltas = [np.empty_like(a_layer) for a_layer in activations]

coef_grads = [np.empty((n_fan_in, n_fan_out)) for
              n_fan_in, n_fan_out in zip(layer_units[:-1], layer_units[1:])]
intercept_grads = [np.empty(n_fan_out) for n_fan_out in layer_units[1:]]

loss_ = 0.

for it in range(max_iter):
    arr = np.arange(n_samples)
    np.random.shuffle(arr)
    X = X[arr]
    y = y[arr]

    accumulated_loss = 0.0

    for batch_slice in gen_batch(n_samples, batch_size):
        batch_x = X[batch_slice]
        batch_y = y[batch_slice]

        # Assign the current batch to the input layer
        activations[0] = batch_x

        # Forward propagation
        for i in range(n_layers - 1):
            activations[i + 1] = safe_sparse_dot(activations[i], coefs_[i])
            activations[i + 1] += intercepts_[i]  # add the bias term
            # Every layer except the last goes through the activation function
            if (i + 1) != (n_layers - 1):
                activations[i + 1] = relu(activations[i + 1])

        # The last layer goes through softmax instead
        activations[i + 1] = soft_max(activations[i + 1])

        # Compute the loss: cross-entropy between the last layer's output and y_true
        loss = log_loss(batch_y, activations[-1])

        # Add the L2 regularization term to the loss
        values = np.sum(np.array([np.dot(s.ravel(), s.ravel()) for s in coefs_]))
        loss += (0.5 * alpha) * values / len(batch_y)
        accumulated_loss += loss * len(batch_y)

        # Backpropagation
        last = n_layers - 2
        deltas[last] = activations[-1] - batch_y  # y_predict - y_true
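        # With a softmax output and cross-entropy loss, the gradient w.r.t. the pre-softmax
        # values simplifies to (y_hat - y), so no explicit softmax derivative is needed here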

        # Gradient of the last weight matrix, i.e. the gradient flowing back from the output layer
        # 1. gradient of the base loss: (y_hat - y) * x
        coef_grads[last] = safe_sparse_dot(activations[last].T, deltas[last])

        # 2. gradient of the L2 penalty
        coef_grads[last] += (alpha * coefs_[last])

        # Average over the batch
        coef_grads[last] /= len(batch_y)

        # Gradient of the bias (intercept) term
        intercept_grads[last] = np.mean(deltas[last], 0)

        # With the last layer done, propagate the error backwards through the hidden layers
        for i in range(n_layers - 2, 0, -1):
            # delta_previous = delta dot W^T, then masked by the activation's derivative
            deltas[i - 1] = safe_sparse_dot(deltas[i], coefs_[i].T)
            # apply the derivative of the activation function
            relu_derivation(activations[i], deltas[i - 1])
            # Gradient of the W matrix feeding hidden layer i
            # 1. gradient of the base loss
            coef_grads[i - 1] = safe_sparse_dot(activations[i - 1].T, deltas[i - 1])
            # 2. gradient of the L2 penalty
            coef_grads[i - 1] += (alpha * coefs_[i - 1])
            # 3. average over the batch
            coef_grads[i - 1] /= len(batch_y)
            # 4. bias gradient from the base loss
            intercept_grads[i - 1] = np.mean(deltas[i - 1], 0)

        # Backpropagation done: update the parameters
        grads = coef_grads + intercept_grads  # plain list concatenation
        updates = [-learning_rate * grad for grad in grads]
        params = coefs_ + intercepts_
        for param, update in zip(params, updates):
            param += update

    loss_ = accumulated_loss / X.shape[0]
    print("interation: %d, loss=%.8f" % (it, loss_))



From: https://www.cnblogs.com/cavalier-chen/p/18007236
