A detailed tutorial on reproducing unstable gradients with TensorFlow code

Tags: gradient, stability, print, shape, tape, reproduction, tf, tensorflow, grad

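Please read the following articles of mine before reproducing the code:

Installing Python and TensorFlow: CSDN

The unstable gradient problem: CSDN

Vanishing gradients: CSDN

Exploding gradients: https://mp.csdn.net/mp_blog/creation/editor/143983878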

In the code, the gradient of each layer is computed with tape.gradient(loss, W).

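1. Install matplotlib from the command prompt

Press Win+R, type cmd and press Enter to open the Windows Command Prompt, then install matplotlib with the following command: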

pip install matplotlib

Alternatively, download and install it from the Tsinghua University mirror in China:

pip install matplotlib -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

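Figure 1: Opening cmd

Once the command finishes, matplotlib and its dependencies have been installed successfully.

Figure 2: Installing matplotlib in cmd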
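
Before moving on, it can help to confirm from Python that the packages used later can be found. The sketch below is a small helper that relies only on the standard library, so it runs even if an installation failed. The name check_packages is made up for this illustration and is not part of pip or any of these libraries.

```python
import importlib.util


def check_packages(names):
    """Return {package name: True/False} depending on whether it can be found."""
    # find_spec returns None when a top-level package is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    wanted = ["tensorflow", "numpy", "matplotlib"]
    for name, found in check_packages(wanted).items():
        print(name, "OK" if found else "missing, try: pip install " + name)
```

Running it in IDLE or from cmd prints one line per package, which makes a failed install visible before the main script is written.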

2. Open IDLE from the Start menu

Figure 3: IDLE entry in the Start menu

Launch IDLE.

Figure 4: IDLE startup window

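3. Create a new Python file

In IDLE, click File-->New File to create a new Python file.

Figure 5: Creating a new file in IDLE

The Python file editor window opens.

Figure 6: Python file editor window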

4. Write the code

Enter the following code in the Python file editor window:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)  # Check the TensorFlow version; it must be 2.0 or later



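# Build a four-layer neural network (not counting the input layer); these are the weight matrices of the four layers
W1 = tf.Variable(tf.random.normal(shape=(100, 150)))  # Layer 1 weights, drawn from a Gaussian (default: mean 0, standard deviation 1); this layer has 150 neurons (the previous layer is the input layer with 100 neurons)
W2 = tf.Variable(tf.random.normal(shape=(150, 200)))  # Layer 2 weights, drawn from a Gaussian (this layer has 200 neurons)
W3 = tf.Variable(tf.random.normal(shape=(200, 150)))  # Layer 3 weights, drawn from a Gaussian (this layer has 150 neurons)
W4 = tf.Variable(tf.random.normal(shape=(150, 20)))  # Layer 4 weights, drawn from a Gaussian (this layer has 20 neurons)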



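# Use tf.GradientTape to record the operations of the forward propagation so that gradients can be computed afterwards
# persistent=True allows tape.gradient to be called more than once on the same tape
with tf.GradientTape(persistent=True) as tape:
    inputData = tf.constant(tf.random.normal((1000, 100)))  # 1000 input samples with 100 dimensions each, matching the 100 neurons of the input layer

    f1_befor_ac = tf.matmul(inputData, W1)  # Feed the input into layer 1 by matrix-multiplying it with W1
    f1 = tf.nn.sigmoid(f1_befor_ac)  # Layer 1 uses sigmoid as its activation function

    f2_befor_ac = tf.matmul(f1, W2)
    f2 = tf.nn.sigmoid(f2_befor_ac)

    f3_befor_ac = tf.matmul(f2, W3)
    f3 = tf.nn.sigmoid(f3_befor_ac)

    f4_befor_ac = tf.matmul(f3, W4)
    f4 = tf.nn.sigmoid(f4_befor_ac)

    loss = f4  # To keep things simple, the output of layer 4 is used directly as the loss. In a real problem, the loss is computed from the true labels and the predicted labels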



# Compute the gradient ΔW of each layer:
print('gradient: loss -> W1')  # Gradient ΔW of layer 1
grad_w1 = tape.gradient(loss, W1).numpy()  # tape.gradient(loss, W1) returns an EagerTensor (the Tensor type of TensorFlow 2.0 and later); appending .numpy() converts it to a NumPy array
print('shape: {0}, mean: {1:.4f}, std: {2:.4f}'.format(grad_w1.shape, np.mean(grad_w1), np.std(grad_w1)))
print('---------------------------------------')
print('gradient: loss -> W2')  # Gradient ΔW of layer 2
grad_w2 = tape.gradient(loss, W2).numpy()
print('shape: {0}, mean: {1:.4f}, std: {2:.4f}'.format(grad_w2.shape, np.mean(grad_w2), np.std(grad_w2)))
print('---------------------------------------')
print('gradient: loss -> W3')  # Gradient ΔW of layer 3
grad_w3 = tape.gradient(loss, W3).numpy()
print('shape: {0}, mean: {1:.4f}, std: {2:.4f}'.format(grad_w3.shape, np.mean(grad_w3), np.std(grad_w3)))
print('---------------------------------------')
print('gradient: loss -> W4')  # Gradient ΔW of layer 4
grad_w4 = tape.gradient(loss, W4).numpy()
print('shape: {0}, mean: {1:.4f}, std: {2:.4f}'.format(grad_w4.shape, np.mean(grad_w4), np.std(grad_w4)))
print('---------------------------------------')

del tape  # Delete the tape and everything recorded on it to reduce memory pressure
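
One detail in the listing is worth spelling out: loss = f4 is a (1000, 20) tensor rather than a scalar. For a non-scalar target, tape.gradient differentiates the sum of its elements, which is why every gradient has the same shape as its weight matrix. The plain-Python sketch below illustrates that idea on a toy model with a single scalar weight; it does not use TensorFlow, and the names xs, w and summed_output as well as the toy values are made up for this illustration. It compares the analytic derivative of the summed sigmoid outputs with a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def summed_output(w, xs):
    """Sum of sigmoid(x * w) over all inputs: the scalar that gets differentiated."""
    return sum(sigmoid(x * w) for x in xs)


def analytic_grad(w, xs):
    """d/dw of summed_output: each input contributes x * s * (1 - s)."""
    total = 0.0
    for x in xs:
        s = sigmoid(x * w)
        total += x * s * (1.0 - s)
    return total


def numeric_grad(w, xs, eps=1e-6):
    """Central finite-difference estimate of the same derivative."""
    return (summed_output(w + eps, xs) - summed_output(w - eps, xs)) / (2 * eps)


xs = [0.5, -1.0, 2.0]  # toy inputs (hypothetical values)
w = 0.3                # toy scalar weight (hypothetical value)
print(analytic_grad(w, xs), numeric_grad(w, xs))
```

The two printed numbers should agree to several decimal places, which is the same relationship that holds between tape.gradient(loss, W1) and the derivative of the summed loss in the tutorial code.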

5. Run

Save the file first, then click Run-->Run Module in IDLE to execute it.

Figure 7: The Run button in IDLE

The results appear in the IDLE Shell 3.9.13 window.

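Because the input data tf.random.normal((1000, 100)) and the weight matrices are generated randomly, the results differ from run to run. In this example, the output is:

Figure 8: Viewing the results in IDLE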
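
If reproducible numbers are wanted, the usual approach is to fix the random seed before any random values are drawn. In TensorFlow 2 this is done with tf.random.set_seed, called before W1 to W4 are created. The sketch below shows the effect with Python's standard random module only, since the principle is the same; the helper name sample_normal and the seed values are made up for this illustration.

```python
import random


def sample_normal(seed, n=5):
    """Draw n values from N(0, 1) using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


first = sample_normal(seed=42)
second = sample_normal(seed=42)  # same seed: identical values
other = sample_normal(seed=7)    # different seed: different values
print(first == second, first == other)
```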

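6. Compare the layers

We can also plot each layer's gradient matrix $\Delta w$ as a histogram, to compare how the gradient values are distributed from one layer to the next.

Continue by adding the following code to the file above. The first line, del tape, is already in the file; the new code starts from the second line.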

del tape  # Delete the tape and everything recorded on it to reduce memory pressure

grad_list = []
grad_list.append(grad_w1.flatten())
grad_list.append(grad_w2.flatten())
grad_list.append(grad_w3.flatten())
grad_list.append(grad_w4.flatten())

for idx, grad in enumerate(grad_list):
    plt.hist(grad, bins=60, density=True, histtype='step', label='for layer: ' + str(idx + 1))  # Step-style outline drawn with solid lines
    # plt.hist(grad, bins=60, density=True, alpha=0.3, label='for layer: ' + str(idx + 1))  # Semi-transparent filled bars
plt.legend()
plt.show()

Press F5 to run the file again, and the plot below appears.

Figure 9: Gradient distributions of the four layers
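
With density=True, plt.hist rescales the bar heights so that the total area under each histogram is 1. This makes layers with different numbers of weights (15,000 values for W1 versus 3,000 for W4) directly comparable. The sketch below reimplements that normalization in plain Python on a toy list; the function name density_histogram and the toy values are made up for this illustration and are not part of matplotlib, and the sketch assumes the values are not all identical.

```python
def density_histogram(values, bins):
    """Return (heights, bin_width) with heights scaled so the total area is 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # the maximum value is placed in the last bin instead of past the end
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    heights = [c / (n * width) for c in counts]
    return heights, width


values = [0.0, 0.1, 0.1, 0.2, 0.9, 1.0]  # toy data (hypothetical values)
heights, width = density_histogram(values, bins=4)
area = sum(h * width for h in heights)
print(heights, area)
```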

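7. Conclusion

From the output above and the distribution of values inside each layer's gradient matrix (with sigmoid as the activation function), we can see that the closer a weight matrix is to the input layer, the more its gradient values cluster around 0 (for example layer 1, the blue curve). This means that when the weights are updated, that layer's weight matrix cannot be updated effectively, because a large share of its gradient values satisfy $\Delta w\approx 0$.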
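
One way to see why sigmoid contributes to this effect: its derivative s(z)(1 - s(z)) is at most 0.25, and it becomes far smaller once |z| is large. With inputs and weights both drawn from N(0, 1) and 100 inputs per neuron, the first-layer pre-activations have a standard deviation of roughly 10, so most neurons sit in the flat, saturated part of the curve. The sketch below estimates this with the standard library only. The sample size, the seed and the helper names are choices made for this illustration and are not part of the original code; the real backpropagated gradient also involves the weight matrices, so this shows only the sigmoid factor.

```python
import math
import random
import statistics


def sigmoid(z):
    # numerically safe form that avoids overflow for large negative z
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def first_layer_preactivations(n_samples=2000, fan_in=100, seed=0):
    """Simulate z = sum_i x_i * w_i with x_i, w_i ~ N(0, 1), as for one layer-1 neuron."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(fan_in))
        for _ in range(n_samples)
    ]


zs = first_layer_preactivations()
z_std = statistics.pstdev(zs)
mean_grad = statistics.fmean(sigmoid_grad(z) for z in zs)
print("std of z:", round(z_std, 2))
print("mean sigmoid derivative:", round(mean_grad, 4), "(the maximum possible is 0.25)")
```

The average derivative comes out well below the 0.25 ceiling, and backpropagation multiplies one such factor per layer on the way back to the input, which is consistent with layer 1 showing the smallest gradients.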

From： https://blog.csdn.net/lzm12278828/article/details/143984432