
〖TensorFlow2.0 Notes 22〗 Implementing a 3-Layer BP Neural Network on the MNIST Dataset with NumPy!



Implementing a 3-Layer BP Neural Network on the MNIST Dataset with NumPy!

  • This article is part of the TensorFlow2.0 Study Notes series. You are welcome to follow the column: TensorFlow2.0 Study Notes. The articles will be updated continuously; likes, bookmarks, and shares are much appreciated!
  • Complete table of contents for the series: TensorFlow2.0 Study Notes - Full Index!

Contents

  • 1. Knowledge Review: Formula Derivation
  • 2. Preprocessing the mnist.pkl.gz Dataset
  • 2.1. load_data(): Data Format
  • 2.2. load_data_wrapper(): Data Format
  • 3. Implementation Details
  • 3.1. Test Program
  • 3.2. Run Results
  • 4. Contact me privately if you need tutoring!

1. Knowledge Review: Formula Derivation



[Figure: formula derivation notes (1)]

  • Note: Python's zip() function takes iterables as arguments, packs their corresponding elements into tuples, and returns those tuples (as an iterator in Python 3, which list() can turn into a list of tuples).
  • zip syntax: zip([iterable, ...]); it yields tuples of paired elements. A short sketch of how this is used to build the network's layer shapes follows this list.
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = [4, 5, 6, 7, 8]
>>> zipped = list(zip(a, b))  # pack into a list of tuples
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a, c))           # the number of pairs matches the shortest iterable
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(*zipped))        # the inverse of zip: *zipped unpacks the tuples, giving the "transposed" lists
[(1, 2, 3), (4, 5, 6)]
  • For the detailed formula derivation, see this article: Detailed Formula Derivation
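
As a quick illustration (a minimal sketch; the Network class in Section 3.1 builds its weight shapes exactly this way), zip(sizes[:-1], sizes[1:]) pairs each layer size with the next one:

>>> sizes = [784, 30, 10]
>>> list(zip(sizes[:-1], sizes[1:]))   # (ch_in, ch_out) pairs, one per weight matrix
[(784, 30), (30, 10)]
>>> [(ch_out, ch_in) for ch_in, ch_out in zip(sizes[:-1], sizes[1:])]  # shapes of self.weights
[(30, 784), (10, 30)]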


[Figure: formula derivation notes (2)]


[Figure: formula derivation notes (3)]
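
To tie the figures to the code in Section 3.1, the formulas implemented there can be summarized as follows (a compact restatement derived from the code, assuming sigmoid activations and a per-sample squared-error cost):

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,(1 - \sigma(z))

z^{l} = W^{l} a^{l-1} + b^{l}, \qquad a^{l} = \sigma(z^{l})

C_x = \tfrac{1}{2} \sum_k (a^{L}_k - y_k)^2 \quad (\text{the code reports } \sum_k (a^{L}_k - y_k)^2 \text{ as the loss value})

\delta^{L} = (a^{L} - y) \odot a^{L} \odot (1 - a^{L}), \qquad \delta^{l} = \big( (W^{l+1})^{T} \delta^{l+1} \big) \odot \sigma'(z^{l})

\frac{\partial C_x}{\partial b^{l}} = \delta^{l}, \qquad \frac{\partial C_x}{\partial W^{l}} = \delta^{l} \, (a^{l-1})^{T}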

2. Preprocessing the mnist.pkl.gz Dataset

  • Download the dataset: mnist.pkl.gz
  • The preprocessing code is as follows:
"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data. For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``. In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""
#### Libraries
# Standard library

import pickle
import gzip

# Third-party libraries
import numpy as np

def load_data():
"""
Return the MNIST data as a tuple containing the training data,
the validation data, and the test data.

The ``training_data`` is returned as a tuple with two entries.
The first entry contains the actual training images. This is a
numpy ndarray with 50,000 entries. Each entry is, in turn, a
numpy ndarray with 784 values, representing the 28 * 28 = 784
pixels in a single MNIST image.

The second entry in the ``training_data`` tuple is a numpy ndarray
containing 50,000 entries. Those entries are just the digit
values (0...9) for the corresponding images contained in the first
entry of the tuple.

The ``validation_data`` and ``test_data`` are similar, except
each contains only 10,000 images.

This is a nice data format, but for use in neural networks it's
helpful to modify the format of the ``training_data`` a little.
That's done in the wrapper function ``load_data_wrapper()``, see
below.
:return:
"""
f = gzip.open('data/mnist.pkl.gz', 'rb')
training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
f.close()
return (training_data, validation_data, test_data)

def load_data_wrapper():
"""
Return a tuple containing ``(training_data, validation_data,
test_data)``. Based on ``load_data``, but the format is more
convenient for use in our implementation of neural networks.

In particular, ``training_data`` is a list containing 50,000
2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray
containing the input image. ``y`` is a 10-dimensional
numpy.ndarray representing the unit vector corresponding to the
correct digit for ``x``.

``validation_data`` and ``test_data`` are lists containing 10,000
2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional
numpy.ndarry containing the input image, and ``y`` is the
corresponding classification, i.e., the digit values (integers)
corresponding to ``x``.

Obviously, this means we're using slightly different formats for
the training data and the validation / test data. These formats
turn out to be the most convenient for use in our neural network
code.

:return:
"""
tr_d, va_d, te_d = load_data()
training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
training_results = [vectorized_result(y) for y in tr_d[1]]
training_data = list(zip(training_inputs, training_results))
validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
validation_data = list(zip(validation_inputs, va_d[1]))
test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
test_data = list(zip(test_inputs, te_d[1]))
return (training_data, validation_data, test_data)

def vectorized_result(j):
"""
Return a 10-dimensional unit vector with a 1.0 in the jth
position and zeroes elsewhere. This is used to convert a digit
(0...9) into a corresponding desired output from the neural
network.
:param j:
:return:
"""
e = np.zeros((10, 1))
e[j] = 1.0
return e
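
As a quick sanity check (a minimal usage sketch, assuming the code above is saved as mnist_loader.py), vectorized_result() turns a digit label into a one-hot column vector:

>>> from mnist_loader import vectorized_result
>>> vectorized_result(3).shape
(10, 1)
>>> vectorized_result(3).T          # transposed only so it prints on one line
array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]])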

2.1. load_data(): Data Format

  • Loading code with some shape checks added:
def load_data():
    f = gzip.open('data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
    print(training_data[0], training_data[1])
    print(training_data[0].shape, training_data[1].shape, validation_data[0].shape, validation_data[1].shape,
          test_data[0].shape, test_data[1].shape)
    print(training_data[0][0].shape, training_data[1][0].shape)
    f.close()
    return (training_data, validation_data, test_data)
  • Output: the printed result shows the format of the raw data.


[Figure: console output of load_data()]
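
For reference, the printed shapes should look roughly like this (a sketch assuming the standard mnist.pkl.gz split of 50,000 / 10,000 / 10,000 samples; the first print of the raw arrays is omitted):

(50000, 784) (50000,) (10000, 784) (10000,) (10000, 784) (10000,)
(784,) ()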

2.2. load_data_wrapper(): Data Format

def load_data_wrapper():
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    print(training_inputs[0].shape)
    training_results = [vectorized_result(y) for y in tr_d[1]]
    print(training_results[0].shape)
    training_data = list(zip(training_inputs, training_results))
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = list(zip(validation_inputs, va_d[1]))
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = list(zip(test_inputs, te_d[1]))
    return (training_data, validation_data, test_data)
  • Output: the printed result shows the reshaped format.


[Figure: console output of load_data_wrapper()]

  • After zip, the data becomes the following:


[Figure: structure of training_data after zip]
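
Concretely (a minimal sketch consistent with the docstrings above and the printout in Section 3.2), each element of training_data is a tuple (x, y) with x of shape (784, 1) and y a one-hot vector of shape (10, 1), while test_data pairs each (784, 1) image with a plain integer label:

>>> from mnist_loader import load_data_wrapper
>>> training_data, validation_data, test_data = load_data_wrapper()
>>> x, y = training_data[0]
>>> x.shape, y.shape
((784, 1), (10, 1))
>>> x, y = test_data[0]
>>> x.shape, int(y)
((784, 1), 7)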

3. Implementation Details

  • For the stochastic gradient descent algorithm, see: Andrew Ng's notes comparing gradient descent, stochastic gradient descent, and mini-batch gradient descent
  • Many of the implementation details follow the formulas below.


[Figure: backpropagation formulas (1)]


[Figure: backpropagation formulas (2)]
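
The parameter update performed by update_mini_batch() below averages the per-sample gradients over a mini-batch of size m and takes one SGD step with learning rate \eta (a restatement of the code):

W^{l} \leftarrow W^{l} - \eta \cdot \frac{1}{m} \sum_{(x,y) \in \text{batch}} \frac{\partial C_{x}}{\partial W^{l}}, \qquad b^{l} \leftarrow b^{l} - \eta \cdot \frac{1}{m} \sum_{(x,y) \in \text{batch}} \frac{\partial C_{x}}{\partial b^{l}}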

3.1. Test Program

"""
"""
network.py
~~~~~~~~~~

A module to implement the stochastic gradient descent learning
algorithm for a forward neural network. Gradients are calculated
using backpropagation. Note that I have focused on making the code
simple, easily readable, and easily modifiable. It is not optimized,
and omits many desirable features.
"""

#### Libraries
# Standard library
import random

# Third-party libraries
import numpy as np


#### Miscellaneous functions
def sigmoid(z):
"""
The sigmoid function.
"""
return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
"""Derivative of the sigmoid function."""
return sigmoid(z) * (1 - sigmoid(z))


# A class representing the 3-layer neural network structure.
class Network:
    def __init__(self, sizes):
        """
        The list ``sizes`` contains the number of neurons in the
        respective layers of the network. For example, if the list
        was [2, 3, 1] then it would be a three-layer network, with the
        first layer containing 2 neurons, the second layer 3 neurons,
        and the third layer 1 neuron. The biases and weights for the
        network are initialized randomly, using a Gaussian
        distribution with mean 0, and variance 1. Note that the first
        layer is assumed to be an input layer, and by convention we
        won't set any biases for those neurons, since biases are only
        ever used in computing the outputs from later layers.

        :param sizes: e.g. [784, 30, 10]; each element is the dimension of one layer, given as a list
        """
        self.num_layers = len(sizes)
        # sizes: [784, 30, 10]
        self.sizes = sizes
        # b: [ch_out, 1], one bias vector per layer after the input layer
        self.biases = [np.random.randn(ch_out, 1) for ch_out in sizes[1:]]
        # w: [ch_out, ch_in], one weight matrix between each pair of consecutive layers
        self.weights = [np.random.randn(ch_out, ch_in) for ch_in, ch_out in zip(sizes[:-1], sizes[1:])]

    def forward(self, x):
        """
        Return the output of the network if ``x`` is the input.
        :param x: [784, 1], the dimension of the input
        :return: [10, 1]
        """
        for b, w in zip(self.biases, self.weights):
            # hidden layer: [30, 784] @ [784, 1] => [30, 1], plus bias [30, 1] => [30, 1]
            # output layer: [10, 30] @ [30, 1] + [10, 1] => [10, 1]
            z = np.dot(w, x) + b
            # [30, 1] after the hidden layer, [10, 1] after the output layer
            x = sigmoid(z)
        return x

    def train(self, training_data, epochs, batchsz, lr, test_data=None):
        """
        Train the neural network using mini-batch stochastic gradient descent.
        The ``training_data`` is a list of tuples ``(x, y)`` representing the
        training inputs and the desired outputs. The other non-optional
        parameters are self-explanatory. If ``test_data`` is provided then the
        network will be evaluated against the test data after each epoch, and
        partial progress printed out. This is useful for tracking progress,
        but slows things down substantially.
        """
        if test_data:
            n_test = len(test_data)

        n = len(training_data)
        for j in range(epochs):
            random.shuffle(training_data)

            mini_batches = [training_data[k:k + batchsz] for k in range(0, n, batchsz)]

            # take one SGD step per mini-batch of (x, y) pairs
            for mini_batch in mini_batches:
                loss = self.update_mini_batch(mini_batch, lr)  # returns the average loss of this mini-batch

            # the reported loss is that of the last mini-batch of the epoch
            print("Epoch {0}: ".format(j), 'loss: ', loss)
            if test_data:
                # print("Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test), 'loss: ', loss)
                print('test_acc:', self.evaluate(test_data) / n_test)
            else:
                print("Epoch {0} complete".format(j))



    def update_mini_batch(self, batch, lr):
        """
        Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The ``batch`` is a list of tuples ``(x, y)``, and ``lr``
        is the learning rate.
        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        loss = 0  # accumulated loss over the batch

        # for every sample in the current batch
        for x, y in batch:
            # layer-by-layer lists of the gradients of every w and b for this sample
            delta_nabla_b, delta_nabla_w, loss_ = self.backprop(x, y)

            # e.g. [w1, w2, w3] are the gradients from one sample; for multiple samples we
            # accumulate the corresponding entries and average them afterwards.
            nabla_b = [accu + cur for accu, cur in zip(nabla_b, delta_nabla_b)]
            nabla_w = [accu + cur for accu, cur in zip(nabla_w, delta_nabla_w)]  # accu is the running sum, cur the current gradient
            loss += loss_  # accumulate the loss

        # average the gradients of w and b; element-wise division, matching the element-wise accumulation above
        nabla_w = [w / len(batch) for w in nabla_w]
        nabla_b = [b / len(batch) for b in nabla_b]

        # update the weights w and biases b with (mini-batch) SGD:
        # w = w - lr * nabla_w
        self.weights = [w - lr * nabla for w, nabla in zip(self.weights, nabla_w)]
        self.biases = [b - lr * nabla for b, nabla in zip(self.biases, nabla_b)]

        loss = loss / len(batch)  # average loss over the batch

        return loss


    def backprop(self, x, y):
        """
        Return a tuple ``(nabla_b, nabla_w, loss)``, where ``nabla_b``
        and ``nabla_w`` represent the gradient of the cost function
        C_x as layer-by-layer lists of numpy arrays, similar
        to ``self.biases`` and ``self.weights``.
        :param x: [784, 1]
        :param y: [10, 1], one-hot encoding
        :return:
        """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]

        # 1. forward pass
        # Why is a forward pass needed inside backprop? Because here we record the z and
        # activation of every layer, which are needed later to compute the gradients.
        # The separate forward() method exists for prediction, since testing does not
        # require any of this backward bookkeeping.
        activation = x
        activations = [x]  # list to store all the activations, layer by layer
        # w*x = z => sigmoid => x/activation
        zs = []  # list to store all the z vectors, layer by layer

        for b, w in zip(self.biases, self.weights):
            # https://stackoverflow.com/questions/34142485/difference-between-numpy-dot-and-python-3-5-matrix-multiplication
            # np.dot vs np.matmul = @ vs element-wise *
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)

        # loss of this sample: squared error between the output activation and the one-hot target
        loss = np.power(activations[-1] - y, 2).sum()

        # 2. backward pass
        # (Ok - tk) * (1 - Ok) * Ok, cf. the formulas above; this is the output layer

        # 2.1 compute the gradient on the output layer first
        # [10, 1] * [10, 1] => [10, 1]
        # equivalent form (if a cost_derivative helper were defined):
        # delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
        delta = activations[-1] * (1 - activations[-1]) * (activations[-1] - y)

        nabla_b[-1] = delta
        # delta: [10, 1]
        # activations[-2]: [30, 1]
        # [10, 1] @ [1, 30] => [10, 30]
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())

        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book. Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on. It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.

        # 2.2 compute the gradients of the hidden layers

        for l in range(2, self.num_layers):
            # [30, 1]
            z = zs[-l]
            sp = sigmoid_prime(z)

            # the formula for delta of a hidden layer
            # [10, 30].T @ [10, 1] => [30, 10] @ [10, 1] => [30, 1] * [30, 1] => [30, 1]
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp
            nabla_b[-l] = delta
            # [30, 1] @ [784, 1].T => [30, 784]
            nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())  # matrix multiplication

        return (nabla_b, nabla_w, loss)


    def evaluate(self, test_data):
        """
        Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation.

        :param test_data: list of (x, y) tuples
        :return:
        """
        # the parentheses around (x, y) are optional here
        test_results = [(np.argmax(self.forward(x)), y) for (x, y) in test_data]
        correct = sum(int(pred == y) for pred, y in test_results)

        return correct

def main():
    import mnist_loader
    # Loading the MNIST data
    training_data, validation_data, test_data = mnist_loader.load_data_wrapper()

    print(len(training_data), training_data[0][0].shape, training_data[0][1].shape)
    print(len(test_data), test_data[0][0].shape, test_data[0][1].shape)
    print(test_data[0][1])

    # Set up a Network with 30 hidden neurons
    net = Network([784, 30, 10])
    # Use stochastic gradient descent to learn from the MNIST training_data over
    # 200 epochs, with a mini-batch size of 10, and a learning rate of lr = 0.1
    net.train(training_data, 200, 10, 0.1, test_data=test_data)

if __name__ == '__main__':
    main()
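
To reproduce the log below, save this file as network.py next to mnist_loader.py (with data/mnist.pkl.gz in place) and run it directly. A quicker setting you may also want to try (a sketch only; it is not the configuration used for the log in Section 3.2) is fewer epochs with a larger learning rate:

net = Network([784, 30, 10])
net.train(training_data, 30, 10, 3.0, test_data=test_data)  # 30 epochs, batch size 10, lr = 3.0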

3.2. Run Results

C:\anaconda3\envs\tf2\python.exe F:/Codes/MyCodes/TF2/TF2_5/network.py
50000 (784, 1) (10, 1)
10000 (784, 1) ()
7
Epoch 0: loss: 1.132647628860319
test_acc: 0.5828
Epoch 1: loss: 1.0578827241319213
test_acc: 0.7225
Epoch 2: loss: 1.6204362221682012
test_acc: 0.7645
Epoch 3: loss: 1.3278571954708824
test_acc: 0.7891
Epoch 4: loss: 1.6318255509937707
test_acc: 0.8311
Epoch 5: loss: 1.6581321629577968
test_acc: 0.8484
Epoch 6: loss: 1.6513287523876141
test_acc: 0.8592
Epoch 7: loss: 0.9853115796188012
test_acc: 0.8671
Epoch 8: loss: 1.1087704554589501
test_acc: 0.8699
Epoch 9: loss: 1.6918591610322473
test_acc: 0.8791
Epoch 10: loss: 0.9863359375641604
test_acc: 0.8821
Epoch 11: loss: 0.9854092322320076
test_acc: 0.8865
Epoch 12: loss: 0.984694141702656
test_acc: 0.8907
Epoch 13: loss: 0.9775137267023272
test_acc: 0.8946
Epoch 14: loss: 1.5003901316492476
test_acc: 0.8978
Epoch 15: loss: 1.7716915861120133
test_acc: 0.9002
Epoch 16: loss: 1.0165926944686068
test_acc: 0.9012
Epoch 17: loss: 0.985803526565471
test_acc: 0.9039
Epoch 18: loss: 1.6971768157806548
test_acc: 0.9065
Epoch 19: loss: 0.9932158550708612
test_acc: 0.9073
Epoch 20: loss: 0.989865123852802
test_acc: 0.9088
Epoch 21: loss: 0.9858300586350227
test_acc: 0.9102
Epoch 22: loss: 1.1675667016805211
test_acc: 0.9115
Epoch 23: loss: 0.9921924034474985
test_acc: 0.9133
Epoch 24: loss: 2.48563165228654
test_acc: 0.9147
Epoch 25: loss: 1.7609695632099929
test_acc: 0.9166
Epoch 26: loss: 0.980228512719132
test_acc: 0.9171
Epoch 27: loss: 1.6757957528312144
test_acc: 0.9176
Epoch 28: loss: 2.367184001904443
test_acc: 0.9186
Epoch 29: loss: 1.0870255098965413
test_acc: 0.9192
Epoch 30: loss: 1.6871390324527216
test_acc: 0.9193
Epoch 31: loss: 2.4114307525624237
test_acc: 0.9215
Epoch 32: loss: 1.6436507728417369
test_acc: 0.9217
Epoch 33: loss: 1.209697855929342
test_acc: 0.9215
Epoch 34: loss: 2.2354208734594954
test_acc: 0.9242
Epoch 35: loss: 0.992876644264711
test_acc: 0.9244
Epoch 36: loss: 1.727974529783943
test_acc: 0.9257
Epoch 37: loss: 3.173422974998758
test_acc: 0.9254
Epoch 38: loss: 0.994057385711074
test_acc: 0.9261
Epoch 39: loss: 1.7685019726265438
test_acc: 0.9269
Epoch 40: loss: 1.787343318726783
test_acc: 0.9276
Epoch 41: loss: 1.5580545168915454
test_acc: 0.928
Epoch 42: loss: 1.4557322177150884
test_acc: 0.9282
Epoch 43: loss: 2.502288173796844
test_acc: 0.9283
Epoch 44: loss: 0.964226023719922
test_acc: 0.9293
Epoch 45: loss: 1.7725552048287896
test_acc: 0.9297
Epoch 46: loss: 0.9862273389471732
test_acc: 0.9304
Epoch 47: loss: 1.0343887314213098
test_acc: 0.9304
Epoch 48: loss: 0.9942537595848329
test_acc: 0.9306
Epoch 49: loss: 0.9815037856368498
test_acc: 0.9308
Epoch 50: loss: 1.697253825212717
test_acc: 0.9315
Epoch 51: loss: 0.9939517622373302
test_acc: 0.931
Epoch 52: loss: 0.9876758511512463
test_acc: 0.9312
Epoch 53: loss: 2.070249040476458
test_acc: 0.9322
Epoch 54: loss: 0.9923401161590604
test_acc: 0.9316
Epoch 55: loss: 1.7551223402423441
test_acc: 0.9328
Epoch 56: loss: 2.4961317876156377
test_acc: 0.9323
Epoch 57: loss: 0.98441512223147
test_acc: 0.933
Epoch 58: loss: 2.0594283118501258
test_acc: 0.9334
Epoch 59: loss: 0.9830295547366568
test_acc: 0.9322
Epoch 60: loss: 0.9831873996719361
test_acc: 0.9333
Epoch 61: loss: 3.801473280847201
test_acc: 0.9334

From: https://blog.51cto.com/u_15866474/5829899
