Implementing a 3-Layer BP Neural Network on the MNIST Dataset with NumPy
- This article is part of the TensorFlow2.0 learning-notes series (column: TensorFlow2.0学习笔记); the series is updated continuously.
- Full series table of contents: TensorFlow2.0学习笔记总目录
Contents
- 1. Knowledge Review: Formula Derivation
- 2. Preprocessing the mnist.pkl.gz Dataset
- 2.1. load_data(): dataset format
- 2.2. load_data_wrapper(): dataset format
- 3. Implementation Details
- 3.1. Test Program
- 3.2. Results
1. Knowledge Review: Formula Derivation
- Note: Python's zip() function takes iterables as arguments and packs their corresponding elements into tuples. In Python 3 it returns an iterator over those tuples; wrap it in list() to get a list.
- Syntax: zip([iterable, ...]); it yields tuples.
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = [4, 5, 6, 7, 8]
>>> zipped = list(zip(a, b))   # pack into a list of tuples
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a, c))            # the number of elements follows the shortest iterable
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(*zipped))         # the inverse of zip: *zipped unpacks the tuples back into the original sequences
[(1, 2, 3), (4, 5, 6)]
- For the detailed formula derivation, see the companion article: 具体公式推导
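For quick reference, here is a brief sketch of the standard backpropagation relations for a quadratic cost with sigmoid activations (the notation is chosen to match the code in Section 3; see the linked article for the full derivation):

\[
\delta^{L} = (a^{L} - y)\odot\sigma'(z^{L}), \qquad
\delta^{l} = \bigl((W^{l+1})^{\top}\delta^{l+1}\bigr)\odot\sigma'(z^{l})
\]
\[
\frac{\partial C}{\partial b^{l}} = \delta^{l}, \qquad
\frac{\partial C}{\partial W^{l}} = \delta^{l}\,(a^{l-1})^{\top}, \qquad
\sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr)
\]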
2. Preprocessing the mnist.pkl.gz Dataset
- Download the dataset mnist.pkl.gz.
- The preprocessing code is as follows:
"""
mnist_loader
~~~~~~~~~~~~
A library to load the MNIST image data. For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``. In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""
#### Libraries
# Standard library
import pickle
import gzip
# Third-party libraries
import numpy as np
def load_data():
"""
Return the MNIST data as a tuple containing the training data,
the validation data, and the test data.
The ``training_data`` is returned as a tuple with two entries.
The first entry contains the actual training images. This is a
numpy ndarray with 50,000 entries. Each entry is, in turn, a
numpy ndarray with 784 values, representing the 28 * 28 = 784
pixels in a single MNIST image.
The second entry in the ``training_data`` tuple is a numpy ndarray
containing 50,000 entries. Those entries are just the digit
values (0...9) for the corresponding images contained in the first
entry of the tuple.
The ``validation_data`` and ``test_data`` are similar, except
each contains only 10,000 images.
This is a nice data format, but for use in neural networks it's
helpful to modify the format of the ``training_data`` a little.
That's done in the wrapper function ``load_data_wrapper()``, see
below.
:return:
"""
f = gzip.open('data/mnist.pkl.gz', 'rb')
training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
f.close()
return (training_data, validation_data, test_data)
def load_data_wrapper():
"""
Return a tuple containing ``(training_data, validation_data,
test_data)``. Based on ``load_data``, but the format is more
convenient for use in our implementation of neural networks.
In particular, ``training_data`` is a list containing 50,000
2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray
containing the input image. ``y`` is a 10-dimensional
numpy.ndarray representing the unit vector corresponding to the
correct digit for ``x``.
``validation_data`` and ``test_data`` are lists containing 10,000
2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional
numpy.ndarray containing the input image, and ``y`` is the
corresponding classification, i.e., the digit values (integers)
corresponding to ``x``.
Obviously, this means we're using slightly different formats for
the training data and the validation / test data. These formats
turn out to be the most convenient for use in our neural network
code.
:return:
"""
tr_d, va_d, te_d = load_data()
training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
training_results = [vectorized_result(y) for y in tr_d[1]]
training_data = list(zip(training_inputs, training_results))
validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
validation_data = list(zip(validation_inputs, va_d[1]))
test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
test_data = list(zip(test_inputs, te_d[1]))
return (training_data, validation_data, test_data)
def vectorized_result(j):
"""
Return a 10-dimensional unit vector with a 1.0 in the jth
position and zeroes elsewhere. This is used to convert a digit
(0...9) into a corresponding desired output from the neural
network.
:param j:
:return:
"""
e = np.zeros((10, 1))
e[j] = 1.0
return e
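As a quick sanity check, a minimal snippet (a sketch, assuming the module above is saved as mnist_loader.py) that confirms vectorized_result produces a (10, 1) one-hot column:

# Sketch: vectorized_result(3) should be a (10, 1) column with a single 1.0 at index 3.
from mnist_loader import vectorized_result

v = vectorized_result(3)
print(v.shape)                 # (10, 1)
print(int(v[3, 0]), v.sum())   # 1 1.0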
2.1. load_data(): dataset format
- Loading code:
def load_data():
f = gzip.open('data/mnist.pkl.gz', 'rb')
training_data, validation_data, test_data = pickle.load(f, encoding='latin1')
print(training_data[0], training_data[1])
print(training_data[0].shape, training_data[1].shape, validation_data[0].shape, validation_data[1].shape,
test_data[0].shape, test_data[1].shape)
print(training_data[0][0].shape, training_data[1][0].shape)
f.close()
return (training_data, validation_data, test_data)
- The printed output shows the raw format: the training images form a (50000, 784) array with labels of shape (50000,); the validation and test sets are (10000, 784) with labels of shape (10000,); a single image is a flat (784,) vector and a single label is a scalar digit.
2.2. load_data_wrapper(): dataset format
tr_d, va_d, te_d = load_data()
training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
print(training_inputs[0].shape)
training_results = [vectorized_result(y) for y in tr_d[1]]
print(training_results[0].shape)
training_data = list(zip(training_inputs, training_results))
validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
validation_data = list(zip(validation_inputs, va_d[1]))
test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
test_data = list(zip(test_inputs, te_d[1]))
return (training_data, validation_data, test_data)
- The printed output shows the wrapped format: each input is reshaped to a (784, 1) column and each training label becomes a (10, 1) one-hot vector.
- After zip, each sample becomes an (x, y) tuple, as the sketch below shows:
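A minimal sketch of what the wrapped format looks like in practice (assuming the loader above is saved as mnist_loader.py; the shapes follow directly from the np.reshape and vectorized_result calls):

# Sketch: inspect one training sample and one test sample after wrapping.
import mnist_loader

training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
x, y = training_data[0]
print(x.shape, y.shape)   # (784, 1) (10, 1): column-vector image, one-hot label
xt, yt = test_data[0]
print(xt.shape, yt)       # (784, 1) and a plain integer digit label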
3. Implementation Details
- For the stochastic gradient descent algorithm, see: Andrew Ng's notes comparing gradient descent, stochastic gradient descent, and mini-batch gradient descent.
- Many of the implementation details follow the formulas from Section 1; the mini-batch update rule is sketched below.
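As a reminder, the mini-batch SGD step implemented in update_mini_batch below averages the per-sample gradients over a batch of size \(m\) and then moves the parameters against that average with learning rate \(\eta\):

\[
W^{l} \leftarrow W^{l} - \frac{\eta}{m}\sum_{k=1}^{m}\frac{\partial C_{k}}{\partial W^{l}}, \qquad
b^{l} \leftarrow b^{l} - \frac{\eta}{m}\sum_{k=1}^{m}\frac{\partial C_{k}}{\partial b^{l}}
\]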
3.1. Test Program
"""
"""
network.py
~~~~~~~~~~
A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. Note that I have focused on making the code
simple, easily readable, and easily modifiable. It is not optimized,
and omits many desirable features.
"""
#### Libraries
# Standard library
import random
# Third-party libraries
import numpy as np
#### Miscellaneous functions
def sigmoid(z):
"""
The sigmoid function.
"""
return 1.0 / (1.0 + np.exp(-z))
def sigmoid_prime(z):
"""Derivative of the sigmoid function."""
return sigmoid(z) * (1 - sigmoid(z))
# Define a class representing the 3-layer network.
class Network:
def __init__(self, sizes):
"""
The list ``sizes`` contains the number of neurons in the
respective layers of the network. For example, if the list
was [2, 3, 1] then it would be a three-layer network, with the
first layer containing 2 neurons, the second layer 3 neurons,
and the third layer 1 neuron. The biases and weights for the
network are initialized randomly, using a Gaussian
distribution with mean 0, and variance 1. Note that the first
layer is assumed to be an input layer, and by convention we
won't set any biases for those neurons, since biases are only
ever used in computing the outputs from later layers.
:param sizes: e.g. [784, 30, 10]; each element is the number of neurons in one layer, given as a list
"""
self.num_layers = len(sizes)
# sizes: [784, 30, 10]
self.sizes = sizes
# b: [ch_out, 1] bias vector for every non-input layer
self.biases = [np.random.randn(ch_out, 1) for ch_out in sizes[1:]]
# w: [ch_out, ch_in] weight matrix for every pair of adjacent layers
self.weights = [np.random.randn(ch_out, ch_in) for ch_in, ch_out in zip(sizes[:-1], sizes[1:])]
def forward(self, x):
"""
Return the output of the network if ``x`` is the input.
:param x: [784, 1] input column vector
:return: [10, 1] output activations
"""
for b, w in zip(self.biases, self.weights):
# [30, 784] @ [784, 1]=> [30, 1] + [30, 1] => [30, 1]
# [10, 30] @ [30, 1] + [10, 1] => [10, 1]
z = np.dot(w, x) + b
# [30, 1]
# [10, 1]
x = sigmoid(z)
return x
def train(self, training_data, epochs, batchsz, lr, test_data=None):
"""
Train the neural network using mini-batch stochastic gradient descent.
The ``training_data`` is a list of tuples
``(x, y)`` representing the training inputs and the desired
outputs. The other non-optional parameters are self-explanatory. If
``test_data`` is provided then the network will be evaluated against
the test data after each epoch, and partial progress printed out.
This is useful for tracking progress, but slows things down substantially.
"""
if test_data:
n_test = len(test_data)
n = len(training_data)
for j in range(epochs):
random.shuffle(training_data)
mini_batches = [training_data[k:k+batchsz] for k in range(0, n, batchsz)]
# for every (x,y)
for mini_batch in mini_batches:
loss = self.update_mini_batch(mini_batch, lr)  # also returns the loss of this mini-batch
print("Epoch {0}: ".format(j), 'loss: ', loss)
if test_data:
# print("Epoch {0}: {1} / {2}".format(j, self.evaluate(test_data), n_test), 'loss: ', loss)
print('test_acc:', self.evaluate(test_data)/n_test)
else:
print("Epoch {0} complete".format(j) )
def update_mini_batch(self, batch, lr):
"""
Update the network's weights and biases by applying
gradient descent using backpropagation to a single mini batch.
The ``batch`` is a list of tuples ``(x, y)``, and ``lr``
is the learning rate.
"""
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
loss = 0  # accumulated loss over the batch
# for every sample in current batch
for x, y in batch:
# list of every w, b gradient
delta_nabla_b, delta_nabla_w, loss_ = self.backprop(x, y)  # gradients and loss for this sample
# e.g. one sample yields [grad_w1, grad_w2, ...]; with several samples we accumulate element-wise and average later.
nabla_b = [accu + cur for accu, cur in zip(nabla_b, delta_nabla_b)]
nabla_w = [accu + cur for accu, cur in zip(nabla_w, delta_nabla_w)]  # accu is the running sum, cur the current sample's gradient
loss += loss_  # accumulate the loss
# average the accumulated gradients of w and b (element-wise, matching the element-wise accumulation above)
nabla_w = [w / len(batch) for w in nabla_w]
nabla_b = [b / len(batch) for b in nabla_b]
# SGD update of the weights w and biases b
# w = w - lr * nabla_w
self.weights = [w - lr * nabla for w, nabla in zip(self.weights, nabla_w)]
self.biases = [b - lr * nabla for b, nabla in zip(self.biases, nabla_b)]
loss = loss / len(batch)  # average loss over the batch
return loss
def backprop(self, x, y):
"""
Return a tuple ``(nabla_b, nabla_w)`` representing the
gradient for the cost function C_x. ``nabla_b`` and
``nabla_w`` are layer-by-layer lists of numpy arrays, similar
to ``self.biases`` and ``self.weights``.
:param x: [784, 1]
:param y: [10, 1], one-hot encoding
:return:
"""
nabla_b = [np.zeros(b.shape) for b in self.biases]
nabla_w = [np.zeros(w.shape) for w in self.weights]
# 1. forward pass
# We run the forward pass again here because backprop needs the per-layer z values and activations recorded below when computing the gradients.
# The separate forward() method is still kept, because inference/testing only needs the forward pass and no backward pass.
activation = x
activations = [x] # list to store all the activations, layer by layer
# w*x = z => sigmoid => x/activation
zs = [] # list to store all the z vectors, layer by layer
for b, w in zip(self.biases, self.weights):
# https://stackoverflow.com/questions/34142485/difference-between-numpy-dot-and-python-3-5-matrix-multiplication
# np.dot vs np.matmul = @ vs element-wise *
z = np.dot(w, activation) + b
zs.append(z)
activation = sigmoid(z)
activations.append(activation)
# squared-error loss for this sample
loss = np.power(activations[-1] - y, 2).sum()
# 2. backward pass
# (Ok - tk) * (1 - Ok) * Ok, see the formulas in Section 1; this is the last (output) layer
# 2.1 compute the gradient on the output layer first
# [10, 1] * [10, 1] => [10, 1]
# equivalent form, if a cost_derivative helper were defined: delta = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])
delta = activations[-1] * (1 - activations[-1]) * (activations[-1] - y)
nabla_b[-1] = delta
# delta: [10, 1]
# activations[-2]: [30, 1]
# [10, 1] @ [1, 30] => [10, 30]
nabla_w[-1] = np.dot(delta, activations[-2].transpose())
# Note that the variable l in the loop below is used a little
# differently to the notation in Chapter 2 of the book. Here,
# l = 1 means the last layer of neurons, l = 2 is the
# second-last layer, and so on. It's a renumbering of the
# scheme in the book, used here to take advantage of the fact
# that Python can use negative indices in lists.
# 2.2 compute hidden gradient
for l in range(2, self.num_layers):
# [30, 1]
z = zs[-l]
sp = sigmoid_prime(z)
# formula for the hidden-layer delta
# [10, 30].T @ [10, 1] => [30, 10] @ [10, 1] => [30, 1] * [30, 1] => [30, 1]
delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp  # propagate the error backwards through the weights
nabla_b[-l] = delta
# [30, 1] @ [784, 1].T => [30, 784]
nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose())  # outer product: delta times the previous layer's activations
return (nabla_b, nabla_w, loss)
def evaluate(self, test_data):
"""
Return the number of test inputs for which the neural
network outputs the correct result. Note that the neural
network's output is assumed to be the index of whichever
neuron in the final layer has the highest activation.
:param test_data: list of [x, y]
:return:
"""
# the parentheses around (x, y) are optional here
test_results = [(np.argmax(self.forward(x)), y) for (x, y) in test_data]
correct = sum(int(pred == y) for pred, y in test_results)
return correct
def main():
import mnist_loader
# Loading the MNIST data
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
print(len(training_data), training_data[0][0].shape, training_data[0][1].shape)
print(len(test_data), test_data[0][0].shape, test_data[0][1].shape)
print(test_data[0][1])
# Set up a Network with 30 hidden neurons
net = Network([784, 30, 10])
# Use stochastic gradient descent to learn from the MNIST training_data over
# 200 epochs, with a mini-batch size of 10, and a learning rate of 0.1
net.train(training_data, 200, 10, 0.1, test_data=test_data)
if __name__ == '__main__':
main()
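A minimal usage sketch (assuming the two listings above are saved as network.py and mnist_loader.py next to data/mnist.pkl.gz; the short epoch count is only for illustration):

# Sketch: train briefly, then classify a single test image.
import numpy as np
import mnist_loader
from network import Network

training_data, _, test_data = mnist_loader.load_data_wrapper()
net = Network([784, 30, 10])
net.train(training_data, 5, 10, 0.1, test_data=test_data)
x, y = test_data[0]
print('predicted:', np.argmax(net.forward(x)), 'label:', y)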
3.2. Results
C:\anaconda3\envs\tf2\python.exe F:/Codes/MyCodes/TF2/TF2_5/network.py
50000 (784, 1) (10, 1)
10000 (784, 1) ()
7
Epoch 0: loss: 1.132647628860319
test_acc: 0.5828
Epoch 1: loss: 1.0578827241319213
test_acc: 0.7225
Epoch 2: loss: 1.6204362221682012
test_acc: 0.7645
Epoch 3: loss: 1.3278571954708824
test_acc: 0.7891
Epoch 4: loss: 1.6318255509937707
test_acc: 0.8311
Epoch 5: loss: 1.6581321629577968
test_acc: 0.8484
Epoch 6: loss: 1.6513287523876141
test_acc: 0.8592
Epoch 7: loss: 0.9853115796188012
test_acc: 0.8671
Epoch 8: loss: 1.1087704554589501
test_acc: 0.8699
Epoch 9: loss: 1.6918591610322473
test_acc: 0.8791
Epoch 10: loss: 0.9863359375641604
test_acc: 0.8821
Epoch 11: loss: 0.9854092322320076
test_acc: 0.8865
Epoch 12: loss: 0.984694141702656
test_acc: 0.8907
Epoch 13: loss: 0.9775137267023272
test_acc: 0.8946
Epoch 14: loss: 1.5003901316492476
test_acc: 0.8978
Epoch 15: loss: 1.7716915861120133
test_acc: 0.9002
Epoch 16: loss: 1.0165926944686068
test_acc: 0.9012
Epoch 17: loss: 0.985803526565471
test_acc: 0.9039
Epoch 18: loss: 1.6971768157806548
test_acc: 0.9065
Epoch 19: loss: 0.9932158550708612
test_acc: 0.9073
Epoch 20: loss: 0.989865123852802
test_acc: 0.9088
Epoch 21: loss: 0.9858300586350227
test_acc: 0.9102
Epoch 22: loss: 1.1675667016805211
test_acc: 0.9115
Epoch 23: loss: 0.9921924034474985
test_acc: 0.9133
Epoch 24: loss: 2.48563165228654
test_acc: 0.9147
Epoch 25: loss: 1.7609695632099929
test_acc: 0.9166
Epoch 26: loss: 0.980228512719132
test_acc: 0.9171
Epoch 27: loss: 1.6757957528312144
test_acc: 0.9176
Epoch 28: loss: 2.367184001904443
test_acc: 0.9186
Epoch 29: loss: 1.0870255098965413
test_acc: 0.9192
Epoch 30: loss: 1.6871390324527216
test_acc: 0.9193
Epoch 31: loss: 2.4114307525624237
test_acc: 0.9215
Epoch 32: loss: 1.6436507728417369
test_acc: 0.9217
Epoch 33: loss: 1.209697855929342
test_acc: 0.9215
Epoch 34: loss: 2.2354208734594954
test_acc: 0.9242
Epoch 35: loss: 0.992876644264711
test_acc: 0.9244
Epoch 36: loss: 1.727974529783943
test_acc: 0.9257
Epoch 37: loss: 3.173422974998758
test_acc: 0.9254
Epoch 38: loss: 0.994057385711074
test_acc: 0.9261
Epoch 39: loss: 1.7685019726265438
test_acc: 0.9269
Epoch 40: loss: 1.787343318726783
test_acc: 0.9276
Epoch 41: loss: 1.5580545168915454
test_acc: 0.928
Epoch 42: loss: 1.4557322177150884
test_acc: 0.9282
Epoch 43: loss: 2.502288173796844
test_acc: 0.9283
Epoch 44: loss: 0.964226023719922
test_acc: 0.9293
Epoch 45: loss: 1.7725552048287896
test_acc: 0.9297
Epoch 46: loss: 0.9862273389471732
test_acc: 0.9304
Epoch 47: loss: 1.0343887314213098
test_acc: 0.9304
Epoch 48: loss: 0.9942537595848329
test_acc: 0.9306
Epoch 49: loss: 0.9815037856368498
test_acc: 0.9308
Epoch 50: loss: 1.697253825212717
test_acc: 0.9315
Epoch 51: loss: 0.9939517622373302
test_acc: 0.931
Epoch 52: loss: 0.9876758511512463
test_acc: 0.9312
Epoch 53: loss: 2.070249040476458
test_acc: 0.9322
Epoch 54: loss: 0.9923401161590604
test_acc: 0.9316
Epoch 55: loss: 1.7551223402423441
test_acc: 0.9328
Epoch 56: loss: 2.4961317876156377
test_acc: 0.9323
Epoch 57: loss: 0.98441512223147
test_acc: 0.933
Epoch 58: loss: 2.0594283118501258
test_acc: 0.9334
Epoch 59: loss: 0.9830295547366568
test_acc: 0.9322
Epoch 60: loss: 0.9831873996719361
test_acc: 0.9333
Epoch 61: loss: 3.801473280847201
test_acc: 0.9334