RNN (Recurrent Neural Network) with TensorFlow Keras

Tags: plt RNN keras train onehot tensorflow model id input

RNN (recurrent neural network), reposted from https://blog.csdn.net/weixin_46969441/article/details/121584330

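Recurrent neural networks

Recurrent cell: its parameters are shared across time steps, and the recurrent layer extracts information along the time axis.

The figure below shows the memory, which stores the state information of each time step.
The number of memory units can be set,
which changes the capacity of the memory.

Once the number of memory units and the dimensions of the input x and the output y are specified, the dimensions of the trainable parameters are determined as well.

The information stored in the memory at the current time step is ht:
ht = tanh(xt * Wxh + ht-1 * Whh + bh), that is, the current input features xt times the matrix Wxh, plus the memory state of the previous time step ht-1 times the matrix Whh, plus the bias bh.

yt is the output feature at the current time step, computed from ht with the matrix Why and the bias by.

This output stage can be understood as one fully connected layer that produces the final result.

In total, three parameter matrices Wxh, Whh and Why (together with the biases bh and by) need to be updated.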
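The update rule above can be sketched in NumPy. This is a minimal illustration and not the Keras implementation: the helper names `softmax` and `rnn_step`, the sizes (5 input features, 3 memory units, 5 outputs), the softmax on the output and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    """One time step: h_t = tanh(x_t @ Wxh + h_prev @ Whh + bh)."""
    return np.tanh(x_t @ Wxh + h_prev @ Whh + bh)

rng = np.random.default_rng(0)           # illustrative random weights
Wxh = rng.normal(size=(5, 3))            # input  -> memory
Whh = rng.normal(size=(3, 3))            # memory -> memory
Why = rng.normal(size=(3, 5))            # memory -> output
bh, by = np.zeros(3), np.zeros(5)

x_t = np.array([[0., 1., 0., 0., 0.]])   # one sample with 5 features
h_prev = np.zeros((1, 3))                # memory of the previous step

h_t = rnn_step(x_t, h_prev, Wxh, Whh, bh)   # new memory state
y_t = softmax(h_t @ Why + by)               # output of the fully connected stage
print(h_t.shape, y_t.shape)
```

Because tanh is applied last, every entry of the memory state stays between -1 and 1.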

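Unrolling the recurrent cell

The cell is unrolled along the time axis.

Forward propagation updates the state of the memory (the state information ht stored in the memory is refreshed at every time step), while the three parameter matrices Wxh, Whh and Why and the two biases bh and by stay fixed throughout the forward pass.
Backpropagation updates the three parameter matrices Why, Whh and Wxh and the two biases by gradient descent.

Every time step of an RNN can produce an output, so the total loss of an RNN is the sum of the losses over all (or some) time steps.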
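As a rough sketch of this unrolling, the loop below reuses the same weights at every time step, refreshes the memory, and adds up one cross-entropy loss per step. The function name `forward_unrolled`, the sizes and the random weights are assumptions made for illustration only.

```python
import numpy as np

def forward_unrolled(xs, targets, Wxh, Whh, Why, bh, by):
    """Run the same cell over every time step and sum the per-step losses."""
    h = np.zeros(Whh.shape[0])                 # memory starts at zero
    total_loss = 0.0
    states = []
    for x_t, target in zip(xs, targets):
        h = np.tanh(x_t @ Wxh + h @ Whh + bh)  # memory is refreshed, weights are not
        logits = h @ Why + by
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        total_loss += -np.log(probs[target])   # cross-entropy at this time step
        states.append(h)
    return np.array(states), total_loss

rng = np.random.default_rng(1)
Wxh = rng.normal(size=(5, 3))
Whh = rng.normal(size=(3, 3))
Why = rng.normal(size=(3, 5))
bh, by = np.zeros(3), np.zeros(5)

xs = np.eye(5)[[0, 1, 2, 3]]   # a, b, c, d as one-hot rows
targets = [1, 2, 3, 4]         # b, c, d, e
states, total_loss = forward_unrolled(xs, targets, Wxh, Whh, Why, bh, by)
print(states.shape, total_loss)
```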

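Number of recurrent layers

One recurrent cell is one layer.
Layers are stacked along the output direction.

As the figure below shows,
one layer means one recurrent cell (one memory),
two layers mean two,
three layers mean three.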

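Implementing the recurrent cell in code

For the last recurrent layer, set return_sequences=False.
For intermediate recurrent layers, set return_sequences=True so that ht is passed to the next layer at every time step.
return_sequences=True: ht is output at every time step.
return_sequences=False: ht is output only at the last time step.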
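The difference between the two settings can be sketched with a hand-written NumPy loop. This is only an illustration of the idea, not the Keras code: the function `simple_rnn` and its weights are hypothetical.

```python
import numpy as np

def simple_rnn(x, Wxh, Whh, bh, return_sequences=False):
    """x has shape [samples, time steps, features]."""
    samples, steps, _ = x.shape
    h = np.zeros((samples, Whh.shape[0]))
    outputs = []
    for t in range(steps):
        h = np.tanh(x[:, t, :] @ Wxh + h @ Whh + bh)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)   # [samples, time steps, units]
    return h                               # [samples, units]

rng = np.random.default_rng(0)
Wxh, Whh, bh = rng.normal(size=(5, 3)), rng.normal(size=(3, 3)), np.zeros(3)
x = rng.normal(size=(2, 4, 5))             # 2 samples, 4 time steps, 5 features

all_steps = simple_rnn(x, Wxh, Whh, bh, return_sequences=True)
last_only = simple_rnn(x, Wxh, Whh, bh, return_sequences=False)
print(all_steps.shape, last_only.shape)    # (2, 4, 3) (2, 3)
```

The last slice of the full sequence equals the output returned when only the final step is kept, which is why a following recurrent layer needs the full sequence while a following Dense layer only needs the last state.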

The figure below illustrates return_sequences=False.

For example:
SimpleRNN(3, return_sequences=True)
means a recurrent layer with three memory units that outputs ht at every time step.
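The input of a recurrent layer must be three-dimensional.

The three dimensions are:
First dimension: number of samples fed in.
Second dimension: number of time steps the cell is unrolled over.
Third dimension: number of input features per time step.

The figure below is an example:
two samples are fed in,
each sample passes through one time step to produce the output,
and each time step has 3 input features.
The input shape of this recurrent network is [2, 1, 3].

Second example:
one sample,
four time steps,
2 input features per time step.
The input shape of this recurrent network is [1, 4, 2].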
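The two shapes above can be written down directly as NumPy arrays. The variable names are made up for this sketch, and the arrays hold placeholder values only.

```python
import numpy as np

# Example 1: 2 samples, 1 time step, 3 features per time step
batch_a = np.zeros((2, 1, 3))

# Example 2: 1 sample, 4 time steps, 2 features per time step
batch_b = np.zeros((1, 4, 2))

# Five one-hot letters reshaped to [5 samples, 1 time step, 5 features]
letters = np.eye(5)
x_train = letters.reshape(len(letters), 1, 5)

print(batch_a.shape, batch_b.shape, x_train.shape)
```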

An example to understand the recurrent computation
Character prediction

Rules:

a->b
b->c
c->d
d->e
e->a

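The five letters are mapped to numbers and represented as one-hot codes.

Three weight matrices Wxh, Whh and Why are generated randomly.

x has shape 1*5

Wxh has shape 5*3

ht-1 has shape 1*3

Whh has shape 3*3

bh has shape 1*3

The memory is ht.

Network structure diagram

At the very beginning, the memory state is initialised to [0.0, 0.0, 0.0].

Start the computation.
The memory is updated by the current input, and a new memory state comes out.

Next, the output network computes yt:
the time information that was extracted is passed through a fully connected layer for recognition and prediction.
The output layer of the whole network is computed as follows: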
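The numbers from the original figures are not available here, so the sketch below repeats the calculation with made-up weights. The shapes follow the list above; the helper `predict_next`, the random seed and the weight values are assumptions, and because the weights are untrained the predicted letter is arbitrary.

```python
import numpy as np

letters = "abcde"
w_to_id = {ch: i for i, ch in enumerate(letters)}
one_hot = np.eye(len(letters))           # row i is the one-hot code of letter i

rng = np.random.default_rng(7)           # illustrative, untrained weights
Wxh = rng.uniform(-1, 1, size=(5, 3))    # x (1x5) @ Wxh (5x3) -> 1x3
Whh = rng.uniform(-1, 1, size=(3, 3))    # h (1x3) @ Whh (3x3) -> 1x3
bh = np.zeros((1, 3))
Why = rng.uniform(-1, 1, size=(3, 5))    # h (1x3) @ Why (3x5) -> 1x5
by = np.zeros((1, 5))

def predict_next(letter):
    x = one_hot[w_to_id[letter]].reshape(1, 5)
    h_prev = np.zeros((1, 3))                  # memory initialised to zeros
    h = np.tanh(x @ Wxh + h_prev @ Whh + bh)   # memory update
    scores = h @ Why + by                      # output layer before softmax
    return letters[int(np.argmax(scores))], h

guess, h = predict_next("b")
print("b ->", guess)
```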

Code (letter prediction with an RNN)

#------------------------
# Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os
#------------------------
# Input characters
# Five letters: abcde
input_word = "abcde"
#------------------------
# Represent the letters as numbers

# Replace each letter with a number
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping each character to a numeric id
# Convert to one-hot codes
id_to_onehot = {0: [1., 0., 0., 0., 0.],
                1: [0., 1., 0., 0., 0.],
                2: [0., 0., 1., 0., 0.],
                3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # id encoded as one-hot

#------------------------
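# Build the training set
x_train = [id_to_onehot[w_to_id['a']],
           id_to_onehot[w_to_id['b']],
           id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']],
           id_to_onehot[w_to_id['e']]]

# The labels stay integer ids instead of one-hot codes because the loss used
# below, SparseCategoricalCrossentropy, expects integer class labels.
y_train = [w_to_id['b'],
           w_to_id['c'],
           w_to_id['d'],
           w_to_id['e'],
           w_to_id['a']]

#------------------------
# Note:
# input feature a corresponds to label b
# input feature b corresponds to label c
# input feature c corresponds to label d
# input feature d corresponds to label e
# input feature e corresponds to label a
#------------------------
# Shuffle. x_train and y_train must be shuffled in the same order; resetting np.random.seed(7) before each shuffle guarantees that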
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)
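An alternative to reseeding twice, sketched here under the assumption that the data are already NumPy arrays, is to draw one random permutation and index both arrays with it. The names `perm`, `x_shuffled` and `y_shuffled` are introduced for this example only.

```python
import numpy as np

x = np.eye(5)                    # one-hot codes of a..e
y = np.array([1, 2, 3, 4, 0])    # labels b, c, d, e, a

rng = np.random.default_rng(7)
perm = rng.permutation(len(x))   # one shared random order
x_shuffled, y_shuffled = x[perm], y[perm]

# Every feature row still lines up with its label: label = (letter id + 1) % 5
ids = x_shuffled.argmax(axis=1)
print(np.array_equal((ids + 1) % 5, y_shuffled))
```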
#------------------------
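Reshape the input data into the shape expected by the RNN:
[number of samples, number of unrolled time steps, number of input features per time step].
Number of samples: len(x_train), 5 in this example.
Number of unrolled time steps: one letter goes in and the output is produced directly, so 1.
Number of input features per time step: one-hot coding is used here, so 5.
#------------------------
# Make x_train match the SimpleRNN input requirement: [samples, time steps, features per time step].
# The whole data set is fed in, so the number of samples is len(x_train);
# one letter in gives one result, so time steps = 1; a one-hot code has 5 input features, so features per time step = 5.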
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)
#------------------------

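When processing sentences:
The first dimension is the number of sentences fed in.
The second dimension is the number of word vectors per sentence, which must be the same for every sentence; this is usually achieved by padding short sentences and truncating long ones.
The third dimension is the dimension of each word vector, which must also be consistent.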
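One common way to give all sentences the same length is to pad short ones and cut long ones. The helper `pad_or_truncate`, the padding id 0 and the word ids below are assumptions for this sketch; Keras also ships a `pad_sequences` utility for the same purpose.

```python
def pad_or_truncate(sequences, max_len, pad_id=0):
    """Return a list of lists that all contain exactly max_len ids."""
    fixed = []
    for seq in sequences:
        seq = list(seq)[:max_len]                             # cut long sentences
        fixed.append(seq + [pad_id] * (max_len - len(seq)))   # pad short ones
    return fixed

sentences = [[4, 9, 2], [7], [3, 1, 8, 5, 6]]   # made-up word ids
padded = pad_or_truncate(sentences, max_len=4)
print(padded)   # [[4, 9, 2, 0], [7, 0, 0, 0], [3, 1, 8, 5]]
```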
#------------------------
# Build the model
model = tf.keras.Sequential([
    SimpleRNN(3),  # number of memory units; more units use more resources and give the layer more capacity
    Dense(5, activation='softmax')  # fully connected layer; 5 outputs, one per letter
])

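Telling apart the number of unrolled time steps and the number of memory units

An example of the number of unrolled time steps:
if four letters must be fed in before a prediction can be made, four time steps are needed.
For example: input abcd, predict e; input bcde, predict a; input cdea, predict b; input deab, predict c; input eabc, predict d.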
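The five input and target pairs listed above can be generated with a small loop that wraps around the end of the alphabet. The variable names are chosen for this sketch.

```python
letters = "abcde"
steps = 4   # number of unrolled time steps

pairs = []
for start in range(len(letters)):
    # take `steps` consecutive letters, wrapping around with the modulo
    window = "".join(letters[(start + k) % len(letters)] for k in range(steps))
    target = letters[(start + steps) % len(letters)]
    pairs.append((window, target))

for window, target in pairs:
    print(window, "->", target)
```

Each window holds four letters, so the number of time steps is 4 no matter how many memory units the layer has.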

The red part in the figure below is the memory.

#------------------------
# Configure the model (optimizer, loss, metric)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])
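The sparse loss is the reason y_train can stay as integer ids. The NumPy sketch below, with made-up probabilities, suggests that picking the probability of the true class directly gives the same value as the one-hot form of cross-entropy.

```python
import numpy as np

probs = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],   # made-up softmax outputs
                  [0.2, 0.1, 0.5, 0.1, 0.1]])
labels = np.array([1, 2])                       # integer labels, as in y_train

# Sparse form: index the probability of the true class directly
sparse_ce = -np.log(probs[np.arange(len(labels)), labels])

# One-hot form: same value, but each label needs 5 numbers
one_hot_labels = np.eye(5)[labels]
categorical_ce = -(one_hot_labels * np.log(probs)).sum(axis=1)

print(np.allclose(sparse_ce, categorical_ce))
```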

#------------------------
# Set up model saving
# and
# resuming from a previously saved checkpoint

checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit receives no validation set, so no validation accuracy is computed; the best model is saved according to the training loss

#------------------------
# Train the model
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

#------------------------
# Show the network structure
model.summary()

#------------------------
# Save the parameters
# print(model.trainable_variables)
with open('./weights.txt', 'w') as file:  # dump the trained parameters
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

#------------------------
# Plot loss and acc
# Show the training acc and loss curves (no validation set is used here)
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

#------------------------
# Prediction:
# ask how many predictions to run,
# wait for an input letter,
# convert the letter to a one-hot code,
# reshape it to the RNN input shape

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Make alphabet match the SimpleRNN input requirement: [samples, time steps, features per time step].
    # One sample is fed in, so samples = 1; one letter in gives one result, so time steps = 1; a one-hot code has 5 features.
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # index of the most probable letter
    tf.print(alphabet1 + '->' + input_word[pred])

Full code (runs with tensorflow 2.2.0 + numpy 1.19.2)

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping each character to a numeric id
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # id encoded as one-hot

x_train = [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']],
           id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]]
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']]

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

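# Make x_train match the SimpleRNN input requirement: [samples, time steps, features per time step].
# The whole data set is fed in, so samples = len(x_train); one letter in gives one result, so time steps = 1; a one-hot code has 5 input features, so features per time step = 5.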
x_train = np.reshape(x_train, (len(x_train), 1, 5))
y_train = np.array(y_train)

model = tf.keras.Sequential([
    SimpleRNN(3),
    Dense(5, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/rnn_onehot_1pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit receives no validation set, so the best model is saved according to the training loss

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

model.summary()

# print(model.trainable_variables)
with open('./weights.txt', 'w') as file:  # dump the trained parameters
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

###############################################    show   ###############################################

# Show the training acc and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

############### predict #############

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[alphabet1]]]
    # Make alphabet match the SimpleRNN input requirement: [samples, time steps, features per time step].
    # One sample is fed in, so samples = 1; one letter in gives one result, so time steps = 1; a one-hot code has 5 features.
    alphabet = np.reshape(alphabet, (1, 1, 5))
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # index of the most probable letter
    tf.print(alphabet1 + '->' + input_word[pred])

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
simple_rnn (SimpleRNN)       multiple                  27        
_________________________________________________________________
dense (Dense)                multiple                  20        
=================================================================
Total params: 47
Trainable params: 47
Non-trainable params: 0
_________________________________________________________________
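The parameter counts in the summary can be checked by hand. The two helper functions below are written for this check and are not part of Keras; they count the weight matrices and biases described earlier.

```python
def simple_rnn_params(features, units):
    # Wxh (features x units) + Whh (units x units) + bh (units)
    return features * units + units * units + units

def dense_params(inputs, outputs):
    # weight matrix (inputs x outputs) + bias (outputs)
    return inputs * outputs + outputs

rnn = simple_rnn_params(features=5, units=3)
dense = dense_params(inputs=3, outputs=5)
print(rnn, dense, rnn + dense)   # 27 20 47
```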

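Improving the example:

Now four consecutive letters are fed in, and the network predicts how likely each letter is to come next.

Code:
Imports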

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN
import matplotlib.pyplot as plt
import os

Define the data, with y_train represented as numeric ids
and x_train encoded as one-hot

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping each character to a numeric id
id_to_onehot = {0: [1., 0., 0., 0., 0.], 1: [0., 1., 0., 0., 0.], 2: [0., 0., 1., 0., 0.], 3: [0., 0., 0., 1., 0.],
                4: [0., 0., 0., 0., 1.]}  # id encoded as one-hot

Build the training set
x_train = [
    [id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']]],
    [id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']]],
    [id_to_onehot[w_to_id['c']], id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']]],
    [id_to_onehot[w_to_id['d']], id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']]],
    [id_to_onehot[w_to_id['e']], id_to_onehot[w_to_id['a']], id_to_onehot[w_to_id['b']], id_to_onehot[w_to_id['c']]],
]
y_train = [w_to_id['e'], w_to_id['a'], w_to_id['b'], w_to_id['c'], w_to_id['d']]

Shuffle the training set
np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

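Reshape the training set to the RNN input shape
# Make x_train match the SimpleRNN input requirement: [samples, time steps, features per time step].
# The whole data set is fed in, so samples = len(x_train); four letters in give one result, so time steps = 4; a one-hot code has 5 input features, so features per time step = 5.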
x_train = np.reshape(x_train, (len(x_train), 4, 5))
y_train = np.array(y_train)


Build the network
model = tf.keras.Sequential([
    SimpleRNN(3),
    Dense(5, activation='softmax')
])

Configure the model
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

Set up model saving
checkpoint_save_path = "./checkpoint/rnn_onehot_4pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit receives no validation set, so the best model is saved according to the training loss

Train the model
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

Show the network structure
model.summary()

Create a txt file to save the trained parameters
with open('./weights.txt', 'w') as file:  # dump the trained parameters
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

Plot the loss and acc curves of the model
# Show the training acc and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

Run predictions
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [id_to_onehot[w_to_id[a]] for a in alphabet1]
    # Make alphabet match the SimpleRNN input requirement: [samples, time steps, features per time step].
    # One sample is fed in, so samples = 1; four letters in give one result, so time steps = 4; a one-hot code has 5 features.
    alphabet = np.reshape(alphabet, (1, 4, 5))  # 1 sample, 4 time steps, 5 features
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # pick the most probable letter as the output
    tf.print(alphabet1 + '->' + input_word[pred])

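Embedding: a different encoding method (unlike one-hot coding)

Motivation: the width of a one-hot code must equal the vocabulary size.
With a large vocabulary this wastes resources.

One-hot coding: for large vocabularies the data are very sparse, and the codes are independent of one another, so they express no relationship between words.

Embedding: a word encoding method that uses low-dimensional vectors; the encoding is optimised by training the neural network and can express the relatedness between words.

tf.keras.layers.Embedding(vocabulary size, embedding dimension)
Vocabulary size: how many words are encoded.
Embedding dimension: how many numbers represent one word.

tf.keras.layers.Embedding(100, 3)
encodes 100 words, each represented by 3 numbers.
It encodes the ids 0 to 99; for example, [4] might be encoded as [0.25, 0.1, 0.11].

Data fed into an Embedding layer must be two-dimensional:
[number of samples, number of unrolled time steps]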
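Conceptually, an embedding layer is a lookup table with one row per id. The NumPy sketch below imitates that lookup; the table values are random placeholders, whereas in a real layer they are trained, and the names `table`, `ids` and `encoded` are assumptions for this example.

```python
import numpy as np

vocab_size, embed_dim = 100, 3
rng = np.random.default_rng(0)
table = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))  # trainable in a real layer

ids = np.array([[4], [17]])   # [samples, time steps] = [2, 1]
encoded = table[ids]          # row lookup adds the embedding dimension
print(encoded.shape)          # (2, 1, 3)
```

The two-dimensional input gains a third dimension, which is the three-dimensional shape a following recurrent layer expects.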

The unchanged first part:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN, Embedding
import matplotlib.pyplot as plt
import os

input_word = "abcde"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}  # dictionary mapping each character to a numeric id

x_train = [w_to_id['a'], w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e']] ## 0, 1, 2, 3, 4
y_train = [w_to_id['b'], w_to_id['c'], w_to_id['d'], w_to_id['e'], w_to_id['a']] ## 1, 2, 3, 4, 0

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

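In the letter prediction example above, the one-hot part is replaced by an Embedding part:

# Make x_train match the Embedding input requirement: [samples, time steps].
# The whole data set is fed in, so samples = len(x_train); one letter in gives one result, so time steps = 1.
x_train = np.reshape(x_train, (len(x_train), 1))
y_train = np.array(y_train)

The reshape of the training set x_train:
First argument: len(x_train) is the number of samples, here the size of the data set, 5.
Second argument: 1 is the number of unrolled time steps, meaning one input produces one output.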

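model = tf.keras.Sequential([
    Embedding(5, 2),  # Embedding layer that encodes the input; it creates a trainable parameter matrix with 5 rows and 2 columns
    SimpleRNN(3),
    Dense(5, activation='softmax')
])

Note: the Embedding layer takes integer ids as input, so here it has to be the first layer of the model.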

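An intuitive way to understand embedding:

First, one has to understand one-hot coding.
E.g.: suppose a sentence has 10 characters, all different, and each is replaced by a digit from 0 to 9,
as follows:
我从哪里来要到何处去
0 1 2 3 4 5 6 7 8 9
Converted to one-hot coding this becomes
# 我从哪里来，要到何处去 ("Where do I come from, where am I going")
[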
[1 0 0 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 1]
]

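But take a whole article with, say, 1,000,000 sentences and 100,000 different characters:
every single character would need a one-hot vector of length 100,000, almost all of whose entries are zero.
That wastes far too much space.

So the data are transformed by a matrix, that is, their dimension is changed.

For example, a 2 x 6 matrix multiplied by a 6 x 3 matrix becomes a 2 x 3 matrix.
That is a dimensionality reduction; put plainly, the features are merged, in the same way a 1x1 convolution reduces the number of channels in a convolutional layer.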
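The 2 x 6 times 6 x 3 example can be tried out directly. In the sketch below the matrix `E` holds made-up values; it also shows that multiplying one-hot rows by a matrix simply selects rows of that matrix, which is why an embedding can be stored as a lookup table.

```python
import numpy as np

ids = np.array([1, 4])                         # two symbols from a vocabulary of 6
one_hot = np.eye(6)[ids]                       # shape (2, 6)
E = np.arange(18, dtype=float).reshape(6, 3)   # shape (6, 3), illustrative weights

dense = one_hot @ E                            # shape (2, 3)
print(dense)
print(np.array_equal(dense, E[ids]))           # one-hot rows select rows of E
```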

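So why reduce the dimension?
The author of the referenced post explains this really well!

Here are two pictures of a rabbit; compare them and find the differences.

At 1 metre from the picture, it is easiest to notice at a glance that the red heart in the middle is one of the differences.
At 0.5 metres, the ellipsis in the top right corner turns out to differ.
At 25 cm, one of the ears turns out to differ.
Closer still, the rabbit's face shows some differences too.
Closer again, the white clouds in the sky on the right differ.

Summary:
The viewing distance affects what we observe.
The same holds for data: the features contained in low-dimensional data can be very coarse.
By repeatedly moving closer and further away we change the receptive field, look at the picture from different viewpoints and find the differences.

Embedding does not only reduce the dimension of data; it can also increase it.
Increasing the dimension of low-dimensional data may amplify some other features or pull coarse features apart.
For example: by moving towards and away from the screen, we find that 45 cm is the best viewing distance, from which all 5 differences can be found within 10 seconds.

This is also why deeper CNNs often reach higher accuracy: convolution after convolution, pooling after pooling, dimensions going up and down again, fully connected layer after fully connected layer.
We do not know in advance at which point the network will suddenly learn a useful feature.
Either way, learning is a good thing, so let the machine convolve some more and connect some more: cross-entropy tells it how wrong it is, gradient descent tells it how to do better, and given enough time it will learn eventually.
So, in theory, with enough depth and enough parameters a neural network can approximate almost any function.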

The remaining, unchanged part:

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

checkpoint_save_path = "./checkpoint/run_embedding_1pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit receives no validation set, so the best model is saved according to the training loss

history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

model.summary()

# print(model.trainable_variables)
with open('./weights.txt', 'w') as file:  # dump the trained parameters
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

When predicting, remember to reshape the data to be predicted:
both the number of samples and the number of unrolled time steps have to be given.

preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [w_to_id[alphabet1]]
    # Make alphabet match the Embedding input requirement: [samples, time steps].
    # One sample is fed in, so samples = 1; one letter in gives one result, so time steps = 1.
    alphabet = np.reshape(alphabet, (1, 1))
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # index of the most probable letter
    tf.print(alphabet1 + '->' + input_word[pred])

Case study: Embedding with several input letters predicting one letter

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, SimpleRNN, Embedding
import matplotlib.pyplot as plt
import os

Set the input
and map the letters to numbers

input_word = "abcdefghijklmnopqrstuvwxyz"
w_to_id = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4,
           'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9,
           'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14,
           'p': 15, 'q': 16, 'r': 17, 's': 18, 't': 19,
           'u': 20, 'v': 21, 'w': 22, 'x': 23, 'y': 24, 'z': 25}  # dictionary mapping each character to a numeric id

training_set_scaled = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                       11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                       21, 22, 23, 24, 25]


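Create two lists to hold the training data

x_train = []
y_train = []

In a for loop, every four consecutive numbers are appended to x_train as input features,
and the fifth number is appended to y_train as the label.

for i in range(4, 26):
    x_train.append(training_set_scaled[i - 4:i])
    y_train.append(training_set_scaled[i])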
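As a quick check of what this loop produces, the sketch below rebuilds the same windows with standard-library tools only and decodes the first and last one back to letters. The variable names are chosen for this example.

```python
import string

letters = string.ascii_lowercase
ids = list(range(len(letters)))

x_windows, y_labels = [], []
for i in range(4, len(ids)):
    x_windows.append(ids[i - 4:i])   # four consecutive ids as features
    y_labels.append(ids[i])          # the following id as label

def decode(window):
    return "".join(letters[j] for j in window)

print(len(x_windows))                                       # 22
print(decode(x_windows[0]), "->", letters[y_labels[0]])     # abcd -> e
print(decode(x_windows[-1]), "->", letters[y_labels[-1]])   # vwxy -> z
```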

Shuffle the features and labels of the training set in the same order

np.random.seed(7)
np.random.shuffle(x_train)
np.random.seed(7)
np.random.shuffle(y_train)
tf.random.set_seed(7)

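Reshape the input features into the shape the Embedding layer expects.
First dimension: number of samples; here the whole data set, 22 samples, len(x_train).
Second dimension: number of unrolled time steps; four consecutive inputs give one output, so 4 (4 -> 1).

# Make x_train match the Embedding input requirement: [samples, time steps].
# The whole data set is fed in, so samples = len(x_train); four letters in give one result, so time steps = 4.
x_train = np.reshape(x_train, (len(x_train), 4))
y_train = np.array(y_train)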

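Build the model
model = tf.keras.Sequential([
    Embedding(26, 2),  # 26 is the vocabulary size; 2 means each character is encoded with two numbers
    SimpleRNN(10),  # recurrent layer with 10 memory units
    Dense(26, activation='softmax')  # fully connected layer that computes the output
])

Configure the model (same settings as in the earlier examples)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])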

Set up model saving
checkpoint_save_path = "./checkpoint/rnn_embedding_4pre1.ckpt"

if os.path.exists(checkpoint_save_path + '.index'):
    print('-------------load the model-----------------')
    model.load_weights(checkpoint_save_path)

cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_save_path,
                                                 save_weights_only=True,
                                                 save_best_only=True,
                                                 monitor='loss')  # fit receives no validation set, so the best model is saved according to the training loss

Train the model
history = model.fit(x_train, y_train, batch_size=32, epochs=100, callbacks=[cp_callback])

Show the network structure
model.summary()

Save the trained parameters to a txt file
with open('./weights.txt', 'w') as file:  # dump the trained parameters
    for v in model.trainable_variables:
        file.write(str(v.name) + '\n')
        file.write(str(v.shape) + '\n')
        file.write(str(v.numpy()) + '\n')

Plot the loss and acc curves
# Show the training acc and loss curves
acc = history.history['sparse_categorical_accuracy']
loss = history.history['loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='Training Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='Training Loss')
plt.title('Training Loss')
plt.legend()
plt.show()

Feed in data and predict
preNum = int(input("input the number of test alphabet:"))
for i in range(preNum):
    alphabet1 = input("input test alphabet:")
    alphabet = [w_to_id[a] for a in alphabet1]
    # Make alphabet match the Embedding input requirement: [samples, time steps].
    # One sample is fed in, so samples = 1; four letters in give one result, so time steps = 4.
    alphabet = np.reshape(alphabet, (1, 4))
    result = model.predict(alphabet)
    pred = int(tf.argmax(result, axis=1)[0])  # index of the most probable letter
    tf.print(alphabet1 + '->' + input_word[pred])


References:
https://blog.csdn.net/weixin_42078618/article/details/82999906
----------------
Original link: https://blog.csdn.net/weixin_46969441/article/details/121584330

From： https://www.cnblogs.com/emanlee/p/17125060.html
