Below is some Python code. It trains and predicts on the same random data twice, once in single precision and once in double precision, then compares the two sets of predictions, and they turn out to be inconsistent.
Take a look: is there anything wrong with the code?
import os
# Environment variables must be set before TensorFlow/NumPy are imported,
# otherwise they have no effect
os.environ['OMP_NUM_THREADS'] = '1'       # keep NumPy/BLAS single-threaded
os.environ['CUDA_VISIBLE_DEVICES'] = ''   # force TensorFlow onto the CPU

import random
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers.legacy import SGD

# Fix all random seeds
np.random.seed(42)
tf.random.set_seed(42)
random.seed(42)

# Make TensorFlow ops deterministic (available since TF 2.8)
tf.config.experimental.enable_op_determinism()

# Restrict TensorFlow to a single thread
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)
# Generate toy data: the label is 1 wherever the input is 1, and also at
# the position immediately after each 1
def generate_data(num_samples, sequence_length):
    int_x = np.random.randint(0, 10, size=(num_samples, sequence_length))
    x = int_x.astype(np.float64)  # double-precision inputs
    y = np.zeros((num_samples, sequence_length), dtype=np.float64)  # double-precision labels
    for i in range(num_samples):
        for j in range(sequence_length):
            if int_x[i][j] == 1:
                y[i][j] = 1
                if j + 1 < sequence_length:
                    y[i][j + 1] = 1
    return x, y
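# A hedged, vectorized sketch of the same labeling rule (the name
# generate_data_vectorized is introduced here for illustration; it is not
# part of the original script). Given the same random draw it should
# produce labels identical to the loop version above.
def generate_data_vectorized(num_samples, sequence_length):
    int_x = np.random.randint(0, 10, size=(num_samples, sequence_length))
    x = int_x.astype(np.float64)
    is_one = (int_x == 1)
    y = is_one.astype(np.float64)
    # also label the position right after each 1
    y[:, 1:] = np.maximum(y[:, 1:], is_one[:, :-1].astype(np.float64))
    return x, y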
# Hyperparameters
num_samples = 10000
sequence_length = 10
batch_size = 32
epochs = 10
learning_rate = 0.001
# Build the model with seeded initializers so both precisions start from
# the same weights
def build_model(dtype):
    tf.keras.backend.clear_session()
    tf.random.set_seed(42)
    model = Sequential()
    model.add(LSTM(128, input_shape=(sequence_length, 1), return_sequences=True, dtype=dtype,
                   kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
                   recurrent_initializer=tf.keras.initializers.Orthogonal(seed=42),
                   bias_initializer=tf.keras.initializers.Zeros()))
    model.add(Dense(1, activation='sigmoid', dtype=dtype,
                    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
                    bias_initializer=tf.keras.initializers.Zeros()))
    return model
# Use a deterministic optimizer; build a fresh instance for each model
# rather than sharing one instance across both, since an optimizer's state
# is tied to the first set of weights it is applied to
def make_optimizer():
    return SGD(learning_rate=learning_rate, momentum=0.0, nesterov=False)
# Prepare data
x, y = generate_data(num_samples, sequence_length)
x_float64 = x.reshape(num_samples, sequence_length, 1)
y_float64 = y.reshape(num_samples, sequence_length, 1)
# Train in double precision
model_float64 = build_model(tf.float64)
model_float64.compile(loss='binary_crossentropy', optimizer=make_optimizer(), metrics=['accuracy'])
history_float64 = model_float64.fit(x_float64, y_float64, batch_size=batch_size, epochs=epochs, verbose=1)
predictions_float64 = model_float64.predict(x_float64)
# Train in single precision
x_float32 = x_float64.astype(np.float32)
y_float32 = y_float64.astype(np.float32)
model_float32 = build_model(tf.float32)
model_float32.compile(loss='binary_crossentropy', optimizer=make_optimizer(), metrics=['accuracy'])
history_float32 = model_float32.fit(x_float32, y_float32, batch_size=batch_size, epochs=epochs, verbose=1)
predictions_float32 = model_float32.predict(x_float32)
# Compare predictions
print("First few elements of double precision predictions:")
print(predictions_float64.flatten()[:5])
print("First few elements of single precision predictions:")
print(predictions_float32.flatten()[:5])

# Check whether they agree within a tight tolerance
if np.allclose(predictions_float64, predictions_float32, atol=1e-7):
    print("Predictions are consistent between double and single precision.")
else:
    print("Predictions are not consistent between double and single precision.")
Output after running:
Commentary: is there anything wrong with this code? If not, then the final discrepancy is large: among the significant digits of the single-precision and double-precision predictions, only one or two digits agree. What does that imply? Wrong word vectors.
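To make "one or two significant digits" concrete, here is a minimal sketch (mine, not part of the original post) that estimates the number of agreeing decimal digits from the base-10 log of the relative error; it assumes the predictions_float64 and predictions_float32 arrays produced by the script above.

import numpy as np

def matching_digits(a, b, eps=1e-300):
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    rel_err = np.abs(a - b) / np.maximum(np.abs(a), eps)
    # -log10(relative error) ~ number of agreeing significant digits
    return -np.log10(np.maximum(rel_err, eps))

digits = matching_digits(predictions_float64, predictions_float32)
print("median matching significant digits:", np.median(digits))
print("min matching significant digits:", digits.min())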
Suppose the correct answer from a large model is 100 tokens long (the model must emit them one after another). If the computation of the 20th token is off by just a little, the 20th token comes out wrong, the 21st is more wrong, the 22nd more wrong still, ..., so the remaining 80 tokens are all wrong (a complete mess, i.e. earnest-sounding nonsense). The path was supposed to be a straight line, but at the 20th token it bends slightly, and the farther it goes, the larger the deviation.
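This compounding can be illustrated with a toy numerical sketch (an illustration of sensitive dependence, not the LSTM above): iterate the chaotic logistic map in float32 and float64 from the same starting point and watch the two trajectories drift apart.

import numpy as np

# Iterate x <- r * x * (1 - x) in single and double precision. Tiny
# rounding differences at each step get amplified until the trajectories
# are unrelated -- the numeric analogue of one wrong token derailing
# everything that follows.
r = 3.9
x32 = np.float32(0.5)
x64 = np.float64(0.5)
for step in range(1, 61):
    x32 = np.float32(r) * x32 * (np.float32(1.0) - x32)
    x64 = r * x64 * (1.0 - x64)
    if step % 10 == 0:
        print(f"step {step:2d}: float32={x32:.6f} float64={x64:.6f} "
              f"diff={abs(float(x32) - x64):.2e}")

Within a few dozen iterations the two trajectories share no significant digits at all, mirroring how a small rounding difference early in a long autoregressive generation can leave every later step off course.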