
Deep Learning Algorithm Principles and Implementation: Handling Model Underfitting and Overfitting


Underfitting:

from tensorflow.keras import regularizers
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import imdb
from tensorflow.keras.datasets import mnist


def plot_val_loss_and_acc(history):
    import matplotlib.pyplot as plt
    # model.fit() returns a History object; its .history dict stores per-epoch metrics
    val_loss = history.history["val_loss"]
    epochs = range(1, len(val_loss) + 1)
    plt.plot(epochs, val_loss, "b--",
             label="Validation loss")
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.legend()

    val_acc = history.history["val_accuracy"]
    plt.plot(epochs, val_acc, "b-",
             label="Validation accuracy")
    plt.title("Validation loss and accuracy")
    plt.xlabel("Epochs")
    plt.ylabel("Accuracy/Loss")
    plt.legend()
    plt.show()


(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
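# Deliberately tiny model: a single 10-way softmax layer (essentially multinomial
# logistic regression) has too little capacity for MNIST and underfits.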
model = keras.Sequential([layers.Dense(10, activation="softmax")])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history_small_model = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=128,
    validation_split=0.2)
plot_val_loss_and_acc(history_small_model)

  

[Figure: validation loss and accuracy of the single-layer MNIST model (underfitting)]

The book "Deep Learning with Python" describes the underfitting phenomenon and how to address it:

 

[Figure: excerpt from "Deep Learning with Python" on underfitting and its remedies]

OK, with that idea in mind, let's improve the model:

# Imports, plot_val_loss_and_acc() and the MNIST data preparation are unchanged from the listing above.
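# Remedy for underfitting: increase the model's capacity, here two 128-unit hidden layers.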

model = keras.Sequential([
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history_large_model = model.fit(
    train_images, train_labels,
    epochs=20,
    batch_size=128,
    validation_split=0.2)
plot_val_loss_and_acc(history_large_model)

  

Now we can see signs of overfitting!

[Figure: validation curves of the larger MNIST model, showing signs of overfitting]

 

Next, let's look at handling overfitting with L1/L2 regularization and dropout. L2 regularization adds a penalty proportional to the sum of the squared weights to the loss, and dropout randomly zeroes a fraction of a layer's outputs during training; both make it harder for the network to simply memorize the training data:

# Imports and plot_val_loss_and_acc() are unchanged from the first listing;
# the earlier MNIST experiments (commented out in the original script) are omitted here.

# L1/L2 and dropout
############################################################################

(train_data, train_labels), _ = imdb.load_data(num_words=10000)
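
# The helper below multi-hot encodes each review: a 10000-dim 0/1 vector marking
# which of the top-10000 words appear in the review.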

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results
train_data = vectorize_sequences(train_data)
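# Quick sanity check (illustrative): encoding the sequence [3, 5] sets exactly
# indices 3 and 5 to 1.
assert vectorize_sequences([[3, 5]])[0, [3, 5]].tolist() == [1.0, 1.0]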

# Small (low-capacity) model: prone to underfitting on the IMDB data
model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_small_original = model.fit(train_data, train_labels,
                             epochs=20, batch_size=512, validation_split=0.4)
plot_val_loss_and_acc(history_small_original)


# A larger model has more capacity, but it overfits
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_original = model.fit(train_data, train_labels,
                             epochs=20, batch_size=512, validation_split=0.4)

plot_val_loss_and_acc(history_original)
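# Overfitting signature: training loss keeps decreasing while validation loss
# bottoms out after a few epochs and then starts to rise.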

"""
Version of the model with lower capacity
model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_smaller_model = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)

Version of the model with higher capacity    
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_larger_model = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
"""

### Adding L2 weight regularization to the model
model = keras.Sequential([
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(16,
                 kernel_regularizer=regularizers.l2(0.002),
                 activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_l2_reg = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)

plot_val_loss_and_acc(history_l2_reg)
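
# Sketch, not from the original post: sanity-check what regularizers.l2(0.002) adds
# to the training loss. For each regularized Dense layer the penalty is
# 0.002 * sum(w ** 2) over its kernel, and Keras exposes these terms via model.losses.
import tensorflow as tf
manual_penalty = sum(0.002 * tf.reduce_sum(tf.square(layer.kernel))
                     for layer in model.layers[:2])  # the two regularized Dense(16) layers
print("manual L2 penalty:", float(manual_penalty))
print("sum of keras model.losses:", float(tf.add_n(model.losses)))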

# Other available weight regularizers (regularizers is already imported above):
# regularizers.l1(0.001)
# regularizers.l1_l2(l1=0.001, l2=0.001)

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
history_dropout = model.fit(
    train_data, train_labels,
    epochs=20, batch_size=512, validation_split=0.4)
plot_val_loss_and_acc(history_dropout)
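
# Sketch, not from the original post: what Dropout(0.5) does to activations during
# training: zero out about half of the units at random and rescale the survivors
# by 1 / (1 - 0.5), so the expected activation stays the same ("inverted dropout").
# At inference time the Dropout layer is a no-op.
rng = np.random.default_rng(0)
acts = rng.random((2, 4)).astype("float32")      # toy activations
keep_mask = rng.binomial(1, 0.5, size=acts.shape)
print(acts * keep_mask / 0.5)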

 

The original underfitting model:

[Figure: validation curves of the original small (underfitting) model]

The overfitted model:

[Figure: validation curves of the overfitted model]

 

After adding L2 regularization:

[Figure: validation curves with L2 weight regularization]

 

After adding dropout:

[Figure: validation curves with dropout]

 


From: https://blog.51cto.com/u_11908275/8447588
