Experiment 2: Implementing and Testing the Logistic Regression Algorithm

Tags: logistic, training, pred, algorithm, train, testing, test, sklearn

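I. Objectives

Gain a thorough understanding of how log-odds regression (that is, logistic regression) works, implement its training and testing in Python, and train and evaluate the model with five-fold cross-validation.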

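II. Tasks

(1) Load the iris dataset from scikit-learn and use the hold-out method to set aside 1/3 of the samples as the test set (sample in a stratified way so that both parts keep the same class distribution);

(2) Train a log-odds regression (logistic regression) classifier on the training set;

(3) Use five-fold cross-validation to evaluate and select the model by accuracy, precision, recall, and F1 score;

(4) Evaluate the model on the test set, analyze the test results, and complete the Experiment 2 section of the lab report.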
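Task (1) asks for a hold-out split that keeps the class distribution. The following is a minimal sketch of how that property could be checked, assuming `train_test_split` with `stratify=y`; the helper `class_counts` and the seed value are illustrative choices, not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out roughly one third of the 150 samples. stratify=y asks the splitter
# to keep the class proportions of the full dataset in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y  # seed 0 is arbitrary
)


def class_counts(labels):
    """Hypothetical helper: number of samples per class (classes 0, 1, 2)."""
    return np.bincount(labels, minlength=3)


train_counts = class_counts(y_train)
test_counts = class_counts(y_test)

print("full :", class_counts(y))
print("train:", train_counts)
print("test :", test_counts)
```

With a stratified split, the per-class counts inside each part should differ by at most one sample, because iris has 50 samples per class.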

III. Algorithm Steps, Code, and Results

1. Algorithm pseudocode

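# Input: training data (X_train, y_train) with labels in {0, 1}, learning rate (alpha), number of iterations (num_iterations)
# Output: trained model parameters w and b

# 1. Initialize the parameters w and b
w = 0   # weight vector (one entry per feature), initialized to zeros
b = 0   # bias, initialized to 0

# 2. Set the learning rate and the number of iterations
alpha = 0.01            # learning rate
num_iterations = 1000   # maximum number of iterations

# 3. Define the sigmoid function
function sigmoid(z):
    return 1 / (1 + exp(-z))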

# 4. Define the loss function: log loss (cross-entropy loss)
function compute_loss(X, y, w, b):
    m = len(y)   # number of samples
    cost = 0
    for i = 1 to m:
        z = dot_product(w, X[i]) + b   # linear score
        prediction = sigmoid(z)        # sigmoid output, the predicted probability of class 1
        cost += -y[i] * log(prediction) - (1 - y[i]) * log(1 - prediction)
    return cost / m   # mean loss over the samples

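# 5. Train the model with gradient descent
for iter = 1 to num_iterations:
    m = len(X_train)   # number of samples
    dw = 0             # gradient with respect to the weights
    db = 0             # gradient with respect to the bias

    # Accumulate the gradient of the loss with respect to w and b.
    # The inner loop uses its own index j so it does not overwrite the iteration counter.
    for j = 1 to m:
        z = dot_product(w, X_train[j]) + b   # linear score
        prediction = sigmoid(z)              # sigmoid output
        dw += (prediction - y_train[j]) * X_train[j]
        db += prediction - y_train[j]

    # Update the parameters
    w -= alpha * dw / m   # update the weights
    b -= alpha * db / m   # update the bias

    # Print the loss every 100 iterations
    if iter % 100 == 0:
        loss = compute_loss(X_train, y_train, w, b)
        print("Iteration", iter, "loss:", loss)

# 6. Return the trained model parameters
return w, b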
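The pseudocode above can be turned into runnable code in several ways. The sketch below is one possible vectorized NumPy version for binary labels in {0, 1}. The function names `train_logistic_regression` and `predict`, the clipping constant `eps`, and the toy data are assumptions made for illustration; they do not come from the report.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def compute_loss(X, y, w, b, eps=1e-12):
    """Mean binary cross-entropy; eps (an assumption) keeps log() away from 0."""
    p = np.clip(sigmoid(X @ w + b), eps, 1.0 - eps)
    return float(np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p)))


def train_logistic_regression(X, y, alpha=0.01, num_iterations=1000):
    """Batch gradient descent for binary labels y in {0, 1}."""
    m, n_features = X.shape
    w = np.zeros(n_features)  # one weight per feature
    b = 0.0
    losses = []
    for it in range(1, num_iterations + 1):
        error = sigmoid(X @ w + b) - y   # prediction minus label, shape (m,)
        w -= alpha * (X.T @ error) / m   # same update as dw / m in the pseudocode
        b -= alpha * error.mean()        # same update as db / m in the pseudocode
        if it % 100 == 0:
            losses.append(compute_loss(X, y, w, b))
    return w, b, losses


def predict(X, w, b):
    """Assign class 1 when the predicted probability is at least 0.5."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)


# Hypothetical toy data: two Gaussian clusters, one per class.
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(loc=-2.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=2.0, scale=1.0, size=(50, 2)),
])
y_toy = np.concatenate([np.zeros(50), np.ones(50)])

w, b, losses = train_logistic_regression(X_toy, y_toy)
accuracy = np.mean(predict(X_toy, w, b) == y_toy)
print("loss every 100 iterations:", [round(v, 4) for v in losses])
print("training accuracy:", accuracy)
```

This version handles two classes only. The iris dataset has three classes, so it would need a one-vs-rest or softmax extension before it could replace the scikit-learn model used in the main code.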

2. Main code

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold

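# 1. Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# 2. Hold-out method: split the data into a training set and a test set
#    (test_size=0.33 of 150 samples gives 50 test samples; stratify=y keeps the class proportions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

# 3. Train the model with logistic regression
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)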

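# 4. Evaluate the model with five-fold cross-validation
#    (cross_val_score fits clones of log_reg, so the model fitted in step 3 is left unchanged)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cross_val_scores = cross_val_score(log_reg, X_train, y_train, cv=kfold, scoring='accuracy')

# Print the mean accuracy over the five folds
print(f'Mean five-fold cross-validation accuracy: {cross_val_scores.mean():.4f}')

# Other metrics (precision, recall, F1 score), computed here on the
# training-set predictions rather than on the cross-validation folds
y_train_pred = log_reg.predict(X_train)
train_report = classification_report(y_train, y_train_pred)
print("Training set evaluation report:\n", train_report)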

# 5. Evaluate the model on the test set
y_test_pred = log_reg.predict(X_test)

# Compute the test-set metrics
test_report = classification_report(y_test, y_test_pred)
print("Test set evaluation report:\n", test_report)

# Build the confusion matrix
conf_matrix = confusion_matrix(y_test, y_test_pred)
print("Confusion matrix:\n", conf_matrix)

# Report accuracy as a key metric for the final model
test_accuracy = np.mean(y_test == y_test_pred)
print(f'Test set accuracy: {test_accuracy:.4f}')
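The tasks call for accuracy, precision, recall, and F1 under five-fold cross-validation, while the listing above scores only accuracy across the folds. One way to collect all four per fold is scikit-learn's `cross_validate` with several scorers. This is a sketch under the assumption that macro averaging over the three classes is acceptable; the report does not say which averaging to use.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=200)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Macro averaging (an assumption) gives each of the three classes equal weight.
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
cv_results = cross_validate(model, X_train, y_train, cv=kfold, scoring=scoring)

# cross_validate stores the per-fold scores under the key "test_<scorer name>".
for name in scoring:
    fold_scores = cv_results[f"test_{name}"]
    print(f"{name}: mean={fold_scores.mean():.4f}, std={fold_scores.std():.4f}")
```

Only the training portion is passed to `cross_validate`, so the held-out test set stays untouched until the final evaluation.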

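(2) Library functions used

1. numpy: np.mean() returns the mean of the input array. Applied to the boolean array y_test == y_test_pred, it gives the fraction of correct predictions, that is, the model's accuracy.

2. pandas: imported in the code listing but not called by any of its steps.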

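3. sklearn.datasets: load_iris() loads the Iris dataset.

sklearn.model_selection: train_test_split() splits the dataset into a training set and a test set. Its arguments here are X, the feature data; y, the target labels; test_size=0.33, which sets the test set to 33% of the samples; random_state=42, which fixes the random seed so the split is reproducible; and stratify=y, which splits according to the distribution of y so that the training and test sets have matching label distributions.

cross_val_score() computes a cross-validation score (accuracy in this case). Its arguments here are log_reg, the model being evaluated (the logistic regression model); X_train, the training features; y_train, the training labels; cv=kfold, the fold count and stratification settings taken from the StratifiedKFold object defined earlier; and scoring='accuracy', which selects accuracy as the metric.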

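4. StratifiedKFold (a class in sklearn.model_selection rather than a separate library): n_splits=5 sets the number of folds to 5; shuffle=True shuffles the samples before they are divided into folds; random_state=42 makes the shuffle reproducible.

5. sklearn.linear_model: LogisticRegression(max_iter=200), where max_iter=200 caps the solver at 200 iterations.

6. sklearn.metrics: classification_report() takes the true labels (y_train or y_test) and the predicted labels (y_train_pred or y_test_pred); confusion_matrix() takes the true labels y_test and the predicted labels y_test_pred.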
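To make the link between confusion_matrix() and the figures in classification_report() concrete, here is a small sketch that derives per-class precision, recall, and F1 directly from a confusion matrix and compares them with scikit-learn's `precision_recall_fscore_support`. The label arrays are hypothetical values chosen for illustration, not results from this experiment.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels for a three-class problem (not experimental results).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # divide by column sums: all predictions of a class
recall = true_positives / cm.sum(axis=1)     # divide by row sums: all true members of a class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

# Reference values from scikit-learn, one entry per class.
sk_precision, sk_recall, sk_f1, support = precision_recall_fscore_support(y_true, y_pred)

print("confusion matrix:\n", cm)
print("precision:", np.round(precision, 4))
print("recall   :", np.round(recall, 4))
print("F1       :", np.round(f1, 4))
print("accuracy :", round(float(accuracy), 4))
```

The manual formulas divide by column and row sums, so they break down for a class that is never predicted or never present; scikit-learn handles that case through its `zero_division` parameter.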

3. Training results screenshot (accuracy, precision, recall, F1)

4. Test results screenshot (accuracy, precision, recall, F1)

From： https://www.cnblogs.com/yuanxinglan/p/18551677
