
Medical Data Analysis Practicum, Project 7: Ensemble Learning -- Air Quality Index -- Air Quality Analysis and Prediction

Date: 2024-09-18 15:21:58
Tags: quality, test, optimized, practicum, pred, print, air quality, aqi

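Project 7: Ensemble Learning

Objectives
  1. Understand the principles of ensemble learning algorithms;
  2. Become familiar with the common ensemble learning algorithms and learn to use them;
  3. Become familiar with methods for evaluating model performance;
  4. Learn methods for model optimization.
Platform
  • Operating system: Windows 7 or later
  • Python version: 3.8.x or later
  • IDE: PyCharm or Anaconda
Tasks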

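The dataset file is "aqi.csv". It contains national air quality data for 2020, recording air quality indicators from January 2020 to September 2020. Its fields are the date, AQI, quality level, and the PM2.5, PM10, SO2, CO, NO2 and O3_8h content (each in ppm).

The business scenario for this project is air quality analysis and prediction. The data is split into a training set and a test set, and ensemble learning models are built to predict the AQI value and the quality level.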

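(1) Data understanding and preparation
  1. Import the Python packages needed for this case;
  2. Explore the loaded data with the describe() and info() methods and the shape attribute;
  3. Preprocess the dataset as appropriate for the actual data;
  4. Extract the features used for the analysis and split the data into a training set and a test set.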
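(2) Model building, prediction and optimization
Task 1: Random forest
  1. Regression model

    • Build and train a model with RandomForestRegressor();
    • Use the model to predict AQI values;
    • Evaluate the model with mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE) and the R² score (coefficient of determination);
    • Tune the model with GridSearchCV grid search and read the best-performing parameter combination from the best_params_ attribute;
    • Retrain the model with those parameters, output the new model's MAE, MSE, RMSE, MAPE and R² score, and compare them with the values before tuning;
    • Output the importance of each feature with the feature_importances_ attribute, sorted by importance;
    • Predict with the tuned model and output the predictions;
    • Visualize the predicted values against the test values.
  2. Classification model

    • Build and train a model with RandomForestClassifier();
    • Use the model to predict the air quality level;
    • Evaluate the model with confusion_matrix(), accuracy_score(), precision_score(), recall_score() and f1_score() to obtain the confusion matrix, accuracy, precision, recall and F1 score, and output the results;
    • Tune the model if the results are unsatisfactory.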
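Task 2: Gradient boosting machine (GBM)
  1. Regression model

    • Build and train a model with GradientBoostingRegressor();
    • Use the model to predict AQI values;
    • Evaluate the model with mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE) and the R² score (coefficient of determination);
    • Tune the model with GridSearchCV grid search and read the best-performing parameter combination from the best_params_ attribute;
    • Retrain the model with those parameters, output the new model's MAE, MSE, RMSE, MAPE and R² score, and compare them with the values before tuning;
    • Output the importance of each feature with the feature_importances_ attribute, sorted by importance;
    • Predict with the tuned model and output the predictions;
    • Visualize the predicted values against the test values.
  2. Classification model

    • Build and train a model with GradientBoostingClassifier();
    • Use the model to predict the air quality level;
    • Evaluate the model with confusion_matrix(), accuracy_score(), precision_score(), recall_score() and f1_score() to obtain the confusion matrix, accuracy, precision, recall and F1 score, and output the results;
    • Tune the model if the results are unsatisfactory.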
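Task 3: Light gradient boosting machine (LightGBM)
  1. Regression model

    • Build and train a model with LGBMRegressor();
    • Use the model to predict AQI values;
    • Evaluate the model with the evaluation metrics and output the results;
    • Tune the model if the results are unsatisfactory.
  2. Classification model

    • Build and train a model with LGBMClassifier();
    • Use the model to predict the air quality level;
    • Evaluate the model with the evaluation metrics and output the results;
    • Tune the model if the results are unsatisfactory.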
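For reference, the regression metrics named in the task list can be written out as follows. This is the standard textbook form, with y_i the observed AQI, \hat{y}_i the prediction, \bar{y} the mean of the observed values and n the number of test samples; the symbols are chosen here for illustration and do not appear in the original.

```latex
\begin{aligned}
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert \\
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAPE} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert y_i-\hat{y}_i\rvert}{\lvert y_i\rvert} \\
R^2           &= 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}
\end{aligned}
```

Note that scikit-learn's mean_absolute_percentage_error returns MAPE as a fraction rather than a percentage, so a value of 0.055 corresponds to 5.5%.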

(1) Data understanding and preparation

# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, mean_absolute_percentage_error, r2_score,
    confusion_matrix, accuracy_score, precision_score, recall_score, f1_score,
)
import matplotlib.pyplot as plt
import lightgbm as lgb

# Load the data
data = pd.read_csv('output/modified_data.csv')


# Show basic information about the data
print("Data info:")
print(data.info())
print("\nData description:")
print(data.describe())
print("\nData shape:", data.shape)
# Check for and handle missing values
if data.isnull().sum().sum() > 0:
    # Missing values can either be filled or the affected rows dropped.
    # Here they are simply filled with the column mean; numeric_only=True
    # skips the text columns, whose mean is undefined.
    data.fillna(data.mean(numeric_only=True), inplace=True)

# Convert the date column to datetime
data['Date'] = pd.to_datetime(data['Date'])
print(data.head())

# Select the features and the two targets
features = ['PM2_5_(ppm)', 'PM10_(ppm)', 'SO2_(ppm)', 'CO_(ppm)', 'NO2_(ppm)', 'O3_8h_(ppm)']
target_aqi = 'AQI'
target_quality = 'Quality_Level'

# Split into training and test sets
X = data[features]
y_aqi = data[target_aqi]
y_quality = data[target_quality]

X_train, X_test, y_aqi_train, y_aqi_test, y_quality_train, y_quality_test = train_test_split(X, y_aqi, y_quality, test_size=0.2, random_state=42)
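The split above is purely random. Because the quality-level classes are not equally frequent, one option worth considering is a stratified split, which keeps the class proportions nearly the same in both subsets. The sketch below illustrates the idea on a small synthetic table; the column names and values are made up for the example, and only the `stratify` argument of train_test_split is the point.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the AQI table (values are made up for illustration).
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "PM2_5": rng.uniform(5, 150, n),
    "PM10": rng.uniform(10, 200, n),
    "Quality_Level": rng.choice(["A", "B", "C"], size=n, p=[0.45, 0.42, 0.13]),
})

X_demo = demo[["PM2_5", "PM10"]]
y_demo = demo["Quality_Level"]

# stratify=y_demo keeps the class proportions nearly equal in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(y_demo.value_counts(normalize=True).round(3))
print(y_te.value_counts(normalize=True).round(3))
```

The same argument works when several arrays are split at once, as in the call above that splits X, y_aqi and y_quality together.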

(2) Model building, prediction and optimization

Task 1: Random forest
# 1. Build and train the random forest regression model
rf_reg = RandomForestRegressor(random_state=42)
rf_reg.fit(X_train, y_aqi_train)

# 2. Predict AQI values
y_aqi_pred = rf_reg.predict(X_test)
print('Random forest regression predicted AQI values:', y_aqi_pred)
# 3. Compute the evaluation metrics
mae = mean_absolute_error(y_aqi_test, y_aqi_pred)
mse = mean_squared_error(y_aqi_test, y_aqi_pred)
rmse = np.sqrt(mse)
mape = mean_absolute_percentage_error(y_aqi_test, y_aqi_pred)
r2 = r2_score(y_aqi_test, y_aqi_pred)
print("Random forest regression evaluation metrics:")
print(f'MAE: {mae}, \nMSE: {mse}, \nRMSE: {rmse}, \nMAPE: {mape}, \nR2_SCORE: {r2}')

Random forest regression predicted AQI values: [124.48 81.15 72.38 71.58 45.77 82.31 34.6 31.42 80.58 83.7
47.59 74.32 75.95 47.13 39.53 33.33 45.76 75.11 80.87 42.57
39.87 58.52 44.34 45.51 60.06 40.73 51.15 45.06 51.46 43.2
70.71 37.2 127.29 31.26 86.79 43.56 90.83 66. 111.21 80.26
33.47 53.14 47.4 130.66 73.89 47.37 47.58 47.16 66.56 39.78
44.36 115.1 105.81 110.77 74.06]
Random forest regression evaluation metrics:
MAE: 3.0150909090909086,
MSE: 25.965259999999997,
RMSE: 5.095611837650116,
MAPE: 0.05528859263399927,
R2_SCORE: 0.965415994168552
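The same five metrics are recomputed for every model in this article. As a possible simplification, they could be gathered in one helper. The function name `regression_report` and the demo numbers below are hypothetical and chosen only for illustration; the metric functions themselves are the scikit-learn ones already imported above.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE and R2 for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }


# Tiny hand-checkable example (made-up numbers, not taken from the AQI data).
y_true_demo = [100.0, 50.0, 80.0]
y_pred_demo = [110.0, 45.0, 80.0]
report = regression_report(y_true_demo, y_pred_demo)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

With a helper like this, the before/after comparison for each model reduces to two calls, for example regression_report(y_aqi_test, y_aqi_pred) for the baseline and the same call with the tuned model's predictions.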

# 4. Tune the model with GridSearchCV grid search
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=rf_reg, param_grid=param_grid, cv=5, scoring='neg_mean_squared_error')
# Run the grid search
grid_search.fit(X_train, y_aqi_train)
# Get the best parameter combination
best_params = grid_search.best_params_
print(f'Best parameter combination: {best_params}')

Best parameter combination: {'max_depth': 20, 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 100}

# 5. Retrain the model with the best parameters
best_rf_reg = RandomForestRegressor(**best_params, random_state=42)
best_rf_reg.fit(X_train, y_aqi_train)

# Predict with the tuned model and evaluate it
y_aqi_pred_optimized = best_rf_reg.predict(X_test)
print('Tuned random forest regression predicted AQI values:', y_aqi_pred_optimized)
mae_optimized = mean_absolute_error(y_aqi_test, y_aqi_pred_optimized)
mse_optimized = mean_squared_error(y_aqi_test, y_aqi_pred_optimized)
rmse_optimized = np.sqrt(mse_optimized)
mape_optimized = mean_absolute_percentage_error(y_aqi_test, y_aqi_pred_optimized)
r2_optimized = r2_score(y_aqi_test, y_aqi_pred_optimized)

print("Tuned random forest regression evaluation metrics:")
print(f'MAE: {mae_optimized}, \nMSE: {mse_optimized}, \nRMSE: {rmse_optimized}, \nMAPE: {mape_optimized}, \nR2_SCORE: {r2_optimized}')


Tuned random forest regression predicted AQI values: [124.48 81.15 72.38 71.58 45.77 82.31 34.6 31.42 80.58 83.7
47.59 74.32 75.95 47.13 39.53 33.33 45.76 75.11 80.87 42.57
39.87 58.52 44.34 45.51 60.06 40.73 51.15 45.06 51.46 43.2
70.71 37.2 127.29 31.26 86.79 43.56 90.83 66. 111.21 80.26
33.47 53.14 47.4 130.66 73.89 47.37 47.58 47.16 66.56 39.78
44.36 115.1 105.81 110.77 74.06]
Tuned random forest regression evaluation metrics:
MAE: 3.0150909090909086,
MSE: 25.965259999999997,
RMSE: 5.095611837650116,
MAPE: 0.05528859263399927,
R2_SCORE: 0.965415994168552

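# Compare the metrics before and after tuning
print("Metrics before and after tuning:")
print(f"Before: MAE: {mae}, MSE: {mse}, RMSE: {rmse}, MAPE: {mape}, R2_SCORE: {r2}")
print(f"After: MAE: {mae_optimized}, MSE: {mse_optimized}, RMSE: {rmse_optimized}, MAPE: {mape_optimized}, R2_SCORE: {r2_optimized}")

Metrics before and after tuning:
Before: MAE: 3.0150909090909086, MSE: 25.965259999999997, RMSE: 5.095611837650116, MAPE: 0.05528859263399927, R2_SCORE: 0.965415994168552
After: MAE: 3.0150909090909086, MSE: 25.965259999999997, RMSE: 5.095611837650116, MAPE: 0.05528859263399927, R2_SCORE: 0.965415994168552

Tuning did not improve the model: every metric is identical before and after. The selected n_estimators=100, min_samples_split=2 and min_samples_leaf=1 are the RandomForestRegressor defaults, and max_depth=20 apparently does not restrict the trees on this data, so the retrained model makes the same predictions as the baseline.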
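When tuned and untuned metrics come out identical, it can help to compare the cross-validated scores directly rather than only the test-set metrics. The following is a minimal sketch on synthetic data (make_regression stands in for the AQI training set, and the small grid is chosen only so the example runs quickly). The grid deliberately contains the base model's own settings, so the best score can never be worse than the baseline's cross-validated score.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic regression data standing in for the AQI training set.
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

base = RandomForestRegressor(n_estimators=30, random_state=42)

# The grid includes the base model's own settings (max_depth=None, min_samples_leaf=1).
demo_grid = {"max_depth": [None, 5], "min_samples_leaf": [1, 4]}
search = GridSearchCV(base, demo_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_demo, y_demo)

# Cross-validated score of the untuned model on the same (unshuffled) folds.
baseline_cv = cross_val_score(
    base, X_demo, y_demo, cv=5, scoring="neg_mean_squared_error"
).mean()

print("best params:", search.best_params_)
print("best CV score (neg MSE):", search.best_score_)
print("baseline CV score (neg MSE):", baseline_cv)
```

If the two scores are equal, the search simply selected a configuration equivalent to the starting one, and no change in the test metrics should be expected.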

# 6. Output the importance of each feature with the feature_importances_ attribute
# Feature importances, sorted in descending order
importances = best_rf_reg.feature_importances_
feature_importances = pd.Series(importances, index=features).sort_values(ascending=False)
# Print the sorted importances
print(feature_importances)

# 8. Visualize the predicted values against the test values
plt.figure(figsize=(10, 6))
plt.scatter(y_aqi_test, y_aqi_pred_optimized, alpha=0.5)
plt.xlabel('Actual AQI')
plt.ylabel('Predicted AQI')
plt.title('Actual vs Predicted AQI')
plt.show()

PM10_(ppm) 0.400419
PM2_5_(ppm) 0.291729
O3_8h_(ppm) 0.288429
CO_(ppm) 0.008184
NO2_(ppm) 0.007934
SO2_(ppm) 0.003305
dtype: float64

[Figure: scatter plot of actual vs. predicted AQI for the random forest model]
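The task list also asks for a random forest classification model, which the walkthrough above does not show. The following is a minimal sketch of what that step could look like, not the author's code: it runs on synthetic three-class data from make_classification, with the letters A/B/C standing in for the quality levels. On the real data, X_train, y_quality_train, X_test and y_quality_test from the split above would take the place of the synthetic arrays.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic three-class data; the letters mimic the A/B/C quality levels.
X_demo, y_int = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
y_demo = np.array(["A", "B", "C"])[y_int]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Build, train and predict with the random forest classifier.
rf_clf = RandomForestClassifier(random_state=42)
rf_clf.fit(X_tr, y_tr)
y_hat = rf_clf.predict(X_te)

# Same metrics as used for the other classifiers in this article.
cm = confusion_matrix(y_te, y_hat, labels=["A", "B", "C"])
acc = accuracy_score(y_te, y_hat)
prec = precision_score(y_te, y_hat, average="weighted", zero_division=0)
rec = recall_score(y_te, y_hat, average="weighted")
f1 = f1_score(y_te, y_hat, average="weighted")

print(cm)
print(f"Accuracy: {acc:.3f}, Precision: {prec:.3f}, Recall: {rec:.3f}, F1: {f1:.3f}")
```

With average='weighted', recall is always equal to accuracy, which is why those two numbers coincide in the classification outputs elsewhere in this article.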

Task 2: GBM
Regression model
# 1. Build and train a model with GradientBoostingRegressor()
gb_reg = GradientBoostingRegressor(random_state=42)
gb_reg.fit(X_train, y_aqi_train)
# 2. Use the model to predict AQI values
y_aqi_pred = gb_reg.predict(X_test)
print('GBM regression predicted AQI values:', y_aqi_pred)

# 3. Evaluate the model
mae = mean_absolute_error(y_aqi_test, y_aqi_pred)
mse = mean_squared_error(y_aqi_test, y_aqi_pred)
rmse = np.sqrt(mse)
mape = mean_absolute_percentage_error(y_aqi_test, y_aqi_pred)
r2 = r2_score(y_aqi_test, y_aqi_pred)
print("Gradient Boosting Regression Model Evaluation Metrics:")
print(f'MAE: {mae}, \nMSE: {mse}, \nRMSE: {rmse}, \nMAPE: {mape}, \nR2_SCORE: {r2}')


GBM regression predicted AQI values: [122.57416247 83.36233285 73.90280417 71.61249735 45.90407098
83.09407824 35.38809475 32.1115523 81.92797541 83.40916295
48.82405535 74.28270394 74.96495747 45.69629863 39.59354642
33.09971192 45.41896268 75.52727318 81.71507209 47.02496198
41.96486507 59.76085878 45.10753769 46.1912337 59.05166283
49.05189862 53.29885368 47.58476507 46.59894793 42.17298408
70.67172663 35.57436497 130.76443134 33.12142879 85.93142525
41.04272972 88.25804535 64.42863259 112.47587802 80.12500147
32.96123373 55.09504267 50.37469809 125.99062665 75.72767345
48.10707457 51.29551088 47.94867709 70.66198919 40.51320902
40.7250176 115.95276244 114.3584965 112.04106305 74.86570745]
Gradient Boosting Regression Model Evaluation Metrics:
MAE: 3.017067274506405,
MSE: 19.567961603563685,
RMSE: 4.4235688763218874,
MAPE: 0.058486439950287586,
R2_SCORE: 0.9739367717401174

# 4. Tune the model with GridSearchCV grid search
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=gb_reg, param_grid=param_grid, cv=5, scoring='neg_mean_squared_error')
# Run the grid search
grid_search.fit(X_train, y_aqi_train)

# Get the best parameter combination
best_params = grid_search.best_params_
print(f'Best Parameters: {best_params}')

Best Parameters: {'learning_rate': 0.1, 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 300}

# 5. Retrain the model with the best parameters
best_gb_reg = GradientBoostingRegressor(**best_params, random_state=42)
best_gb_reg.fit(X_train, y_aqi_train)

# Predict with the tuned model
y_aqi_pred_optimized = best_gb_reg.predict(X_test)
print('Tuned model predictions:', y_aqi_pred_optimized)
# Compute the evaluation metrics of the tuned model
mae_optimized = mean_absolute_error(y_aqi_test, y_aqi_pred_optimized)
mse_optimized = mean_squared_error(y_aqi_test, y_aqi_pred_optimized)
rmse_optimized = np.sqrt(mse_optimized)
mape_optimized = mean_absolute_percentage_error(y_aqi_test, y_aqi_pred_optimized)
r2_optimized = r2_score(y_aqi_test, y_aqi_pred_optimized)

print("Tuned gradient boosting regression evaluation metrics:")
print(f'Optimized MAE: {mae_optimized}, \nOptimized MSE: {mse_optimized}, \nOptimized RMSE: {rmse_optimized}, \nOptimized MAPE: {mape_optimized}, \nOptimized R2_SCORE: {r2_optimized}')


Tuned model predictions: [124.50273685 83.46470773 76.67313626 71.43908717 46.06087546
82.48270218 35.45781481 30.29347664 80.68483493 83.63975494
48.01910073 75.04391558 74.6780025 44.02048381 39.16875902
31.57326064 45.52152266 74.54621085 81.98742113 41.15229431
40.05067005 60.0349372 43.40693783 42.44777993 60.0874834
46.4533299 53.98613726 45.00781228 51.56679542 38.97574632
73.97473389 36.03646256 131.65412729 30.82872235 86.88627133
44.17166092 89.64827072 66.71578258 112.06193027 80.82544043
32.13607404 53.33558888 48.52689834 125.55765644 77.38396113
48.52990476 51.07272122 48.89955218 69.66154718 40.70715896
49.21862157 117.74301294 107.39395475 111.89285961 75.11097803]
Tuned gradient boosting regression evaluation metrics:
Optimized MAE: 2.7372135954667853,
Optimized MSE: 20.880137541908642,
Optimized RMSE: 4.569478913608054,
Optimized MAPE: 0.048316013543643156,
Optimized R2_SCORE: 0.9721890403365572

# Compare the metrics before and after tuning
print("Metrics before and after tuning:")
print(f"Before: MAE: {mae}, MSE: {mse}, RMSE: {rmse}, MAPE: {mape}, R2_SCORE: {r2}")
print(f"After: MAE: {mae_optimized}, MSE: {mse_optimized}, RMSE: {rmse_optimized}, MAPE: {mape_optimized}, R2_SCORE: {r2_optimized}")

Metrics before and after tuning:
Before: MAE: 3.017067274506405, MSE: 19.567961603563685, RMSE: 4.4235688763218874, MAPE: 0.058486439950287586, R2_SCORE: 0.9739367717401174
After: MAE: 2.7372135954667853, MSE: 20.880137541908642, RMSE: 4.569478913608054, MAPE: 0.048316013543643156, R2_SCORE: 0.9721890403365572

The result on this test set is mixed: tuning lowered MAE and MAPE, but MSE and RMSE rose slightly and the R² score dropped slightly.

# 6. Output the feature importances
importances = best_gb_reg.feature_importances_
feature_importances = pd.Series(importances, index=features).sort_values(ascending=False)
print(feature_importances)

# Visualize the predicted values against the test values
plt.figure(figsize=(10, 6))
plt.scatter(y_aqi_test, y_aqi_pred_optimized, alpha=0.5)
plt.xlabel('Actual AQI')
plt.ylabel('Predicted AQI')
plt.title('Actual vs Predicted AQI')
plt.show()

PM10_(ppm) 0.422780
O3_8h_(ppm) 0.303769
PM2_5_(ppm) 0.265263
NO2_(ppm) 0.006753
SO2_(ppm) 0.001192
CO_(ppm) 0.000243
dtype: float64

[Figure: scatter plot of actual vs. predicted AQI for the GBM model]

Classification model
# 1. Build and train the model
gbm_clf = GradientBoostingClassifier(random_state=42)
gbm_clf.fit(X_train, y_quality_train)

# 2. Predict the air quality level
y_quality_pred = gbm_clf.predict(X_test)
print('GBM classification predictions:', y_quality_pred)
# 3. Evaluate the model
conf_matrix = confusion_matrix(y_quality_test, y_quality_pred)
accuracy = accuracy_score(y_quality_test, y_quality_pred)
precision = precision_score(y_quality_test, y_quality_pred, average='weighted')
recall = recall_score(y_quality_test, y_quality_pred, average='weighted')
f1 = f1_score(y_quality_test, y_quality_pred, average='weighted')

print(f'Confusion Matrix:\n{conf_matrix}')
print(f'Accuracy: {accuracy}, \nPrecision: {precision}, \nRecall: {recall}, \nF1 Score: {f1}')

GBM classification predictions: ['C' 'B' 'B' 'B' 'A' 'B' 'A' 'A' 'B' 'B' 'A' 'B' 'B' 'A' 'A' 'A' 'A' 'B'
 'B' 'A' 'A' 'B' 'A' 'A' 'B' 'B' 'B' 'B' 'B' 'A' 'B' 'A' 'C' 'A' 'B' 'A'
 'B' 'B' 'C' 'B' 'A' 'B' 'B' 'C' 'B' 'A' 'A' 'B' 'B' 'A' 'B' 'C' 'C' 'C'
 'B']
Confusion Matrix:
[[20  3  0]
 [ 0 25  0]
 [ 0  0  7]]
Accuracy: 0.9454545454545454,
Precision: 0.9512987012987013,
Recall: 0.9454545454545454,
F1 Score: 0.9450955363197574


# 4. Tune the model
# Define the parameter grid
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Create a StratifiedKFold object (left disabled; cv=2 is used below instead)
# stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Create the GridSearchCV object
grid_search = GridSearchCV(estimator=gbm_clf, param_grid=param_grid, cv=2, scoring='accuracy')
# Run the grid search
grid_search.fit(X_train, y_quality_train)

# Get the best parameter combination
best_params = grid_search.best_params_
print(f'Best Parameters: {best_params}')

Best Parameters: {'learning_rate': 0.1, 'max_depth': 5, 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 200}

# 5. Retrain the model with the best parameters
best_gbm_clf = GradientBoostingClassifier(**best_params, random_state=42)
best_gbm_clf.fit(X_train, y_quality_train)

# 6. Predict with the tuned model
y_quality_pred_optimized = best_gbm_clf.predict(X_test)
print('Tuned model predicted quality levels:', y_quality_pred_optimized)
# 7. Compute the evaluation metrics of the tuned model
conf_matrix_optimized = confusion_matrix(y_quality_test, y_quality_pred_optimized)
accuracy_optimized = accuracy_score(y_quality_test, y_quality_pred_optimized)
precision_optimized = precision_score(y_quality_test, y_quality_pred_optimized, average='weighted')
recall_optimized = recall_score(y_quality_test, y_quality_pred_optimized, average='weighted')
f1_optimized = f1_score(y_quality_test, y_quality_pred_optimized, average='weighted')

print(f'Optimized Confusion Matrix:\n{conf_matrix_optimized}')
print(f'Optimized Accuracy: {accuracy_optimized}, \nOptimized Precision: {precision_optimized}, \nOptimized Recall: {recall_optimized}, \nOptimized F1 Score: {f1_optimized}')

# Compare the metrics before and after tuning
print("Metrics before and after tuning:\n")
print(f"Before: Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}, F1 Score: {f1}")
print(f"After: Accuracy: {accuracy_optimized}, Precision: {precision_optimized}, Recall: {recall_optimized}, F1 Score: {f1_optimized}")

Tuned model predicted quality levels: ['C' 'B' 'B' 'B' 'A' 'B' 'A' 'A' 'B' 'B' 'A' 'B' 'B' 'A' 'A' 'A' 'A' 'B'
 'B' 'A' 'A' 'B' 'A' 'A' 'B' 'B' 'B' 'A' 'A' 'A' 'B' 'A' 'C' 'A' 'B' 'A'
 'B' 'B' 'C' 'B' 'A' 'B' 'A' 'C' 'B' 'A' 'A' 'B' 'B' 'A' 'B' 'C' 'C' 'C'
 'B']
Optimized Confusion Matrix:
[[22  1  0]
 [ 1 24  0]
 [ 0  0  7]]
Optimized Accuracy: 0.9636363636363636,
Optimized Precision: 0.9636363636363636,
Optimized Recall: 0.9636363636363636,
Optimized F1 Score: 0.9636363636363636
Metrics before and after tuning:

Before: Accuracy: 0.9454545454545454, Precision: 0.9512987012987013, Recall: 0.9454545454545454, F1 Score: 0.9450955363197574
After: Accuracy: 0.9636363636363636, Precision: 0.9636363636363636, Recall: 0.9636363636363636, F1 Score: 0.9636363636363636
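The classifier grid search above leaves a StratifiedKFold line commented out and falls back to cv=2. For classifiers, an integer cv already makes GridSearchCV use stratified folds, but without shuffling; passing an explicit StratifiedKFold object additionally allows shuffling with a fixed seed. The sketch below shows the wiring on synthetic data with a deliberately small grid (both are assumptions made so the example runs quickly).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic three-class data standing in for the quality-level task.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Shuffled, stratified 5-fold splitter with a fixed seed for reproducibility.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

small_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    small_grid,
    cv=skf,  # pass the splitter object instead of an integer
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("best params:", search.best_params_)
print("best mean CV accuracy:", round(search.best_score_, 3))
```

Using five folds instead of two gives each candidate more training data per fold, at the cost of a longer search.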

Task 3: LightGBM
# 1. Regression model
# 1.1 Build and train a regression model with LGBMRegressor()
lgb_reg = lgb.LGBMRegressor(random_state=42)
lgb_reg.fit(X_train, y_aqi_train)

# 1.2 Use the model to predict AQI values
y_aqi_pred = lgb_reg.predict(X_test)

# 1.3 Evaluate the model
mse = mean_squared_error(y_aqi_test, y_aqi_pred)
r2 = r2_score(y_aqi_test, y_aqi_pred)

print("LGBMRegressor Model Evaluation Metrics:")
print(f'Mean Squared Error (MSE): {mse}')
print(f'R^2 Score: {r2}')

LGBMRegressor Model Evaluation Metrics:
Mean Squared Error (MSE): 70.6065599506794
R^2 Score: 0.9059567406190894

from sklearn.preprocessing import LabelEncoder

# 2. Classification model
# 2.1 Build and train a classification model with LGBMClassifier()
# Convert the class labels to integers
label_encoder = LabelEncoder()
y_quality_train_encoded = label_encoder.fit_transform(y_quality_train)
y_quality_test_encoded = label_encoder.transform(y_quality_test)

lgb_clf = lgb.LGBMClassifier(random_state=42)
lgb_clf.fit(X_train, y_quality_train_encoded)

# 2.2 Use the model to predict the air quality level
y_quality_pred = lgb_clf.predict(X_test)
print('Predicted quality levels:', y_quality_pred)

# 2.3 Evaluate the model
conf_matrix = confusion_matrix(y_quality_test_encoded, y_quality_pred)
accuracy = accuracy_score(y_quality_test_encoded, y_quality_pred)
precision = precision_score(y_quality_test_encoded, y_quality_pred, average='weighted')
recall = recall_score(y_quality_test_encoded, y_quality_pred, average='weighted')
f1 = f1_score(y_quality_test_encoded, y_quality_pred, average='weighted')

print("LGBMClassifier Model Evaluation Metrics:")
print(f'Confusion Matrix:\n{conf_matrix}')
print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1 Score: {f1}')

Predicted quality levels: [2 1 1 1 0 1 0 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 1 1 1 0 0 0 1 0 2 0 1 0 1
 1 2 1 0 1 0 2 1 0 0 1 1 0 1 2 1 2 1]
LGBMClassifier Model Evaluation Metrics:
Confusion Matrix:
[[22  1  0]
 [ 1 24  0]
 [ 0  1  6]]
Accuracy: 0.9454545454545454
Precision: 0.9468531468531469
Recall: 0.9454545454545454
F1 Score: 0.9452900041135335
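Because the labels were integer-encoded before training, the LightGBM predictions above print as 0, 1 and 2 rather than as quality levels. The fitted LabelEncoder can map them back with inverse_transform. The short sketch below shows the round trip on a made-up label list; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Fit the encoder on example labels; classes are sorted, so A->0, B->1, C->2.
encoder = LabelEncoder()
encoded = encoder.fit_transform(["A", "B", "C", "B", "A"])
print(encoded)
print(encoder.classes_)

# Integer predictions (made up here) mapped back to the original labels.
pred_int = np.array([2, 1, 1, 0])
pred_labels = encoder.inverse_transform(pred_int)
print(pred_labels)
```

In the code above, label_encoder.inverse_transform(y_quality_pred) would return the predictions in the same A/B/C form that the GBM classifier prints.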

# 3. Tune the models if the results are unsatisfactory
# 3.1 Define the parameter grids
param_grid_reg = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'num_leaves': [31, 63, 127],
    'max_depth': [-1, 5, 10],
    'min_child_samples': [20, 50, 100]
}

param_grid_clf = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'num_leaves': [31, 63, 127],
    'max_depth': [-1, 5, 10],
    'min_child_samples': [20, 50, 100]
}

# 3.2 Create the GridSearchCV objects
grid_search_reg = GridSearchCV(estimator=lgb_reg, param_grid=param_grid_reg, cv=5, scoring='neg_mean_squared_error')
grid_search_clf = GridSearchCV(estimator=lgb_clf, param_grid=param_grid_clf, cv=5, scoring='accuracy')

# 3.3 Run the grid searches
grid_search_reg.fit(X_train, y_aqi_train)
grid_search_clf.fit(X_train, y_quality_train_encoded)

# 3.4 Get the best parameter combinations
best_params_reg = grid_search_reg.best_params_
best_params_clf = grid_search_clf.best_params_

print(f'Best Parameters for Regression: {best_params_reg}')
print(f'Best Parameters for Classification: {best_params_clf}')

Best Parameters for Regression: {'learning_rate': 0.1, 'max_depth': 5, 'min_child_samples': 20, 'n_estimators': 100, 'num_leaves': 31}
Best Parameters for Classification: {'learning_rate': 0.1, 'max_depth': 5, 'min_child_samples': 20, 'n_estimators': 100, 'num_leaves': 31}

# 3.5 Retrain the models with the best parameters
best_lgb_reg = lgb.LGBMRegressor(**best_params_reg, random_state=42)
best_lgb_clf = lgb.LGBMClassifier(**best_params_clf, random_state=42)

best_lgb_reg.fit(X_train, y_aqi_train)
best_lgb_clf.fit(X_train, y_quality_train_encoded)

# 3.6 Predict with the tuned models
y_aqi_pred_optimized = best_lgb_reg.predict(X_test)
y_quality_pred_optimized = best_lgb_clf.predict(X_test)
print('Tuned model predicted AQI values:', y_aqi_pred_optimized)
print('Tuned model predicted quality levels:', y_quality_pred_optimized)

# 3.7 Evaluate the tuned models
mse_optimized = mean_squared_error(y_aqi_test, y_aqi_pred_optimized)
r2_optimized = r2_score(y_aqi_test, y_aqi_pred_optimized)

conf_matrix_optimized = confusion_matrix(y_quality_test_encoded, y_quality_pred_optimized)
accuracy_optimized = accuracy_score(y_quality_test_encoded, y_quality_pred_optimized)
precision_optimized = precision_score(y_quality_test_encoded, y_quality_pred_optimized, average='weighted')
recall_optimized = recall_score(y_quality_test_encoded, y_quality_pred_optimized, average='weighted')
f1_optimized = f1_score(y_quality_test_encoded, y_quality_pred_optimized, average='weighted')

print("Optimized LGBMRegressor Model Evaluation Metrics:")
print(f'Optimized Mean Squared Error (MSE): {mse_optimized}')
print(f'Optimized R^2 Score: {r2_optimized}')

print("Optimized LGBMClassifier Model Evaluation Metrics:")
print(f'Optimized Confusion Matrix:\n{conf_matrix_optimized}')
print(f'Optimized Accuracy: {accuracy_optimized}')
print(f'Optimized Precision: {precision_optimized}')
print(f'Optimized Recall: {recall_optimized}')
print(f'Optimized F1 Score: {f1_optimized}')

Tuned model predicted AQI values: [119.6722501 90.43625783 76.95094102 69.7134339 45.49522429
85.11016632 35.29020956 34.373013 76.12352252 86.39110431
48.60966258 71.83512479 73.98876859 44.13139587 40.82771554
34.96190592 45.33698962 73.97657317 84.40383692 48.74370587
42.31917891 54.61740284 43.89328402 50.84420449 61.99838848
44.00117867 54.84723723 47.00982841 47.98332788 49.8258541
62.28614705 36.04575205 113.75560249 34.31105093 88.98552298
45.43941569 106.8158533 62.86787307 111.01787045 82.98067324
34.80876636 65.3185259 50.05687814 115.46064086 84.07845619
49.74122766 52.93800566 54.78650467 54.40771277 40.1914266
36.17207261 107.11934225 97.1210987 100.3162011 74.79308805]
Tuned model predicted quality levels: [2 1 1 1 0 1 0 0 1 1 0 1 1 0 0 0 0 1 1 0 0 1 0 0 1 1 1 0 0 0 1 0 2 0 1 0 1
 1 2 1 0 1 0 2 1 0 0 1 1 0 0 2 1 2 1]
Optimized LGBMRegressor Model Evaluation Metrics:
Optimized Mean Squared Error (MSE): 75.10416813845606
Optimized R^2 Score: 0.8999662245297593
Optimized LGBMClassifier Model Evaluation Metrics:
Optimized Confusion Matrix:
[[22  1  0]
 [ 2 23  0]
 [ 0  1  6]]
Optimized Accuracy: 0.9272727272727272
Optimized Precision: 0.9287878787878787
Optimized Recall: 0.9272727272727272
Optimized F1 Score: 0.9271536973664632


# Compare the metrics before and after tuning
print("Comparison of Evaluation Metrics Before and After Optimization:")
print(f"Regression: MSE: {mse} -> {mse_optimized}, R^2 Score: {r2} -> {r2_optimized}")
print(f"Classification: Accuracy: {accuracy} -> {accuracy_optimized}, Precision: {precision} -> {precision_optimized}, Recall: {recall} -> {recall_optimized}, F1 Score: {f1} -> {f1_optimized}")

Comparison of Evaluation Metrics Before and After Optimization:
Regression: MSE: 70.6065599506794 -> 75.10416813845606, R^2 Score: 0.9059567406190894 -> 0.8999662245297593
Classification: Accuracy: 0.9454545454545454 -> 0.9272727272727272, Precision: 0.9468531468531469 -> 0.9287878787878787, Recall: 0.9454545454545454 -> 0.9272727272727272, F1 Score: 0.9452900041135335 -> 0.9271536973664632

On this test set, both grid-searched LightGBM models score slightly worse than the default models on every metric shown.

From: https://blog.csdn.net/m0_73678713/article/details/142322216
