[Machine Learning] Hyperparameter Tuning of Machine Learning Models with Optuna (LightGBM, XGBoost)

Tags: suggest, machine, Optuna, study, XGBoost, trial, train, optuna, test

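Contents

1. Introduction to Optuna
2. LGBM and XGBoost tuning summary

2.1 LGBM

2.1.1 Defining the Objective
2.1.2 Running the trials
2.1.3 Plotting
2.1.4 Best parameters

2.2 XGBoost

2.2.1 Defining the Objective
2.2.2 Running the trials
2.2.3 Plotting
2.2.4 Best parameters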

1. Introduction to Optuna

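While competing on Kaggle I ran into a question: the models in people's Kernels set a large number of hyperparameters, but how are those values actually chosen? I asked about it in Discussion and received replies from many experienced Kagglers.

[Image: screenshot of the Discussion replies on parameter settings]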

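Going through the replies, I found that they all mentioned the optuna library. So what is optuna? It is a hyperparameter tuning framework written in Python. A minimal optuna program has just three core concepts: the objective function (objective), a single trial (trial), and the study (study). The objective defines the function to optimize and specifies the ranges of its parameters/hyperparameters; a trial corresponds to one execution of the objective; and the study manages the optimization: it decides how to optimize and how many trials to run, and it records the trial results.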

Here is a simple example to make this concrete:

Define $f(x, y) = (x + y)^2$ with $x, y \in [-10, 10]$. For which values of $x, y$ does $f(x, y)$ attain its maximum?

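import optuna

def objective(trial):
    # suggest_float replaces suggest_uniform, which is deprecated since Optuna 3.0
    x = trial.suggest_float('x', -10, 10)
    y = trial.suggest_float('y', -10, 10)
    return (x + y) ** 2

# maximize the value returned by objective over 100 trials
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

print(study.best_params)  # best (x, y) found
print(study.best_value)   # best objective value found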
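
To make the division of labour between objective, trial and study concrete, here is a toy sketch in plain Python. ToyTrial and ToyStudy are hypothetical stand-ins written for this illustration, and the search is plain random sampling rather than the TPE sampler that Optuna uses by default, so it shows the control flow only, not how Optuna is implemented.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial: draws and records parameter values."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        # Plain uniform random sampling (an assumption; Optuna's samplers are smarter).
        value = self._rng.uniform(low, high)
        self.params[name] = value
        return value


class ToyStudy:
    """Hypothetical stand-in for a study: runs trials and remembers the best one."""

    def __init__(self, direction="maximize", seed=0):
        self._sign = 1.0 if direction == "maximize" else -1.0
        self._rng = random.Random(seed)
        self.trials = []  # one (params, value) pair per finished trial
        self.best_params = None
        self.best_value = None

    def optimize(self, objective, n_trials):
        for _ in range(n_trials):
            trial = ToyTrial(self._rng)
            value = objective(trial)  # one trial = one call of the objective
            self.trials.append((trial.params, value))
            if self.best_value is None or self._sign * value > self._sign * self.best_value:
                self.best_value = value
                self.best_params = trial.params


def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_float("y", -10, 10)
    return (x + y) ** 2


study = ToyStudy(direction="maximize", seed=0)
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```

Since (x + y)^2 is largest when |x + y| = 20, the true maximum on this range is 400, reached at x = y = 10 or x = y = -10. The value found by the search should approach it as n_trials grows.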

2. LGBM and XGBoost tuning summary

2.1 LGBM

2.1.1 Defining the Objective

import optuna
from lightgbm import LGBMRegressor, early_stopping
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# data (features) and target (labels) are assumed to be defined beforehand
def objective(trial):
    # hold out 30% of the data for validation; a fixed split keeps trials comparable
    X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3, random_state=42)
    param = {
        'metric': 'rmse',
        'random_state': 48,
        'n_estimators': 20000,
        # regularization strengths span several orders of magnitude, so sample on a log scale
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-3, 10.0, log=True),
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-3, 10.0, log=True),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3,0.4,0.5,0.6,0.7,0.8,0.9, 1.0]),
        # note: subsample only takes effect when subsample_freq > 0 (its default is 0)
        'subsample': trial.suggest_categorical('subsample', [0.4,0.5,0.6,0.7,0.8,1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.006,0.008,0.01,0.014,0.017,0.02]),
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20, 50]),
        # LightGBM requires num_leaves > 1
        'num_leaves': trial.suggest_int('num_leaves', 2, 1000),
        'min_child_samples': trial.suggest_int('min_child_samples', 1, 300),
        'cat_smooth': trial.suggest_int('cat_smooth', 1, 100)
    }

    model = LGBMRegressor(**param)
    # recent LightGBM versions take early stopping as a callback instead of fit() arguments
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)], callbacks=[early_stopping(100, verbose=False)])
    pred = model.predict(X_test)
    # RMSE; taking the root explicitly works across scikit-learn versions (squared=False is deprecated)
    rmse = mean_squared_error(y_test, pred) ** 0.5
    return rmse
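
The reg_alpha and reg_lambda ranges in the objective above run from 1e-3 to 10, four orders of magnitude, which is why they are sampled log-uniformly. The sketch below uses only the standard library to show what that means; sample_loguniform is a hypothetical helper written for this illustration, not an Optuna function.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
n = 10_000
log_draws = [sample_loguniform(rng, 1e-3, 10.0) for _ in range(n)]
lin_draws = [rng.uniform(1e-3, 10.0) for _ in range(n)]

# 0.1 is the midpoint of [1e-3, 10] on a log scale, so compare how often
# each sampling scheme lands below it.
frac_log = sum(v < 0.1 for v in log_draws) / n
frac_lin = sum(v < 0.1 for v in lin_draws) / n
print(frac_log, frac_lin)
```

With log-uniform sampling about half of the draws fall below 0.1, while with linear sampling only about 1% do, so a linear search would almost never try small regularization values.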

2.1.2 Running the trials

study = optuna.create_study(direction='minimize')
n_trials = 50  # run 50 trials
study.optimize(objective, n_trials=n_trials)

2.1.3 Plotting

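optuna.visualization.plot_optimization_history(study)  # objective value over the trials

optuna.visualization.plot_parallel_coordinate(study)  # parameter combinations against the objective value

optuna.visualization.plot_param_importances(study)  # estimated importance of each hyperparameter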

2.1.4 Best parameters

params=study.best_params
params['metric'] = 'rmse'
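
Note that study.best_params holds only the values chosen through trial.suggest_*; fixed settings from the objective, such as n_estimators and random_state, are not part of it, which is why metric is added back by hand above. A small sketch of merging the two before training a final model follows. build_final_params is a hypothetical helper, and the values in example_best are placeholders for illustration, not results from a real study.

```python
# Fixed settings copied from the objective in section 2.1.1.
FIXED_PARAMS = {"metric": "rmse", "random_state": 48, "n_estimators": 20000}


def build_final_params(best_params, fixed_params=FIXED_PARAMS):
    """Merge tuned values with the fixed ones; tuned values win on key clashes."""
    merged = dict(fixed_params)  # copy so the fixed settings are not mutated
    merged.update(best_params)
    return merged


# Placeholder values for illustration only, not the output of a real study.
example_best = {"learning_rate": 0.01, "max_depth": 7, "num_leaves": 64}
final_params = build_final_params(example_best)
print(final_params)
```

In the real workflow, study.best_params would take the place of example_best, and the merged dictionary could then be passed to LGBMRegressor(**final_params).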

2.2 XGBoost

2.2.1 Defining the Objective

import optuna
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def objective(trial):
    # train is assumed to be a DataFrame whose last column is the target
    data = train.iloc[:, :-1]
    target = train.target
    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.3, random_state=42)
    param = {
        # L2 / L1 regularization strengths, sampled on a log scale
        'lambda': trial.suggest_float('lambda', 1e-3, 10.0, log=True),
        'alpha': trial.suggest_float('alpha', 1e-3, 10.0, log=True),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.4, 0.5, 0.6, 0.7, 0.8, 1.0]),
        'learning_rate': trial.suggest_categorical('learning_rate',
                                                   [0.008, 0.009, 0.01, 0.012, 0.014, 0.016, 0.018, 0.02]),
        'n_estimators': 4000,
        'max_depth': trial.suggest_categorical('max_depth', [5, 7, 9, 11, 13, 15, 17, 20]),
        'random_state': trial.suggest_categorical('random_state', [24, 48, 2020]),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 300),
    }
    # recent XGBoost versions take early_stopping_rounds in the constructor, not in fit()
    model = xgb.XGBRegressor(**param, early_stopping_rounds=100)
    model.fit(train_x, train_y, eval_set=[(test_x, test_y)], verbose=False)
    preds = model.predict(test_x)
    # RMSE; taking the root explicitly works across scikit-learn versions (squared=False is deprecated)
    rmse = mean_squared_error(test_y, preds) ** 0.5
    return rmse

2.2.2 Running the trials

study = optuna.create_study(direction='minimize')
n_trials = 50  # run 50 trials, as in section 2.1.2
study.optimize(objective, n_trials=n_trials)
print('Number of finished trials:', len(study.trials))
print("------------------------------------------------")
print('Best trial:', study.best_trial.params)
print("------------------------------------------------")
print(study.trials_dataframe())
print("------------------------------------------------")

2.2.3 Plotting

# plot_optimization_history: objective value over the trials
optuna.visualization.plot_optimization_history(study).show()
# plot_parallel_coordinate: interactively visualizes the hyperparameters and scores
optuna.visualization.plot_parallel_coordinate(study).show()
# plot_slice: shows the evolution of the search. You can see where in the hyperparameter
# space your search went and which parts of the space were explored more.
optuna.visualization.plot_slice(study).show()
# plot_contour: pairwise relationships between the listed parameters
optuna.visualization.plot_contour(study, params=['alpha',
                            #'max_depth',
                            'lambda',
                            'subsample',
                            'learning_rate']).show()
# Visualize parameter importances.
optuna.visualization.plot_param_importances(study).show()
# Visualize empirical distribution function
optuna.visualization.plot_edf(study).show()

2.2.4 Best parameters

params=study.best_params

From: https://blog.51cto.com/u_16165815/6521631