【LGBM】LightGBM sklearn API: Hyperparameter Explanations and Usage (Optimized)

Tags: LightGBM LGBM train API split model reg importances

Next, we look more closely at the hyperparameters of each estimator in LightGBM's sklearn API and at how to use them.

LightGBM's sklearn API provides four model classes (that is, four estimators): lightgbm.LGBMModel, LGBMClassifier, LGBMRegressor and LGBMRanker.

LGBMModel

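LGBMModel is the base model class of LightGBM's sklearn API and provides the interface shared by all LightGBM estimators. It is not designed for a specific task (the task is chosen through the objective parameter), but it implements all the basic training and prediction methods.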

Main methods:

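fit(X, y, sample_weight=None, init_score=None, eval_set=None, eval_names=None, eval_sample_weight=None, eval_class_weight=None, eval_init_score=None, eval_metric=None, early_stopping_rounds=None, verbose=True, feature_name='auto', categorical_feature='auto', callbacks=None, init_model=None): trains the model. This listing follows LightGBM 3.x. In LightGBM 4.0 and later, early_stopping_rounds and verbose were removed from fit; early stopping and logging are configured through callbacks=[lightgbm.early_stopping(...), lightgbm.log_evaluation(...)] instead.
predict(X, raw_score=False, start_iteration=0, num_iteration=None, pred_leaf=False, pred_contrib=False, **kwargs): makes predictions. raw_score=True returns raw (untransformed) scores, pred_leaf=True returns leaf indices, and pred_contrib=True returns per-feature contributions (SHAP values).
feature_importances_: an attribute (not a method) that holds the feature importance scores, computed according to importance_type.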

LGBMClassifier

LGBMClassifier is the model class for classification tasks and handles both binary and multiclass problems.

Main hyperparameters:

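boosting_type='gbdt': boosting type; options are 'gbdt' (default, gradient boosting decision tree), 'dart', 'goss' and 'rf' (random forest). In LightGBM 4.x, GOSS is preferably enabled with data_sample_strategy='goss'.
num_leaves=31: maximum number of leaves per tree; the main parameter controlling model complexity.
max_depth=-1: maximum tree depth; a value of 0 or less means no limit.
learning_rate=0.1: learning rate (shrinkage), which scales the contribution of each new tree.
n_estimators=100: number of boosted trees (boosting rounds).
subsample_for_bin=200000: number of samples used to construct the feature histogram bins.
min_split_gain=0.0: minimum gain (loss reduction) required to split a leaf.
min_child_weight=0.001: minimum sum of instance weights (Hessian) required in a leaf.
min_child_samples=20: minimum number of samples required in a leaf.
subsample=1.0: fraction of training rows sampled for each tree (bagging). It only takes effect when subsample_freq > 0; the default subsample_freq=0 disables bagging.
colsample_bytree=1.0: fraction of features sampled for each tree.
reg_alpha=0.0: L1 regularization coefficient.
reg_lambda=0.0: L2 regularization coefficient.
random_state=None: random seed, used to make results reproducible.
n_jobs=-1: number of parallel threads; -1 uses all available CPU cores.
silent=True: whether to suppress messages during training. Deprecated in LightGBM 3.3 and removed in 4.0; pass verbose=-1 instead.
importance_type='split': how feature_importances_ is computed; 'split' counts how many times a feature is used in splits, 'gain' sums the gain of the splits that use it.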

Example code:

from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset
data = load_iris()
X, y = data.data, data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the LGBMClassifier model
model = LGBMClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    num_leaves=31,
    subsample=0.8,
    subsample_freq=1,  # without this, subsample=0.8 has no effect
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=0.1,
    random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))

# Get the feature importances
feature_importances = model.feature_importances_
print("Feature importances:", feature_importances)

LGBMRegressor

LGBMRegressor is the model class for regression tasks, that is, problems with a continuous target variable.

Main hyperparameters:

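boosting_type='gbdt': boosting type; options are 'gbdt' (default, gradient boosting decision tree), 'dart', 'goss' and 'rf' (random forest). In LightGBM 4.x, GOSS is preferably enabled with data_sample_strategy='goss'.
num_leaves=31: maximum number of leaves per tree; the main parameter controlling model complexity.
max_depth=-1: maximum tree depth; a value of 0 or less means no limit.
learning_rate=0.1: learning rate (shrinkage), which scales the contribution of each new tree.
n_estimators=100: number of boosted trees (boosting rounds).
subsample_for_bin=200000: number of samples used to construct the feature histogram bins.
min_split_gain=0.0: minimum gain (loss reduction) required to split a leaf.
min_child_weight=0.001: minimum sum of instance weights (Hessian) required in a leaf.
min_child_samples=20: minimum number of samples required in a leaf.
subsample=1.0: fraction of training rows sampled for each tree (bagging). It only takes effect when subsample_freq > 0; the default subsample_freq=0 disables bagging.
colsample_bytree=1.0: fraction of features sampled for each tree.
reg_alpha=0.0: L1 regularization coefficient.
reg_lambda=0.0: L2 regularization coefficient.
random_state=None: random seed, used to make results reproducible.
n_jobs=-1: number of parallel threads; -1 uses all available CPU cores.
silent=True: whether to suppress messages during training. Deprecated in LightGBM 3.3 and removed in 4.0; pass verbose=-1 instead.
importance_type='split': how feature_importances_ is computed; 'split' counts how many times a feature is used in splits, 'gain' sums the gain of the splits that use it.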

Example code:

from lightgbm import LGBMRegressor
from sklearn.datasets import load_diabetes  # load_boston was removed in scikit-learn 1.2
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the dataset
data = load_diabetes()
X, y = data.data, data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the LGBMRegressor model
model = LGBMRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    num_leaves=31,
    subsample=0.8,
    subsample_freq=1,  # without this, subsample=0.8 has no effect
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=0.1,
    random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, predictions))

# Get the feature importances
feature_importances = model.feature_importances_
print("Feature importances:", feature_importances)

LGBMRanker

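LGBMRanker is the model class for ranking (learning-to-rank) tasks, where a set of items must be ordered; it is common in information retrieval and recommender systems.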

Main hyperparameters:

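boosting_type='gbdt': boosting type; options are 'gbdt' (default, gradient boosting decision tree), 'dart', 'goss' and 'rf' (random forest). In LightGBM 4.x, GOSS is preferably enabled with data_sample_strategy='goss'.
num_leaves=31: maximum number of leaves per tree; the main parameter controlling model complexity.
max_depth=-1: maximum tree depth; a value of 0 or less means no limit.
learning_rate=0.1: learning rate (shrinkage), which scales the contribution of each new tree.
n_estimators=100: number of boosted trees (boosting rounds).
subsample_for_bin=200000: number of samples used to construct the feature histogram bins.
min_split_gain=0.0: minimum gain (loss reduction) required to split a leaf.
min_child_weight=0.001: minimum sum of instance weights (Hessian) required in a leaf.
min_child_samples=20: minimum number of samples required in a leaf.
subsample=1.0: fraction of training rows sampled for each tree (bagging). It only takes effect when subsample_freq > 0; the default subsample_freq=0 disables bagging.
colsample_bytree=1.0: fraction of features sampled for each tree.
reg_alpha=0.0: L1 regularization coefficient.
reg_lambda=0.0: L2 regularization coefficient.
random_state=None: random seed, used to make results reproducible.
n_jobs=-1: number of parallel threads; -1 uses all available CPU cores.
silent=True: whether to suppress messages during training. Deprecated in LightGBM 3.3 and removed in 4.0; pass verbose=-1 instead.
importance_type='split': how feature_importances_ is computed; 'split' counts how many times a feature is used in splits, 'gain' sums the gain of the splits that use it.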

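Special parameters (passed to fit rather than the constructor):

group: the size of each query group, i.e. how many consecutive rows belong to each query. Rows must be ordered by query, and the sizes must sum to the number of samples. Required in fit.
eval_at=(1, 2, 3, 4, 5): the ranking positions k at which NDCG@k is evaluated on eval_set.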
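Real data usually comes with a query ID per row rather than ready-made group sizes. A sketch of deriving group with NumPy, using small hypothetical query IDs, features and labels: sort the rows so each query is contiguous, then count the rows in each block.

```python
import numpy as np

# Hypothetical data: one query ID per row, plus features and relevance labels
qid = np.array([3, 1, 3, 2, 1, 3])
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 2, 1, 1, 0, 2])

# Rows of the same query must be contiguous, so sort by query ID (stable keeps row order)
order = np.argsort(qid, kind="stable")
qid_sorted, X_sorted, y_sorted = qid[order], X[order], y[order]

# np.unique returns IDs in sorted order, matching the sorted blocks
_, group = np.unique(qid_sorted, return_counts=True)
print(group)  # → [2 1 3]
print(group.sum() == len(X_sorted))
```

X_sorted, y_sorted and group can then be passed as model.fit(X_sorted, y_sorted, group=group).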

Example code:

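from lightgbm import LGBMRanker
import numpy as np

# Generate example data
X = np.random.rand(100, 10)  # 100 samples with 10 features each
y = np.random.randint(0, 5, 100)  # target: relevance grades, assumed to range from 0 to 4
group = [10] * 10  # 10 query groups of 10 consecutive samples each

# Create the LGBMRanker model
model = LGBMRanker(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    num_leaves=31,
    subsample=0.8,
    subsample_freq=1,  # without this, subsample=0.8 has no effect
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=0.1,
    random_state=42
)

# Train the model (group is required for ranking)
model.fit(X, y, group=group)

# Predict: one relevance score per sample; sort the scores within each query to rank
predictions = model.predict(X)

# Get the feature importances
feature_importances = model.feature_importances_
print("Feature importances:", feature_importances)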

Summary

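LGBMModel: the base model class; usually not used directly, and it needs an explicit objective when it is.
LGBMClassifier: for classification tasks, supporting binary and multiclass problems.
LGBMRegressor: for regression tasks, predicting continuous target values.
LGBMRanker: for ranking tasks, common in information retrieval and recommender systems.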

From： https://blog.csdn.net/m0_73972962/article/details/131387816
