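1. Overview
A Pipeline wraps feature processing and the machine-learning model into one managed workflow, so the parameters learned during fitting can easily be reused on the test set and on future data.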
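A minimal sketch of the "reuse on future data" point, assuming small synthetic arrays (`X_train`, `X_new`) and a step name `reg` that are hypothetical and not from the original post. It checks that the scaler inside the pipeline keeps the statistics learned from the training data, and that `predict` applies those stored statistics to unseen data without refitting.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
X_new = rng.normal(loc=5.0, scale=2.0, size=(10, 3))  # stands in for "future" data

pipe = Pipeline([("sc", StandardScaler()), ("reg", Ridge(alpha=0.8))])
pipe.fit(X_train, y_train)

# The scaler inside the pipeline stored the mean of the training data only.
scaler = pipe.named_steps["sc"]
means_match = np.allclose(scaler.mean_, X_train.mean(axis=0))

# predict() reuses the stored statistics on unseen data; nothing is refitted.
y_new = pipe.predict(X_new)
manual = pipe.named_steps["reg"].predict(scaler.transform(X_new))
predictions_match = np.allclose(y_new, manual)
print(means_match, predictions_match)
```

Because the fitted steps live inside the pipeline object, the same object can be handed new data later and will apply exactly the training-time preprocessing.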
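- Each Pipeline method calls the corresponding method on its steps; if a step does not provide the required method, an error is raised.
- Suppose the pipeline has n steps.
- fit calls fit and transform on each of the first n-1 steps in turn, then calls fit on the last step.
- predict first calls transform on the first n-1 steps, then calls predict on the last step.
- score first calls transform on the first n-1 steps, then calls score on the last step.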
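The dispatch rules above can be checked with a short sketch. It assumes a synthetic regression problem from `make_regression` and a step name `reg`, both chosen here for illustration only. It reproduces `predict` and `score` by hand from the fitted steps, and shows that a pipeline whose last step has no `predict` method raises `AttributeError` when `predict` is called.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical synthetic data, used only for illustration.
X, y = make_regression(n_samples=80, n_features=4, noise=1.0, random_state=0)

pipe = Pipeline([
    ("pf", PolynomialFeatures(degree=2, include_bias=False)),
    ("sc", StandardScaler()),
    ("reg", Ridge(alpha=0.8)),
])
pipe.fit(X, y)

# predict: transform through the first n-1 fitted steps, then predict with the last.
Xt = X
for name, step in pipe.steps[:-1]:
    Xt = step.transform(Xt)
final_estimator = pipe.steps[-1][1]
same_prediction = np.allclose(pipe.predict(X), final_estimator.predict(Xt))

# score: the same transforms, then the last step's score (R^2 for Ridge).
same_score = np.isclose(pipe.score(X, y), final_estimator.score(Xt, y))

# A pipeline whose last step has no predict method does not expose predict.
transform_only = Pipeline([("sc", StandardScaler())]).fit(X)
try:
    transform_only.predict(X)
    raised = False
except AttributeError:
    raised = True
print(same_prediction, same_score, raised)
```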
2. Code Example
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
import warnings
warnings.filterwarnings("ignore")

# Note: load_boston was removed in scikit-learn 1.2, so this code needs an older version.
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

pipe_lr = Pipeline([
    ('pf', PolynomialFeatures(degree=3, include_bias=False, interaction_only=False)),
    ('sc', StandardScaler()),
    ('clf', Ridge(alpha=0.8))])

# fit calls fit and transform on the first n-1 steps in turn, then fit on the last step
pipe_lr.fit(X_train, y_train)
# score calls transform on the first n-1 steps in turn, then score on the last step
print(f'Train score: {pipe_lr.score(X_train, y_train):.5%}, Test score: {pipe_lr.score(X_test, y_test):.5%}')

# The pipeline does the same thing as the code below
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

pf = PolynomialFeatures(degree=3, include_bias=False, interaction_only=False)
X_train = pf.fit_transform(X_train)
X_test = pf.transform(X_test)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

clf = Ridge(alpha=0.8)
clf.fit(X_train, y_train)
print(f'Train score: {clf.score(X_train, y_train):.5%}, Test score: {clf.score(X_test, y_test):.5%}')
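Since `load_boston` is no longer shipped with recent scikit-learn releases, the following is a sketch of the same pipeline on a dataset that is still bundled. The swap to `load_diabetes` and the step name `reg` are assumptions made here so the example runs; they are not part of the original post, and the scores will differ from those of the Boston data.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumption: load_diabetes stands in for load_boston, which was removed in scikit-learn 1.2.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

pipe = Pipeline([
    ("pf", PolynomialFeatures(degree=3, include_bias=False)),
    ("sc", StandardScaler()),
    ("reg", Ridge(alpha=0.8)),
])

# One fit call trains every step; one score call transforms and scores.
pipe.fit(X_train, y_train)
train_score = pipe.score(X_train, y_train)
test_score = pipe.score(X_test, y_test)
print(f"Train score: {train_score:.5%}, Test score: {test_score:.5%}")
```

The step is named `reg` rather than `clf` only because Ridge is a regressor; the name has no effect on behavior.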
From: https://www.cnblogs.com/qianslup/p/16972565.html