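Model highlights
- Adjusted Rand index on the test set: 0.51 with the initial setting (2 clusters), 0.75 after tuning the number of clusters
- The dataset (iris) ships with sklearn
-----------------------------------------Implementation details follow-----------------------------------------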
Step 1. Load the data
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
x = iris.data
y = iris.target

# DataFrame views of the features and labels, with readable column names
df_x = pd.DataFrame(x)
df_y = pd.DataFrame(y)
df_x.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
df_y.columns = ['class']
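The two DataFrames built in this step are not used again later, but they are handy for a quick look at the data before clustering. A minimal sketch, assuming the same column names as above; the combined frame `df` and the summary variables are names introduced here for illustration only:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df_x = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df_y = pd.DataFrame(iris.target, columns=['class'])

# Join features and labels side by side (hypothetical helper frame)
df = pd.concat([df_x, df_y], axis=1)

# How many samples per class, and the mean of each feature per class
class_counts = df['class'].value_counts().sort_index()
feature_means = df.groupby('class').mean()

print(class_counts)
print(feature_means.round(2))
```

Classes whose feature means sit close together are the ones a distance-based method such as KMeans is most likely to mix up.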
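Step 2. Data preparation (train/test split)
from sklearn.model_selection import train_test_split

# Hold out 30% of the samples as a test set; random_state fixes the shuffle
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=1)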
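The split above is purely random, so the three iris classes may not be equally represented in the 45 test samples. One optional variant, not used in the original post, is to pass `stratify=y` so that both splits keep the class proportions. A small sketch under that assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)

# stratify=y (an assumption, not part of the original code) keeps the
# class proportions the same in the training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=1, stratify=y
)

train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train class counts:", train_counts)
print("test class counts:", test_counts)
```

Whether this changes the scores reported in the highlights has not been checked here; it only makes the evaluation set balanced.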
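Step 3. Run the cluster analysis
from sklearn.cluster import KMeans

n = 2  # initial number of clusters

def kmeans(n, x_train, y_train):
    model = KMeans(n_clusters=n)  # initial parameter setting
    model.fit(x_train, y_train)  # KMeans is unsupervised: y_train is accepted but ignored
    return model

model = kmeans(n, x_train, y_train)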
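KMeans starts from randomly chosen centroids, and the call above does not fix a seed, so repeated runs can produce slightly different clusters and scores. A minimal sketch of making a run repeatable, assuming it is acceptable to set `random_state` and `n_init` explicitly (neither is set in the original code):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

x, _ = load_iris(return_X_y=True)

# Same seed on both runs, so the initial centroids (and the labels) match
labels_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(x)
labels_b = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(x)

same_labels = np.array_equal(labels_a, labels_b)
print("identical labels across runs:", same_labels)
```

Fixing the seed makes a comparison between cluster counts easier to reproduce, at the cost of tying the result to one particular initialization.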
Step 4. Model evaluation (with labels)
from sklearn.metrics import adjusted_rand_score

dic = {}  # empty dict that stores the adjusted Rand index for each cluster count

def lande(model, x_test, y_test):
    labels_true = y_test
    labels_pred = model.predict(x_test)
    score = round(adjusted_rand_score(labels_true, labels_pred), 2)
    print("Adjusted Rand index:", score)
    dic[model.n_clusters] = score  # key by the model's own cluster count instead of the global n
    return model

print("-----Initial run: 2 clusters-----")
model = lande(model, x_test, y_test)
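The adjusted Rand index compares how samples are grouped, not which integer each group happens to receive, which is why KMeans cluster ids can be scored directly against the class labels. A small sketch with made-up toy labels (not iris data) to illustrate:

```python
from sklearn.metrics import adjusted_rand_score

labels_true = [0, 0, 1, 1, 2, 2]
renamed = [2, 2, 0, 0, 1, 1]  # same grouping, different cluster ids
merged = [0, 0, 0, 0, 1, 1]   # two of the true classes merged into one cluster

ari_renamed = adjusted_rand_score(labels_true, renamed)
ari_merged = adjusted_rand_score(labels_true, merged)

print("renamed clusters:", ari_renamed)
print("merged clusters:", round(ari_merged, 2))
```

A perfect grouping scores 1 regardless of the ids used, while merging classes lowers the score. This is consistent with a 2-cluster model scoring below a well-chosen larger cluster count on a 3-class dataset.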
Step 5. Tune the parameter
for n in range(3, 9):
    print("-----", n, "clusters-----")
    model = kmeans(n, x_train, y_train)  # train the model
    model = lande(model, x_test, y_test)  # evaluate the model

print("-----Best parameter-----")
print("Best number of clusters:", list(dic.keys())[list(dic.values()).index(max(dic.values()))])  # look up the key from its value
print("Best adjusted Rand index:", max(dic.values()))  # largest value in the dict
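The reverse lookup from the largest value back to its key can be written more compactly with `max` and a `key` function. A minimal sketch; the `scores` dict and its values are hypothetical placeholders, not results from this experiment:

```python
# Hypothetical scores keyed by cluster count (illustrative values only)
scores = {2: 0.5, 3: 0.7, 4: 0.6}

# max() iterates over the keys and ranks them by their stored score
best_n = max(scores, key=scores.get)
best_score = scores[best_n]

print("best number of clusters:", best_n)
print("best score:", best_score)
```

Like the `index`-based lookup, this returns the first key that reaches the maximum when several cluster counts tie.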
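Step 6. Save the best model
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package

n = list(dic.keys())[list(dic.values()).index(max(dic.values()))]  # best number of clusters
model = kmeans(n, x_train, y_train)  # retrain with the best cluster count

joblib.dump(model, r'd:\kmeans_labels.pkl')  # raw string so the backslash is kept literally
new_model = joblib.load(r'd:\kmeans_labels.pkl')
new_model.predict(x_test)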
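To check that a saved model behaves the same after loading, the predictions before and after the round trip can be compared. A minimal sketch, assuming a temporary directory instead of the fixed `d:\` path so that it runs on any operating system; the seed and cluster count are illustrative choices:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

x, _ = load_iris(return_X_y=True)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

# Write the model to a throwaway directory and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "kmeans_labels.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should assign every sample to the same cluster
round_trip_ok = np.array_equal(model.predict(x), restored.predict(x))
print("predictions identical after reload:", round_trip_ok)
```

Pickled models are generally only safe to load with the same scikit-learn version that wrote them, so it helps to record the version alongside the file.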
-END
From: https://www.cnblogs.com/peitongshi/p/17478187.html