K-Means Algorithm
K-Means is an unsupervised clustering algorithm.
It is a typical distance-based clustering algorithm.
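If the value of J, the K-Means objective (usually the sum of squared distances from each sample to its assigned centroid), does not change from one iteration to the next, the algorithm has converged and iteration stops.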
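The convergence rule above can be sketched in plain NumPy. This is a minimal illustration, not a reference implementation: it assumes J is the sum of squared Euclidean distances to the assigned centroids, and the names `kmeans`, `kmeans_objective`, and `history` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def kmeans_objective(X, centroids, labels):
    """J: sum of squared Euclidean distances from each point to its assigned centroid."""
    return float(np.sum((X - centroids[labels]) ** 2))


def kmeans(X, k, max_iter=100, seed=0):
    """Alternate assignment and update steps until J stops changing."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points (one simple choice among several).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    history = []  # value of J after each iteration
    prev_j = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
        j_value = kmeans_objective(X, centroids, labels)
        history.append(j_value)
        # Converged: J is the same before and after this iteration.
        if prev_j is not None and j_value == prev_j:
            break
        prev_j = j_value
    return centroids, labels, history


X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels, history = kmeans(X_demo, k=2)
print("J per iteration:", history)
```

In practice a small tolerance is often used instead of exact equality, since floating-point noise can keep J changing by tiny amounts on larger datasets.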
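K: the number of clusters to produce.
Centroid: the mean vector of each cluster, obtained by averaging each dimension over the cluster's points.
Distance metric: Euclidean distance and cosine similarity are common choices (normalize the data first).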
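The definitions above can be made concrete with a few lines of NumPy. This is a rough sketch: the helper names `centroid`, `euclidean_distance`, `cosine_similarity`, and `l2_normalize` are hypothetical, and reading "normalize first" as scaling each vector to unit length is an assumption (standardizing each feature is another common reading).

```python
import numpy as np


def centroid(points):
    """Mean vector of a cluster: average each dimension independently."""
    return np.asarray(points, dtype=float).mean(axis=0)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def l2_normalize(X):
    """Scale each row to unit length (assumed meaning of 'normalize first')."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print("centroid:", centroid(cluster))

u, v = l2_normalize([[3.0, 4.0], [1.0, 0.0]])
# For unit-length vectors, squared Euclidean distance equals 2 - 2 * cosine similarity,
# so the two metrics rank neighbours identically after this normalization.
print(euclidean_distance(u, v) ** 2, 2 - 2 * cosine_similarity(u, v))
```

After unit-length normalization the two metrics agree on which centroid is nearest, which is one reason normalization matters when cosine similarity is the intended notion of closeness.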
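import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

if __name__ == '__main__':
    # Load the iris data and hold out a test split
    dataset = load_iris()
    X = dataset.data
    y = dataset.target
    Xd_train, Xd_test, y_train, y_test = train_test_split(X, y, random_state=14)
    # Fit on the features only; K-Means never sees the labels
    clf = KMeans(n_clusters=3, random_state=0).fit(Xd_train)
    # print(clf.labels_)
    # Cluster ids are arbitrary, so map each cluster to the most common
    # true label among its training samples before comparing with y_test
    cluster_to_label = np.array([
        np.bincount(y_train[clf.labels_ == k], minlength=3).argmax()
        for k in range(3)
    ])
    y_predicted = cluster_to_label[clf.predict(Xd_test)]
    # Accuracy as a percentage
    accuracy = np.mean(y_predicted == y_test) * 100
    print("y_test ", y_test)
    print("y_predicted", y_predicted)
    print("accuracy:", accuracy)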