
Weaknesses of k-means clustering

Date: 2023-11-07 11:35:19

Like other algorithms, k-means clustering has several weaknesses:


1 When the data set is small, the initial grouping largely determines the resulting clusters.
2 The number of clusters, K, must be chosen beforehand.
3 We never know the true clusters. With a small data set, feeding the same data in a different order may produce different clusters.
4 The algorithm is sensitive to initial conditions: different initial conditions may produce different clusterings, and the algorithm may get trapped in a local optimum.
5 We never know which attribute contributes more to the grouping, because every attribute is assumed to have the same weight.
6 The arithmetic mean is not robust to outliers: data points very far from a centroid can pull it away from the true center.
7 Because assignment is based on distance to the centroid, the resulting clusters tend to be circular (spherical in higher dimensions).
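
The sensitivity to initial conditions (items 1 and 4) can be seen on a tiny example. The following is a minimal sketch, not a reference implementation: the `kmeans` helper, the four sample points and the two sets of starting centroids are all made up for illustration. The same data converges to two different results, one of which is a local optimum with a much larger sum of squared errors (SSE).

```python
import numpy as np

def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd-style k-means starting from the given centroids (illustrative helper)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()

    def assign(c):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    labels = assign(centroids)
    sse = float(((points - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

# Four corners of a wide rectangle (hypothetical data).
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

# Start 1: centroids on the left and right sides.
labels_good, _, sse_good = kmeans(points, [[0.0, 0.0], [4.0, 0.0]])
# Start 2: centroids on the bottom and top edges.
labels_bad, _, sse_bad = kmeans(points, [[2.0, 0.0], [2.0, 1.0]])

print(labels_good.tolist(), sse_good)  # [0, 0, 1, 1] 1.0
print(labels_bad.tolist(), sse_bad)    # [0, 1, 0, 1] 16.0
```

Both runs have converged (the centroids no longer move), yet the second one groups bottom against top instead of left against right. A common remedy is to run the algorithm from several starting points and keep the result with the lowest SSE.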


One way to mitigate these weaknesses is to use k-means clustering only when plenty of data is available. To reduce the influence of outliers, we can use the median instead of the mean.
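
The effect of an outlier on the two kinds of center can be checked directly. This is a small sketch with invented numbers: one cluster of four nearby points plus a single far-away point, comparing the arithmetic mean with the coordinate-wise median.

```python
import numpy as np

# Hypothetical cluster: four points near (1, 1) and one outlier at (10, 10).
cluster = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [0.8, 1.1],
    [1.0, 1.0],
    [10.0, 10.0],  # outlier
])

mean_center = cluster.mean(axis=0)          # pulled toward the outlier
median_center = np.median(cluster, axis=0)  # coordinate-wise median, stays near the bulk

print(mean_center)    # [2.8 2.8]
print(median_center)  # [1. 1.]
```

The mean moves to (2.8, 2.8), away from every ordinary point, while the median stays at (1, 1).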

Some people claim that k-means clustering cannot be used for anything other than quantitative data. This is not true: it can handle multivariate data of up to n dimensions, even with mixed data types. The key to using other types of dissimilarity lies in the distance matrix.
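
As an illustration of what such a distance matrix might look like for mixed data, here is a sketch using a Gower-style dissimilarity. This choice is an assumption, not something the text prescribes: numeric attributes contribute their range-normalized absolute difference, categorical attributes contribute 0 if equal and 1 otherwise, and the contributions are averaged. The function name `mixed_dissimilarity` and the sample records are hypothetical.

```python
import numpy as np

def mixed_dissimilarity(numeric, categorical):
    """Gower-style pairwise dissimilarity for mixed numeric/categorical records."""
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical, dtype=object)

    # Numeric part: absolute difference scaled by each attribute's range.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant attributes
    num_part = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges

    # Categorical part: 1.0 where the categories differ, 0.0 where they match.
    cat_part = (categorical[:, None, :] != categorical[None, :, :]).astype(float)

    # Average over all attributes, giving an (n, n) matrix.
    return np.concatenate([num_part, cat_part], axis=2).mean(axis=2)

# Three hypothetical records: one numeric attribute and one categorical attribute.
numeric = [[170.0], [180.0], [160.0]]
categorical = [["red"], ["red"], ["blue"]]

D = mixed_dissimilarity(numeric, categorical)
print(D)
```

Records 0 and 1 share a category and differ by half the numeric range, so their dissimilarity is 0.25; records 1 and 2 differ in both attributes as much as possible, giving 1.0. How this matrix is then used in the clustering step depends on the variant chosen.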



From: https://blog.51cto.com/emanlee/8228743

