Suppose we cluster 17 sample points $(v_1, v_2, \ldots, v_{17})$.
One algorithm produces the following clustering:
A=[1 2 1 1 1 1 1 2 2 2 2 3 1 1 3 3 3]
The reference (ground-truth) clustering is:
B=[1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3]
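NMI depends only on the contingency table of the two labelings, that is, on how many samples carry each (A-label, B-label) pair. The sketch below builds that table for A and B. It is a minimal illustration that uses only the standard library. The helper name `contingency` is hypothetical and does not come from the original post.

```python
from collections import Counter

# Algorithm output (A) and reference labels (B) from the text above
A = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]
B = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]


def contingency(a, b):
    """Count how many samples fall into each (label in a, label in b) pair."""
    return Counter(zip(a, b))


table = contingency(A, B)
for i in sorted(set(A)):
    # one row per cluster of A, one column per cluster of B (missing pairs count as 0)
    print(i, [table[(i, j)] for j in sorted(set(B))])
```

This prints the rows `1 [5, 1, 2]`, `2 [1, 4, 0]` and `3 [0, 1, 3]`. The row sums (8, 5, 4) and column sums (6, 6, 5) are the cluster sizes that feed into the entropies.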
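Problem: we want to measure how similar the algorithm's result is to the reference result using NMI (normalized mutual information). NMI ranges from 0 to 1. The more similar the two clusterings are, the closer NMI is to 1, and it equals exactly 1 when they are identical up to a renaming of the cluster labels. The worse the algorithm's result, the closer NMI is to 0.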
Calculation: omitted here. See the original article (linked below) if you need the details.
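For reference, the omitted calculation follows the standard definitions, which the NMI function in the code implements. The logarithms are base 2 as in the code; the final ratio does not depend on the base. Here $n_{ij}$ is the number of samples placed in cluster $i$ by A and in cluster $j$ by B, $a_i$ and $b_j$ are the cluster sizes, and $N = 17$. The worked numbers were derived from the A and B above and are rounded to 5 digits.

```latex
I(A;B) = \sum_{i}\sum_{j} \frac{n_{ij}}{N}\log_2\frac{N\,n_{ij}}{a_i\,b_j}
\quad (\text{terms with } n_{ij} = 0 \text{ contribute } 0),

H(A) = -\sum_{i} \frac{a_i}{N}\log_2\frac{a_i}{N}, \qquad
H(B) = -\sum_{j} \frac{b_j}{N}\log_2\frac{b_j}{N},

\mathrm{NMI}(A,B) = \frac{2\,I(A;B)}{H(A) + H(B)}.

% Example above (rows: clusters of A, columns: clusters of B)
n = \begin{pmatrix} 5 & 1 & 2 \\ 1 & 4 & 0 \\ 0 & 1 & 3 \end{pmatrix}, \quad
a = (8, 5, 4), \quad b = (6, 6, 5),

I(A;B) \approx 0.56545, \quad H(A) \approx 1.52219, \quad H(B) \approx 1.57986,

\mathrm{NMI}(A,B) \approx \frac{2 \times 0.56545}{1.52219 + 1.57986} \approx 0.3646.
```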
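Implementation: there are two options:
- call the NMI metric built into scikit-learn (metrics.normalized_mutual_info_score) directly;
- write your own function that carries out the computation.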
    # -*- coding:utf-8 -*-
    '''
    Created on 2017-10-28
    @summary: Computing NMI with Python
    @author: dreamhome
    '''
    import math
    import numpy as np
    from sklearn import metrics

    def NMI(A,B):
        # number of samples
        total = len(A)
        A_ids = set(A)
        B_ids = set(B)
        # mutual information (in bits)
        MI = 0
        eps = 1.4e-45  # keeps log() defined; a pair with pxy == 0 still contributes 0
        for idA in A_ids:
            for idB in B_ids:
                idAOccur = np.where(A==idA)
                idBOccur = np.where(B==idB)
                idABOccur = np.intersect1d(idAOccur,idBOccur)
                px = 1.0*len(idAOccur[0])/total
                py = 1.0*len(idBOccur[0])/total
                pxy = 1.0*len(idABOccur)/total
                MI = MI + pxy*math.log(pxy/(px*py)+eps,2)
        # entropies of both labelings, used to normalize the mutual information
        Hx = 0
        for idA in A_ids:
            idAOccurCount = 1.0*len(np.where(A==idA)[0])
            Hx = Hx - (idAOccurCount/total)*math.log(idAOccurCount/total+eps,2)
        Hy = 0
        for idB in B_ids:
            idBOccurCount = 1.0*len(np.where(B==idB)[0])
            Hy = Hy - (idBOccurCount/total)*math.log(idBOccurCount/total+eps,2)
        # normalize by the arithmetic mean of the two entropies
        MIhat = 2.0*MI/(Hx+Hy)
        return MIhat

    if __name__ == '__main__':
        # Here A is the reference labeling and B the algorithm's output (the reverse
        # of the naming in the text); NMI is symmetric, so the value is the same.
        A = np.array([1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3])
        B = np.array([1,2,1,1,1,1,1,2,2,2,2,3,1,1,3,3,3])
        print(NMI(A,B))  # 0.3645617718571898
        # scikit-learn >= 0.22 defaults to average_method='arithmetic',
        # the same normalization as NMI() above
        print(metrics.normalized_mutual_info_score(A,B))  # 0.36456177185718985
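As a cross-check of the NMI function above, here is a sketch that works directly from the contingency counts and uses only the standard library. It skips empty cells instead of adding an eps inside the logarithm. The function name `nmi` and its `average` parameter are illustrative and are not part of the original post. `"arithmetic"` reproduces the `2*MI/(Hx+Hy)` normalization used above. `"geometric"` divides by `sqrt(Hx*Hy)` instead, which gives a slightly different value.

```python
import math
from collections import Counter


def nmi(a, b, average="arithmetic"):
    """Normalized mutual information between two labelings of the same samples.

    average="arithmetic": MI / ((H(a) + H(b)) / 2), i.e. 2*MI / (H(a) + H(b))
    average="geometric":  MI / sqrt(H(a) * H(b))
    """
    if len(a) != len(b):
        raise ValueError("labelings must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))  # n_ij: samples in cluster i of a and cluster j of b
    count_a = Counter(a)        # a_i: cluster sizes in a
    count_b = Counter(b)        # b_j: cluster sizes in b

    # Mutual information in bits. Counter only stores non-empty cells,
    # so log2 never sees 0 and no eps is needed.
    mi = sum(
        (nij / n) * math.log2(n * nij / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )

    def entropy(counts):
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0 and h_b == 0:
        # both labelings put every sample in a single cluster: same partition
        return 1.0
    if average == "arithmetic":
        denom = (h_a + h_b) / 2
    elif average == "geometric":
        denom = math.sqrt(h_a * h_b)
    else:
        raise ValueError("average must be 'arithmetic' or 'geometric'")
    if denom == 0:
        # only one labeling is trivial, so the two share no information
        return 0.0
    return mi / denom


A = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]  # algorithm output
B = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]  # reference labels

print(round(nmi(A, B), 4))  # 0.3646
# B with its cluster labels renamed (1->3, 2->1, 3->2): the same partition
print(round(nmi(B, [3] * 6 + [1] * 6 + [2] * 5), 6))  # 1.0
# perfectly balanced cells: the two labelings share no information
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

The three prints illustrate the range described in the problem statement: about 0.3646 for the example, 1 for a relabeled copy, and 0 for unrelated labelings. The original NMI() runs np.where over the whole array for every (A-cluster, B-cluster) pair. This version makes a single pass over the samples to build the counts.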
————————————————
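Copyright notice: this is an original article by CSDN blogger「梦家」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original article: https://blog.csdn.net/DreamHome_S/article/details/78379635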
Open question: how do we obtain the reference clustering in the first place?
Status: not yet resolved.
From: https://www.cnblogs.com/shylkt/p/17506152.html