Information entropy:
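For a discrete distribution p, entropy in bits is

$$ H(p) = -\sum_i p_i \log_2 p_i $$

For event below, p = (0.25, 0.25, 0.5) gives H = 1.5 bits; the uniform event2 gives H = log2(3) ≈ 1.585 bits.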
PyTorch code for information entropy:
import torch

event = {'a': 2, 'b': 2, 'c': 4}   # entropy: 1.5 bits
event2 = {'a': 1, 'b': 1, 'c': 1}  # entropy: log2(3) ≈ 1.585 bits

p_e = [v / sum(event.values()) for v in event.values()]
en_e = [item * torch.log2(torch.tensor(item)) for item in p_e]
print(en_e)
info_entropy = -torch.stack(en_e).sum()  # stack the 0-dim tensors before summing
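Running the same computation on event2 verifies the uniform case:

p_e2 = [v / sum(event2.values()) for v in event2.values()]
print(-torch.stack([p * torch.log2(torch.tensor(p)) for p in p_e2]).sum())  # tensor(1.5850)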
Relative entropy: KL divergence
- KL divergence measures the difference between two probability distributions.
- A larger KL means the distributions differ more, the fitting loss is larger, and the model is harder to optimize.
- KL(P||Q) is in general not equal to KL(Q||P); the two are equal (and zero) only when the distributions are identical. KL(P||Q) measures the cost of using distribution Q to approximate P (see the definition below).
- KL in ASR (wenet end-to-end recognition): the model generates a distribution of shape (T, D) with probabilities P, while the ground-truth label is a (T, D) distribution with mass spread as 1/n (label smoothing; see the sketch at the end of this section).
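For reference, the definition for discrete distributions P and Q:

$$ D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)} = \sum_i P(i) \bigl( \log P(i) - \log Q(i) \bigr) $$

Note that PyTorch's F.kl_div(input, target) computes target * (log(target) - input) elementwise, i.e. it expects input to already be log-probabilities; this matters for the snippets below.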
KL for 1-D tensors:
import torch
import torch.nn.functional as F

x = torch.tensor([0.5, 0.5])
y = torch.tensor([0.2, 0.8])
# F.kl_div expects input as log-probabilities and target as probabilities
logp_x = F.log_softmax(x, dim=-1)  # the original used torch.softmax here, which is a bug
p_y = torch.softmax(y, dim=-1)
kl_sum = F.kl_div(logp_x, p_y, reduction='sum')    # the true KL(p_y || softmax(x))
kl_mean = F.kl_div(logp_x, p_y, reduction='mean')  # averages over all elements, not the true KL
kl_default = F.kl_div(logp_x, p_y)                 # default reduction is 'mean'
# Handwritten version of the same formula: q_i * (log q_i - log p_i)
d1 = torch.softmax(torch.tensor([0.5, 0.5]), dim=-1)  # same as softmax(x) above
d2 = torch.softmax(torch.tensor([0.2, 0.8]), dim=-1)  # same as p_y

def kl_self(log_p, q):
    return torch.stack([q[i] * (torch.log(q[i]) - v) for i, v in enumerate(log_p)])

kl_self(torch.log(d1), d2).sum()  # equals kl_sum above
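A quick sanity check that the handwritten version matches F.kl_div elementwise:

assert torch.allclose(kl_self(logp_x, p_y),
                      F.kl_div(logp_x, p_y, reduction='none'))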
KL for multi-dimensional tensors (excerpted from wenet). The snippet below originally disagreed with the handwritten version because KLDivLoss expects its first argument to be log-probabilities (wenet passes the output of log_softmax), while raw probabilities were passed in; the two agree once they operate on the same (log-)softmaxed values.
d1 = torch.tensor([0.5, 0.5])  # target distribution (to be softmaxed)
d2 = torch.tensor([0.2, 0.8])  # model output (to be log-softmaxed)
kl = torch.nn.KLDivLoss(reduction="none")
# as in wenet, the first argument must be log-probabilities
kl(torch.log_softmax(d2, dim=-1), torch.softmax(d1, dim=-1))
# handwritten equivalent
p = torch.softmax(d1, dim=-1)          # target probs
log_q = torch.log_softmax(d2, dim=-1)  # input log-probs
torch.stack([p[i] * (torch.log(p[i]) - log_q[i]) for i in range(len(p))])
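For context, here is a minimal sketch of how this KL shows up in wenet-style label smoothing over (B, T, V) tensors. This is a simplified reconstruction under my own naming (label_smoothing_kl is illustrative, not a real wenet function); the actual wenet LabelSmoothingLoss additionally masks padding and normalizes by token count.

import torch
import torch.nn as nn

def label_smoothing_kl(logits, target, smoothing=0.1):
    # logits: (B, T, V) raw model outputs; target: (B, T) gold token ids
    B, T, V = logits.shape
    logits = logits.view(-1, V)
    target = target.view(-1)
    # smoothed ground truth: 1 - smoothing on the gold token,
    # smoothing / (V - 1) spread over the remaining classes
    true_dist = torch.full_like(logits, smoothing / (V - 1))
    true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    # KLDivLoss(input=log-probs, target=probs), averaged per token
    kl = nn.KLDivLoss(reduction="none")(torch.log_softmax(logits, dim=-1), true_dist)
    return kl.sum() / (B * T)

logits = torch.randn(2, 5, 10)         # B=2, T=5, V=10
target = torch.randint(0, 10, (2, 5))  # gold token ids
print(label_smoothing_kl(logits, target))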