KL Divergence (Relative Entropy)
\(KL(P||Q)=\sum_x{p(x)\log\frac{p(x)}{q(x)}}\)
\(KL(Q||P)=\sum_x{q(x)\log\frac{q(x)}{p(x)}}\)
It measures the difference between two distributions, and equals the cross entropy \(-\sum_x{p(x)\log q(x)}\) minus the entropy \(-\sum_x{p(x)\log p(x)}\).
Because KL divergence is asymmetric, the more convenient JS divergence was introduced.
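A minimal sketch of the discrete KL divergence above, using NumPy; the helper name `kl_divergence` and the example distributions are illustrative, not from the original text. Running it shows that \(KL(P||Q)\) and \(KL(Q||P)\) generally differ, which is the asymmetry mentioned.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P||Q) = sum_x p(x) * log(p(x)/q(x)) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Only terms with p(x) > 0 contribute; assumes q(x) > 0 wherever p(x) > 0.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl_divergence(p, q))  # KL(P||Q)
print(kl_divergence(q, p))  # KL(Q||P) -- a different value: KL is asymmetric
```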
JS Divergence
Let \(M=\frac{1}{2}(P+Q)\).
Then:
\begin{eqnarray}
JSD(P||Q) &=& \frac{1}{2}KL(P||M)+\frac{1}{2}KL(Q||M) \nonumber\\
&=& \frac{1}{2}\sum_x{p(x)\log{\frac{2p(x)}{p(x)+q(x)}}}+\frac{1}{2}\sum_x{q(x)\log{\frac{2q(x)}{p(x)+q(x)}}} \nonumber\\
&=& \frac{1}{2}\sum_x{p(x)\log{\frac{p(x)}{p(x)+q(x)}}}+\frac{1}{2}\sum_x{q(x)\log{\frac{q(x)}{p(x)+q(x)}}} + \log{2} \nonumber
\end{eqnarray}
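A matching sketch of the JS divergence, reusing the `kl_divergence` helper and the example arrays `p`, `q` from the previous snippet; it follows the definition with the mixture \(M=\frac{1}{2}(P+Q)\) directly.

```python
def js_divergence(p, q):
    """JSD(P||Q) = 1/2 * KL(P||M) + 1/2 * KL(Q||M), with M = (P+Q)/2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution M
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

print(js_divergence(p, q))  # same value as js_divergence(q, p): JSD is symmetric
print(js_divergence(q, p))
```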
When the distributions P and Q do not overlap at all, the JS divergence is the constant \(\log 2\); the gradient is then 0, so backpropagation provides no useful signal.
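The following illustrative example (reusing `js_divergence` from the previous snippet, with hypothetical distributions) shows this saturation: two distributions with disjoint supports give exactly \(\log 2\), regardless of how "far apart" they are.

```python
# P and Q place all their mass on disjoint sets of outcomes.
p_disjoint = np.array([0.5, 0.5, 0.0, 0.0])
q_disjoint = np.array([0.0, 0.0, 0.5, 0.5])
print(js_divergence(p_disjoint, q_disjoint))  # ~0.6931
print(np.log(2))                              # ~0.6931, the constant log 2
```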