From Section 5.2, we know that any prefix code satisfies the Kraft inequality.
Our goal is to design prefix codes with minimum expected length \(L(C)\). By the Kraft inequality, this is equivalent to finding integer codeword lengths \(l_1, l_2, ..., l_m\) that satisfy the Kraft inequality and minimize \(L(C)=\sum_{i=1}^m p_i l_i\) over all prefix codes.
Let's formalize this problem mathematically:
Minimize \(L=\sum_{i=1}^m p_i l_i\) subject to \(\sum_{i=1}^m D^{-l_i} \leq 1\) with \(l_1,...,l_m\) integers.
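To make the integer program concrete, here is a minimal brute-force sketch in Python (the pmf, the length cap `max_len`, and the function name `optimal_lengths` are illustrative choices, not from the text): it enumerates all integer length assignments up to `max_len`, keeps those satisfying the Kraft inequality, and returns one minimizing \(L\).

```python
from itertools import product

def optimal_lengths(p, D=2, max_len=6):
    """Brute-force the integer program: minimize sum(p_i * l_i)
    subject to the Kraft inequality sum(D**(-l_i)) <= 1."""
    best_L, best_lens = float("inf"), None
    for lens in product(range(1, max_len + 1), repeat=len(p)):
        if sum(D ** -l for l in lens) <= 1:            # Kraft inequality holds
            L = sum(pi * l for pi, l in zip(p, lens))  # expected length
            if L < best_L:
                best_L, best_lens = L, lens
    return best_lens, best_L

# For a dyadic pmf the optimum matches l_i = -log_D(p_i) exactly:
print(optimal_lengths([0.5, 0.25, 0.125, 0.125]))  # ((1, 2, 3, 3), 1.75)
```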
Analytically, this is a constrained optimization problem; relaxing the integer constraint for the moment, we can use Lagrange multipliers to solve it:
\[J=\sum_{i=1}^m p_i l_i +\lambda \left(\sum_{i=1}^m D^{-l_i}-1\right) \]Differentiating w.r.t. \(l_i\) and \(\lambda\), we have:
\[\frac{\partial J}{\partial \lambda}=0 \Rightarrow \sum_{i=1}^m D^{-l_i}=1 \]\[\frac{\partial J}{\partial l_i}=0 \;(i=1,2,...,m) \Rightarrow p_i -\lambda D^{-l_i} \log_e D =0 \Rightarrow D^{-l_i} = \frac{p_i}{\lambda \log_e D} \]Summing over \(i\) and using \(\sum_{i=1}^m p_i=1\) together with the first condition:\[\sum_{i=1}^m D^{-l_i} = \frac{\sum_{i=1}^m p_i}{\lambda \log_e D} = \frac{1}{\lambda \log_e D}=1 \Rightarrow \lambda = \frac{1}{\log_e D} \]Substituting \(\lambda = \frac{1}{\log_e D}\) into \(D^{-l_i} = \frac{p_i}{\lambda \log_e D}\):
\[D^{-l_i}=p_i \Rightarrow l_i^* = -\log_D p_i \]where \(l_i^*\) is the optimal solution of the relaxed problem; note that the integer constraint has not been imposed here. These generally non-integer codeword lengths yield:
\[L^* =\sum_{i=1}^m p_i l_i^* = -\sum_{i=1}^m p_i \log_D p_i =H_{D}(X)=H_2(X)/\log_2 D \]Intuitively, we should choose \(l_i\) close to \(l_i^*\).
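For instance, for the dyadic pmf \(p=(\frac{1}{2},\frac{1}{4},\frac{1}{8},\frac{1}{8})\) with \(D=2\) (an illustrative example, not from the text), the relaxed optimum is already integral:
\[l_i^* = -\log_2 p_i = (1,2,3,3), \quad L^* = \frac{1}{2}\cdot 1+\frac{1}{4}\cdot 2+\frac{1}{8}\cdot 3+\frac{1}{8}\cdot 3 = 1.75 = H_2(X) \]Now, we verify optimality directly in the proof of the following theorem: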
Thm. 5.3.1 The expected length \(L\) of any prefix D-ary code for a random variable \(X\) is greater than or equal to the entropy \(H_D(X)\); that is,
\[L\geq H_D(X) \]with equality iff \(D^{-l_i}=p_i\) for all \(i\) --- such a distribution is called D-adic.
Proof:
\[L-H_D(X)=\sum_i p_i l_i +\sum_i p_i \log_D p_i \\ =-\sum_i p_i \log_D D^{-l_i}+\sum_i p_i \log_D p_i \\ =\sum_i p_i \log_D \left(\frac{p_i}{D^{-l_i}}\times\frac{\sum_j D^{-l_j}}{\sum_j D^{-l_j}}\right) \]Let \(r_i=\frac{D^{-l_i}}{\sum_j D^{-l_j}}\); clearly \(r_i\) defines a pmf, as \(\sum_i r_i=1\) and \(r_i\in [0,1]\).
Also let \(c=\sum_j D^{-l_j}\), where \(c\leq 1\) by the Kraft inequality. Then,
\[L-H_D(X)= \sum_i p_i\log_D \frac{p_i}{r_i}+\sum_i p_i \log_D \frac{1}{c} = D(\mathbf{p}||\mathbf{r})+\log_D\frac{1}{c} \geq 0 \]The first term is the relative entropy (KL divergence) between the distributions \(\mathbf{p}\) and \(\mathbf{r}\), which is always non-negative, and the second term is also non-negative because \(\frac{1}{c} \geq 1\). Equality holds iff \(c=1\) and \(\mathbf{p}=\mathbf{r}\), i.e., iff \(D^{-l_i}=p_i\) for all \(i\).
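As a numerical sanity check, here is a minimal Python sketch (the pmf, the lengths, and the helper name `gap_decomposition` are illustrative assumptions) that confirms \(L-H_D(X)=D(\mathbf{p}||\mathbf{r})+\log_D\frac{1}{c}\) and that the gap is non-negative:

```python
import math

def gap_decomposition(p, lens, D=2):
    """Verify L - H_D(X) = D(p||r) + log_D(1/c) for lengths satisfying Kraft."""
    c = sum(D ** -l for l in lens)
    assert c <= 1, "lengths must satisfy the Kraft inequality"
    r = [D ** -l / c for l in lens]                       # r_i = D^{-l_i} / c
    L = sum(pi * l for pi, l in zip(p, lens))             # expected length
    H = -sum(pi * math.log(pi, D) for pi in p)            # entropy H_D(X)
    kl = sum(pi * math.log(pi / ri, D)
             for pi, ri in zip(p, r))                     # D(p||r)
    return L - H, kl + math.log(1 / c, D)

# Non-dyadic pmf with lengths (1, 2, 3, 3): both sides agree and are >= 0.
print(gap_decomposition([0.4, 0.3, 0.2, 0.1], [1, 2, 3, 3]))
```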