Overview
This paper targets 4-bit model training and inference, proposing Logarithmic Unbiased Quantization (LUQ).
Logarithmic Unbiased Quantization
- The author argues that unbiased quantization matters especially in backpropagation, because it guarantees consistency, in expectation, with the ordinary optimization procedure. Moreover, gradients as a whole follow a roughly logarithmic distribution. How to quantize under these two constraints is what gives rise to LUQ.
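The unbiasedness requirement can be checked numerically. Below is a minimal sketch (my own illustration on a uniform grid, not the paper's code) of stochastic rounding: in expectation it recovers the input exactly, which is the property LUQ wants to preserve on a logarithmic grid.

```python
import numpy as np

def stochastic_round(x, step):
    """Stochastically round x onto the uniform grid {k * step}.

    Rounds up with probability proportional to the distance to the
    lower grid point, so E[SR(x)] = x exactly (unbiased), unlike
    deterministic round-to-nearest.
    """
    lo = np.floor(x / step) * step
    p_up = (x - lo) / step
    return lo + step * (np.random.rand(*np.shape(x)) < p_up)
```

Averaging many draws of `stochastic_round(0.37, 0.25)` recovers 0.37, whereas round-to-nearest would always return 0.25, a systematic bias.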
- Stochastic underflow: first, apply a stochastic 'pruning' to the gradient:
\[T_{\alpha}(x) = \left \{ \begin{array}{ll} x, & \text{if } |x| \ge \alpha, \\ \text{sign}(x) \cdot \alpha & \text{with probability } \frac{|x|}{\alpha}, \text{if } |x| < \alpha, \\ 0 & \text{with probability } 1 - \frac{|x|}{\alpha}, \text{if } |x| < \alpha. \end{array} \right . \]Here one takes \(\alpha = \max(|x|) / 2^{2^{b-1}}\).
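A direct sketch of the stochastic underflow operator above (a hypothetical NumPy implementation, not the paper's code). Note that for \(|x| < \alpha\), \(\mathbb{E}[T_{\alpha}(x)] = \frac{|x|}{\alpha} \cdot \text{sign}(x) \cdot \alpha = x\), so the pruning is unbiased:

```python
import numpy as np

def stochastic_underflow(x, alpha):
    """T_alpha: keep x if |x| >= alpha; otherwise promote it to
    sign(x) * alpha with probability |x| / alpha, else zero it.
    Unbiased: E[T_alpha(x)] = x.
    """
    x = np.asarray(x, dtype=np.float64)
    promote = np.random.rand(*x.shape) < np.abs(x) / alpha
    return np.where(np.abs(x) >= alpha, x,
                    np.where(promote, np.sign(x) * alpha, 0.0))
```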
- Logarithmic SR: logarithmic quantization selects the bins
\[\{\alpha, 2\alpha, \ldots, 2^{2^{b-1}} \alpha \}, \]and then performs stochastic rounding as follows. For \(2^{n-1}\alpha < x < 2^n \alpha\):
\[Q_{\alpha}(x) = \left \{ \begin{array}{ll} 2^{n-1} \alpha & \text{with probability } \frac{2^n \alpha - x}{2^n \alpha - 2^{n-1} \alpha}, \\ 2^{n} \alpha & \text{with probability } 1 - \frac{2^n \alpha - x}{2^n \alpha - 2^{n-1} \alpha}. \end{array} \right . \]To further streamline this somewhat involved rounding, the author proposes RDNP. Unfortunately I did not quite follow this part; the midpoint of the bin should be \((2^n + 2^{n-1}) \alpha / 2 = \frac{3}{4} \cdot 2^{n} \alpha\)?
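The logarithmic stochastic rounding above can be sketched as follows (again my own NumPy illustration, not the paper's code, assuming the input has already passed through the stochastic underflow so that \(|x| \ge \alpha\)):

```python
import numpy as np

def log_stochastic_round(x, alpha):
    """Round |x| onto the logarithmic bins {alpha, 2*alpha, 4*alpha, ...}.

    For 2^(n-1)*alpha < |x| < 2^n*alpha, round up to 2^n*alpha with
    probability (|x| - 2^(n-1)*alpha) / (2^(n-1)*alpha), which makes
    the rounding unbiased: E[Q_alpha(x)] = x.
    Assumes |x| >= alpha (e.g. after stochastic underflow).
    """
    x = np.asarray(x, dtype=np.float64)
    mag = np.abs(x) / alpha        # magnitude in units of alpha
    n = np.ceil(np.log2(mag))      # exponent of the upper bin edge
    lo, hi = 2.0 ** (n - 1), 2.0 ** n
    p_up = (mag - lo) / (hi - lo)
    up = np.random.rand(*x.shape) < p_up
    return np.sign(x) * alpha * np.where(up, hi, lo)
```

For example, with \(\alpha = 0.1\), an input of 0.3 lies between the bins 0.2 and 0.4 and is rounded to either with probability 1/2, so the expectation is again 0.3.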
Code
[The code is in the supplementary material.]
From: https://www.cnblogs.com/MTandHJ/p/18626909