
Part4: Appendix

Date: 2024-02-25 21:45:02

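This post covers the derivations behind the formulas used in \(\text{diffusion models}\), mainly filling in steps that the papers omit. Readers interested in diffusion models may want to read the earlier posts in this series first.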

Gaussian distribution

Probability density function

If \(x \sim \mathcal{N}(\mu, \sigma^2)\), then:

\[f(x ; \mu, \sigma)=\frac{1}{\sigma \sqrt{2 \pi}} \exp \left(-\frac{(x-\mu)^2}{2 \sigma^2}\right) \]

KL divergence between two Gaussians

\[D_{\mathrm{KL}}\left(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)\right) = \ln \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2} \]
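
As a sanity check, the closed form can be compared against a direct numerical integration of \(\int p(x) \ln \frac{p(x)}{q(x)}\, dx\). This is a minimal sketch: the function names, the integration range \([-30, 30]\) and the step count are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def kl_closed_form(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) from the closed-form expression."""
    return math.log(sigma2 / sigma1) + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2) - 0.5

def kl_numeric(mu1, sigma1, mu2, sigma2, lo=-30.0, hi=30.0, steps=60_000):
    """Midpoint-rule approximation of the KL integral on [lo, hi] (range is an assumption)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = gaussian_pdf(x, mu1, sigma1)
        q = gaussian_pdf(x, mu2, sigma2)
        if p > 0.0 and q > 0.0:  # skip points where a density underflows to zero
            total += p * math.log(p / q) * dx
    return total

closed = kl_closed_form(0.0, 1.0, 1.0, 2.0)
numeric = kl_numeric(0.0, 1.0, 1.0, 2.0)
print(closed, numeric)  # the two values should agree closely
```

Note that the divergence is zero when both Gaussians coincide and is not symmetric in its two arguments.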

Property 1

If a random variable follows a Gaussian distribution, \(x \sim \mathcal{N}(\mu, \sigma^2)\), then for any real numbers \(a, b\) with \(a \neq 0\):

\[ax + b \sim \mathcal{N}(a\mu + b, (a\sigma)^2) \]

Therefore, any Gaussian variable \(\mathbf{x} \sim \mathcal{N}(\mu, \sigma^2)\) can be written as a transformation of a standard normal variable \(\epsilon\):

\[\mathbf{x} = \mu + \sigma \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}) \]
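
The transformation above can be checked empirically in the scalar case. The following is a minimal sketch; the helper names, the seed, and the values \(\mu = 3\), \(\sigma = 2\) are assumptions made for illustration.

```python
import math
import random

def sample_gaussian(mu, sigma, rng):
    """Draw one sample of x = mu + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)  # standard normal noise
    return mu + sigma * eps

def mean_and_std(values):
    """Population mean and standard deviation of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_gaussian(3.0, 2.0, rng) for _ in range(100_000)]
sample_mean, sample_std = mean_and_std(samples)
print(f"mean={sample_mean:.3f}, std={sample_std:.3f}")  # should be close to 3 and 2
```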

Property 2

Suppose two random variables are Gaussian and mutually independent, \(x \sim \mathcal{N}(\mu_x, \sigma_x^2),\ \ y \sim \mathcal{N}(\mu_y, \sigma_y^2)\). Then their sum and difference are also Gaussian:

\[\begin{aligned} & U=x+y \sim \mathcal{N}\left(\mu_x+\mu_y, \sigma_x^2+\sigma_y^2\right) \\ & V=x-y \sim \mathcal{N}\left(\mu_x-\mu_y, \sigma_x^2+\sigma_y^2\right) \end{aligned}\]
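
A quick simulation illustrates that the variances add for both the sum and the difference. This is only a sketch: the parameters \(x \sim \mathcal{N}(1, 2^2)\), \(y \sim \mathcal{N}(-3, 1.5^2)\), the seed and the helper name are assumptions.

```python
import random

def mean_and_var(values):
    """Population mean and variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 100_000
xs = [rng.gauss(1.0, 2.0) for _ in range(n)]    # x ~ N(1, 2^2)
ys = [rng.gauss(-3.0, 1.5) for _ in range(n)]   # y ~ N(-3, 1.5^2), drawn independently of x

u_mean, u_var = mean_and_var([x + y for x, y in zip(xs, ys)])
v_mean, v_var = mean_and_var([x - y for x, y in zip(xs, ys)])

# Theory: U ~ N(-2, 6.25) and V ~ N(4, 6.25); the variance is 2^2 + 1.5^2 in both cases.
print(u_mean, u_var, v_mean, v_var)
```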

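Derivation 1

In the \(\text{diffusion forward process}\), how can the state \(\mathbf{x}_t\) at an arbitrary time step \(t\) be expressed in terms of \(\mathbf{x}_0\)?

Solution:
In the forward process, each transition between consecutive states is Gaussian: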

\[q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) = \mathcal{N}\left(\sqrt{1-\beta_t} \mathbf{x}_{t-1}, \beta_t \mathbf{I}\right)\tag{1} \]

Reparameterize \(\beta_{t}\) by defining:

\[\begin{aligned} \alpha_t & =1-\beta_t \\ \bar{\alpha}_t & =\prod_{i=1}^t \alpha_i \end{aligned}\]
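
The definitions of \(\alpha_t\) and \(\bar{\alpha}_t\) translate directly into a running product. The sketch below assumes a linear \(\beta\) schedule from \(10^{-4}\) to \(0.02\) over 1000 steps; this schedule and the function names are assumptions for illustration, since this post does not fix a particular schedule.

```python
def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_1..beta_T (the schedule itself is an assumption)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products: alpha_bar_t = prod_{i<=t} (1 - beta_i)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta  # multiply in alpha_t = 1 - beta_t
        out.append(running)
    return out

betas = linear_betas(1000)
bars = alpha_bars(betas)
print(bars[0], bars[-1])  # alpha_bar starts near 1 and decays toward 0
```

Since every \(\alpha_t\) lies in \((0, 1)\), \(\bar{\alpha}_t\) decreases monotonically with \(t\).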

Expanding \((1)\) gives:

\[\begin{aligned} q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) & =\mathcal{N}\left(\sqrt{1-\beta_t} \mathbf{x}_{t-1}, \beta_t \mathbf{I}\right) \\ \mathbf{x}_t & =\sqrt{1-\beta_t} \mathbf{x}_{t-1}+\sqrt{\beta_t} \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}) \\ & =\sqrt{\alpha_t} \mathbf{x}_{t-1}+\sqrt{1-\alpha_t} \epsilon \end{aligned} \tag{2} \]

Given \(\mathbf{x}_t = \sqrt{\alpha_t}\mathbf{x}_{t-1} + \sqrt{1 - \alpha_t} \epsilon\), the same step applied at \(t-1\) gives \(\mathbf{x}_{t-1} = \sqrt{\alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{1 - \alpha_{t-1}} \bar{\epsilon}\). Substituting this into \((2)\):

\[\begin{aligned} & \sqrt{\alpha_t}\mathbf{x}_{t-1} + \sqrt{1 - \alpha_t} \epsilon \\ =& \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{1 - \alpha_{t-1}} \bar{\epsilon} \right) + \sqrt{1 - \alpha_t} \epsilon \\ =& \sqrt{\alpha_t \alpha_{t-1}} \mathbf{x}_{t-2} + \sqrt{\alpha_t \left(1 - \alpha_{t-1}\right)} \bar{\epsilon} + \sqrt{1 - \alpha_t} \epsilon \end{aligned} \tag{3} \]

To distinguish it from \(\epsilon\), \(\bar{\epsilon}\) denotes another standard Gaussian variable, \(\mathcal{N}(0, \mathbf{I})\), independent of \(\epsilon\).

By Property 1, scaling a standard Gaussian variable gives another Gaussian variable, so:

\[\begin{aligned} \epsilon \sim \mathcal{N}(0, \mathbf{I}) \quad &\Rightarrow \quad \sqrt{1 - \alpha_t} \epsilon \sim \mathcal{N}(0, \left(1-\alpha_t\right)\mathbf{I}) \\ \bar{\epsilon} \sim \mathcal{N}(0, \mathbf{I}) \quad &\Rightarrow \quad \sqrt{\alpha_t \left(1 - \alpha_{t-1}\right)} \bar{\epsilon} \sim \mathcal{N}(0, \alpha_t\left(1-\alpha_{t-1}\right)\mathbf{I}) \end{aligned} \tag{a}\]

Since \(\sqrt{1 - \alpha_t} \epsilon\) and \(\sqrt{\alpha_t \left(1 - \alpha_{t-1}\right)} \bar{\epsilon}\) are independent and both Gaussian, let \(U = \sqrt{1 - \alpha_t} \epsilon + \sqrt{\alpha_t \left(1 - \alpha_{t-1}\right)} \bar{\epsilon}\). By Property 2, \(U\) is also Gaussian:

\[\begin{aligned} \sqrt{1 - \alpha_t} \epsilon + \sqrt{\alpha_t \left(1 - \alpha_{t-1}\right)} \bar{\epsilon} &\sim \mathcal{N}(0, \left(1-\alpha_t\right)\mathbf{I} +\alpha_t\left(1-\alpha_{t-1}\right)\mathbf{I}) \\ \Rightarrow U & \sim \mathcal{N}(0, \left(1-\alpha_t\alpha_{t-1}\right)\mathbf{I}) \end{aligned} \tag{b}\]
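
As a worked step, the variance in \((b)\) simplifies by plain algebra, using nothing beyond the definitions above:

\[\begin{aligned} \left(1-\alpha_t\right) + \alpha_t\left(1-\alpha_{t-1}\right) &= 1 - \alpha_t + \alpha_t - \alpha_t\alpha_{t-1} \\ &= 1 - \alpha_t\alpha_{t-1} \end{aligned}\]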

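By Property 1, \(U\) can be written in terms of a standard Gaussian variable. The symbol \(\epsilon\) is reused below for this new merged noise variable: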

\[\begin{aligned} U &\sim \mathcal{N}(0, \left(1-\alpha_t\alpha_{t-1}\right)\mathbf{I}) \Rightarrow U = \sqrt{1 - \alpha_t\alpha_{t-1}} \epsilon \end{aligned} \tag{c}\]

Substituting \((c)\) into \((3)\) gives:

\[\begin{aligned} & \sqrt{\alpha_t}\mathbf{x}_{t-1} + \sqrt{1 - \alpha_t} \epsilon \\ =& \sqrt{\alpha_t}\left(\sqrt{\alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{1 - \alpha_{t-1}} \bar{\epsilon} \right) + \sqrt{1 - \alpha_t} \epsilon \\ =& \sqrt{\alpha_t \alpha_{t-1}} \mathbf{x}_{t-2} + \sqrt{\alpha_t \left(1 - \alpha_{t-1}\right)} \bar{\epsilon} + \sqrt{1 - \alpha_t} \epsilon \\ =& \sqrt{\alpha_t \alpha_{t-1}}\mathbf{x}_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}} \epsilon \end{aligned}\]

By induction, it follows that:

\[\begin{aligned} q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) & =\mathcal{N}\left(\sqrt{1-\beta_t} \mathbf{x}_{t-1}, \beta_t \mathbf{I}\right) \\ \mathbf{x}_t & =\sqrt{1-\beta_t} \mathbf{x}_{t-1}+\sqrt{\beta_t} \epsilon, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}) \\ & =\sqrt{\alpha_t} \mathbf{x}_{t-1}+\sqrt{1-\alpha_t} \epsilon \\ & =\sqrt{\alpha_t \alpha_{t-1}} \mathbf{x}_{t-2}+\sqrt{1-\alpha_t \alpha_{t-1}} \epsilon \\ & =\ldots \\ & =\sqrt{\bar{\alpha}_t} \mathbf{x}_0+\sqrt{1-\bar{\alpha}_t} \epsilon \end{aligned} \]

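Therefore, \(q\left(\mathbf{x}_t \mid \mathbf{x}_{0}\right) = \mathcal{N}\left(\sqrt{\bar{\alpha}_t} \mathbf{x}_{0}, \left(1 - \bar{\alpha}_t\right) \mathbf{I}\right)\). The second argument is the variance, so it carries no square root; \(\sqrt{1 - \bar{\alpha}_t}\) is the standard deviation that multiplies \(\epsilon\).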
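
The result of Derivation 1 can be checked numerically by comparing the step-by-step chain with the one-shot closed form in the scalar case. This is a minimal sketch: the 10-step schedule, \(x_0 = 2\), the seed and the function names are assumptions for illustration.

```python
import math
import random

def forward_step_by_step(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps for every step."""
    x = x0
    for beta in betas:
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x

def forward_closed_form(x0, betas, rng):
    """Sample x_t directly as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    alpha_bar = math.prod(1.0 - beta for beta in betas)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)

def mean_and_var(values):
    """Population mean and variance of a list of floats."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n

betas = [0.05 + 0.25 * i / 9 for i in range(10)]  # hypothetical 10-step schedule
alpha_bar_t = math.prod(1.0 - beta for beta in betas)
x0 = 2.0
rng = random.Random(2)  # fixed seed so the run is repeatable

iter_mean, iter_var = mean_and_var([forward_step_by_step(x0, betas, rng) for _ in range(50_000)])
closed_mean, closed_var = mean_and_var([forward_closed_form(x0, betas, rng) for _ in range(50_000)])

# Both estimates should be close to mean sqrt(alpha_bar_t) * x0 and variance 1 - alpha_bar_t.
print(iter_mean, closed_mean, math.sqrt(alpha_bar_t) * x0)
print(iter_var, closed_var, 1.0 - alpha_bar_t)
```

The closed form is what makes training practical: \(\mathbf{x}_t\) for any \(t\) can be sampled in one step instead of simulating the whole chain.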

Derivation 2

In diffusion models, \(q\) is Gaussian, so the posterior \(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)\) is written as:

\[\begin{aligned} q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) & =\mathcal{N}\left(\mathbf{x}_{t-1} ; \tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right), \tilde{\beta}_t \mathbf{I}\right) \end{aligned} \]

How are \(\tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right)\) and \(\tilde{\beta}_t\) obtained?

The result is stated first; the detailed derivation follows.

\[\begin{aligned} \tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right) &:= \frac{\sqrt{\bar{\alpha}_{t-1}} \beta_t}{1-\bar{\alpha}_t} \mathbf{x}_0+\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t} \mathbf{x}_t, \\ \tilde{\beta}_t &:= \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \beta_t \end{aligned} \]

Solution:
By Bayes' rule, \(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)\) can be rewritten as:

\[q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) =q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0\right) \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)}\tag{1}\]

Because diffusion is modeled as a Markov chain, each state depends only on the previous state, so:

\[q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0\right) = q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)\]

Hence \((1)\) becomes \((2)\):

\[\begin{aligned} q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) &=q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0\right) \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)} \\ &=q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)} \end{aligned} \tag{2}\]

From the result of Derivation 1:

\[\begin{aligned} q\left(\mathbf{x}_t \mid \mathbf{x}_{0}\right) &= \mathcal{N}\left(\sqrt{\bar{\alpha}_t} \mathbf{x}_{0}, \left(1 - \bar{\alpha}_t\right) \mathbf{I}\right) \\ q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{0}\right) &= \mathcal{N}\left(\sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_{0}, \left(1 - \bar{\alpha}_{t-1}\right) \mathbf{I}\right) \end{aligned}\]

Expanding \((2)\) with the Gaussian probability density function gives:

\[\begin{aligned} q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) & = q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)} \\ & \propto \exp \left(-\frac{1}{2}\left(\frac{\left(\mathbf{x}_t-\sqrt{\alpha_t} \mathbf{x}_{t-1}\right)^2}{\beta_t}+\frac{\left(\mathbf{x}_{t-1}-\sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_0\right)^2}{1-\bar{\alpha}_{t-1}}-\frac{\left(\mathbf{x}_t-\sqrt{\bar{\alpha}_t} \mathbf{x}_0\right)^2}{1-\bar{\alpha}_t}\right)\right) \end{aligned}\tag{3}\]

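Neither \(\beta_t\) nor \(\bar{\alpha}_t\) is a random variable, so the normalizing factors that depend only on them are absorbed into the proportionality sign. The goal is to express the distribution of \(\mathbf{x}_{t-1}\) in terms of \(\mathbf{x}_0\) and \(\mathbf{x}_{t}\). Expanding \((3)\) further gives \((4)\):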

\[\begin{aligned} &q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) = q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right) \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)} \\ & \propto \exp \left(-\frac{1}{2}\left(\frac{\left(\mathbf{x}_t-\sqrt{\alpha_t} \mathbf{x}_{t-1}\right)^2}{\beta_t}+\frac{\left(\mathbf{x}_{t-1}-\sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_0\right)^2}{1-\bar{\alpha}_{t-1}}-\frac{\left(\mathbf{x}_t-\sqrt{\bar{\alpha}_t} \mathbf{x}_0\right)^2}{1-\bar{\alpha}_t}\right)\right) \\ &=\exp \left(-\frac{1}{2}\left(\frac{\mathbf{x}_t^2-2 \sqrt{\alpha_t} \mathbf{x}_t \mathbf{x}_{t-1}+\alpha_t \mathbf{x}_{t-1}^2}{\beta_t}+\frac{\mathbf{x}_{t-1}^2-2 \sqrt{\bar{\alpha}_{t-1}} \mathbf{x}_0 \mathbf{x}_{t-1}+\bar{\alpha}_{t-1} \mathbf{x}_0^2}{1-\bar{\alpha}_{t-1}}-\frac{\left(\mathbf{x}_t-\sqrt{\bar{\alpha}_t} \mathbf{x}_0\right)^2}{1-\bar{\alpha}_t}\right)\right) \\ &=\exp \left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right) \mathbf{x}_{t-1}^2-\left(\frac{2 \sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{2 \sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\right) \mathbf{x}_{t-1}+C\left(\mathbf{x}_t, \mathbf{x}_0\right)\right)\right) \end{aligned} \tag{4}\]

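Here, the second-to-last equality expands the squares from the previous step. The last equality collects terms with \(\mathbf{x}_{t-1}\) as the variable and \(\mathbf{x}_0\), \(\mathbf{x}_{t}\) as parameters, so that completing the square yields the exponent of a Gaussian density, of the form \(-\frac{(\mathbf{x}_{t-1}-\tilde{\boldsymbol{\mu}}_t)^2}{2 \tilde{\beta}_t}\). It follows that: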

\[\begin{aligned} \tilde{\boldsymbol{\mu}}_t &= \frac{1}{\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}} * \left(\frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\right) \\ &= \frac{\left(1 - \bar{\alpha}_{t-1}\right) \beta_{t}}{\alpha_t\left(1 - \bar{\alpha}_{t-1}\right) + \beta_t} * \left(\frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\right) \\ & = \frac{\left(1 - \bar{\alpha}_{t-1}\right)\sqrt{\alpha_t}}{\alpha_t\left(1 - \bar{\alpha}_{t-1}\right) + \beta_t} \mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}} \beta_{t}}{\alpha_t\left(1 - \bar{\alpha}_{t-1}\right) + \beta_t} \mathbf{x}_0 \\ \end{aligned}\tag{5}\]
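
As a worked step, the way \((5)\) is read off from \((4)\) can be made explicit by completing the square in a generic quadratic. The shorthand \(a\) and \(b\) is introduced here for illustration only:

\[\begin{aligned} a x^2 - 2 b x + C &= a\left(x - \frac{b}{a}\right)^2 + \left(C - \frac{b^2}{a}\right) \\ \Rightarrow \quad \exp\left(-\frac{1}{2}\left(a x^2 - 2 b x + C\right)\right) &\propto \exp\left(-\frac{\left(x - b/a\right)^2}{2 \cdot (1/a)}\right) \end{aligned}\]

Matching against \((4)\) with \(x = \mathbf{x}_{t-1}\), \(a = \frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\) and \(b = \frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\), the mean is \(b/a\) and the variance is \(1/a\).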

Since \(\alpha_t = 1 - \beta_t\) and \(\bar{\alpha}_t = \alpha_t \bar{\alpha}_{t-1}\), we have:

\[\begin{aligned} \alpha_t\left(1 - \bar{\alpha}_{t-1}\right) + \beta_t &= \alpha_t - \alpha_t \bar{\alpha}_{t-1} + \beta_t \\ &= 1 - \beta_t - \alpha_t \bar{\alpha}_{t-1} + \beta_t \\ &= 1 - \alpha_t \bar{\alpha}_{t-1} \\ &= 1 - \bar{\alpha}_{t} \end{aligned}\tag{6}\]

Substituting \((6)\) into \((5)\) gives:

\[\tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right) :=\frac{\sqrt{\bar{\alpha}_{t-1}} \beta_t}{1-\bar{\alpha}_t} \mathbf{x}_0+\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t} \mathbf{x}_t \]

For \(\tilde{\beta}_t\):

\[\begin{aligned} \tilde{\beta}_t &= \frac{1}{\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}} \\ &= \frac{\left(1 - \bar{\alpha}_{t-1}\right) \beta_{t}}{\alpha_t\left(1 - \bar{\alpha}_{t-1}\right) + \beta_t} \\ &= \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \beta_t \end{aligned}\]
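
The simplified expressions for \(\tilde{\boldsymbol{\mu}}_t\) and \(\tilde{\beta}_t\) can be checked against the unsimplified coefficients of the quadratic in the scalar case. This is a minimal sketch: the schedule, the test values and the function names are assumptions, and the second function requires \(t \geq 2\) because \(1 - \bar{\alpha}_0 = 0\).

```python
import math

def alpha_bar(betas, t):
    """alpha_bar_t = prod_{i=1..t} (1 - beta_i), with alpha_bar_0 = 1 (t is 1-based)."""
    return math.prod(1.0 - beta for beta in betas[:t])

def posterior_closed_form(x_t, x_0, t, betas):
    """Mean and variance of q(x_{t-1} | x_t, x_0) from the simplified formulas."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    ab_t, ab_prev = alpha_bar(betas, t), alpha_bar(betas, t - 1)
    mean = (math.sqrt(ab_prev) * beta_t / (1.0 - ab_t)) * x_0 \
        + (math.sqrt(alpha_t) * (1.0 - ab_prev) / (1.0 - ab_t)) * x_t
    var = (1.0 - ab_prev) / (1.0 - ab_t) * beta_t
    return mean, var

def posterior_from_quadratic(x_t, x_0, t, betas):
    """Same quantities read off the quadratic coefficients; valid for t >= 2."""
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    ab_prev = alpha_bar(betas, t - 1)
    a = alpha_t / beta_t + 1.0 / (1.0 - ab_prev)                      # coefficient of x_{t-1}^2
    b = math.sqrt(alpha_t) / beta_t * x_t + math.sqrt(ab_prev) / (1.0 - ab_prev) * x_0
    return b / a, 1.0 / a                                             # mean, variance

betas = [0.05 + 0.25 * i / 9 for i in range(10)]  # hypothetical 10-step schedule
mean_a, var_a = posterior_closed_form(0.3, -1.2, 5, betas)
mean_b, var_b = posterior_from_quadratic(0.3, -1.2, 5, betas)
print(mean_a, mean_b, var_a, var_b)  # the pairs should agree up to rounding
```

Note that \(\tilde{\beta}_t < \beta_t\) for \(t \geq 2\), because \(\bar{\alpha}_{t-1} > \bar{\alpha}_t\).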

These are some of the mathematical derivations omitted in \(\text{DDPM}\).

Papers

  1. Deep unsupervised learning using nonequilibrium thermodynamics, 2015.
  2. Denoising diffusion probabilistic models, 2020.
  3. Improved denoising diffusion probabilistic models, 2021.

From: https://www.cnblogs.com/shayue/p/18033128
