Proximal Gradient Method for Differentiable Convex Optimization
Consider the constrained optimization problem:
\begin{align*}
\mathop{min}\limits_{x} & \quad f(x)\\
s.t. & \quad x \in S
\end{align*}
where $f$ is a differentiable convex function and $S$ is a convex set. This problem is equivalent to:
\begin{align*}
\mathop{min}\limits_{x} & \quad f(x)+I_S(x)
\end{align*}
where $I_S$ is the indicator function of the set $S$.
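For completeness (a standard definition, not spelled out in the original text), the indicator function is
$$I_S(x)=\begin{cases} 0, & x \in S,\\ +\infty, & x \notin S, \end{cases}$$
so adding $I_S$ to the objective enforces the constraint $x \in S$ without changing the set of minimizers.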
Take $\beta \in \mathbb{R}^+$; since $\partial I_S$ is maximal monotone, the operator $I+\beta \partial I_S$ has a single-valued inverse (its resolvent). Let $x^*$ be an optimal solution of the problem. Then
\begin{align*}
&0 \in \nabla f(x^*) + \partial I_S(x^*)\\
\iff & 0 \in \beta \nabla f(x^*) + \beta \partial I_S(x^*)\\
\iff & 0 \in x^* - \left( x^* -\beta \nabla f(x^*)\right) + \beta \partial I_S(x^*)\\
\iff & \left( x^* -\beta \nabla f(x^*)\right) \in x^* +\beta \partial I_S(x^*)\\
\iff & \left( I -\beta \nabla f\right)(x^*) \in \left( I +\beta \partial I_S\right) (x^*)\\
\iff & x^* =\left( I +\beta \partial I_S\right)^{-1}\left( I -\beta \nabla f\right) (x^*)
\end{align*}
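As a side remark (a standard fact, not stated in the text): $\partial I_S$ is the normal cone operator of $S$, and the resolvent $\left( I +\beta \partial I_S\right)^{-1}$ is exactly the Euclidean projection $P_S$ onto $S$, so the fixed-point equation can also be written as
$$x^* = P_S\left( x^* -\beta \nabla f(x^*)\right).$$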
This turns the optimization problem into a fixed-point problem. Following the usual fixed-point iteration, we obtain the scheme:
$$ x^{k+1} =\left( I +\beta \partial I_S\right)^{-1}\left( I -\beta \nabla f\right) (x^k)$$
Unfolding this scheme:
\begin{align*}
& x^{k+1} =\left( I +\beta \partial I_S\right)^{-1}\left( I -\beta \nabla f\right) (x^k)\\
\iff & \left( I +\beta \partial I_S\right)(x^{k+1}) =\left( I -\beta \nabla f\right) (x^k)\\
\iff & x^k-\beta \nabla f(x^k)-x^{k+1}\in \beta \partial I_S(x^{k+1})\\
\iff & (x-x^{k+1})^\top \frac{1}{\beta}\left( x^k-\beta \nabla f(x^k)-x^{k+1} \right) \leq 0,\forall x \in S\\
\iff & (x-x^{k+1})^\top \frac{1}{\beta}\left( x^{k+1}-\left( x^k-\beta \nabla f(x^k) \right) \right) \geq 0,\forall x \in S\\
\iff & x^{k+1}=\mathop{argmin}\limits_{x} \{\frac{1}{2\beta}\Vert x-\left( x^k-\beta \nabla f(x^k)\right) \Vert^2 | x\in S\}
\end{align*}
The last variational inequality is exactly the first-order optimality condition of the convex subproblem below, so we arrive at the iteration of the proximal gradient method:
$$x^{k+1}=\mathop{argmin}\limits_{x} \{\frac{1}{2\beta}\Vert x-\left( x^k-\beta \nabla f(x^k)\right) \Vert^2 | x\in S\}$$
Further, since terms that do not depend on $x$ do not change the minimizer:
\begin{align*}
x^{k+1}=&\mathop{argmin}\limits_{x} \{\frac{1}{2\beta}\Vert x-\left( x^k-\beta \nabla f(x^k)\right) \Vert^2 | x\in S\}\\
=&\mathop{argmin}\limits_{x} \{\frac{1}{2\beta}\Vert x-x^k\Vert^2+\nabla f(x^k)^\top (x-x^k)+\frac{1}{2} \beta \Vert \nabla f(x^k) \Vert^2 | x\in S\}\\
=&\mathop{argmin}\limits_{x} \{f(x^k)+\nabla f(x^k)^\top (x-x^k) +\frac{1}{2\beta}\Vert x-x^k \Vert^2 | x\in S\}
\end{align*}
This shows that the proximal gradient method obtains the next iterate by minimizing, over $S$, a quadratic approximation of the objective at the current iterate.
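Since the subproblem derived above is exactly the Euclidean projection of $x^k-\beta \nabla f(x^k)$ onto $S$, the iteration is easy to implement whenever this projection is cheap. The following is a minimal NumPy sketch for one illustrative instance, $f(x)=\frac{1}{2}\Vert Ax-b\Vert^2$ with a box constraint; the problem data, the step size $\beta=1/L$, and all names are assumptions made here for illustration, not part of the text.

```python
import numpy as np

# Minimal sketch of x^{k+1} = argmin_{x in S} 1/(2*beta)*||x - (x^k - beta*grad f(x^k))||^2,
# i.e. projecting the gradient step back onto S.
# Assumed instance: f(x) = 0.5*||Ax - b||^2, S = {x : lo <= x <= hi} (a box,
# whose Euclidean projection is a coordinate-wise clip).

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
lo, hi = -1.0, 1.0                       # box constraint S = [lo, hi]^n

grad = lambda x: A.T @ (A @ x - b)       # gradient of f
proj = lambda x: np.clip(x, lo, hi)      # Euclidean projection onto S

beta = 1.0 / np.linalg.norm(A, 2) ** 2   # beta = 1/L, L = ||A||_2^2 (Lipschitz constant of grad f)

x = np.zeros(10)
for _ in range(500):
    x = proj(x - beta * grad(x))         # one proximal (projected) gradient step
```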
Proximal Gradient Method for Composite Convex Optimization
Consider the composite convex optimization problem:
\begin{align*}
\mathop{min}\limits_{x} & \quad f(x)+\theta (x)\\
s.t. & \quad x \in S
\end{align*}
where $f$ is a differentiable convex function, $\theta$ is a convex function that is not necessarily differentiable, and $S$ is a convex set. This problem is equivalent to:
\begin{align*}
\mathop{min}\limits_{x} & \quad f(x)+\theta (x)+I_S(x)
\end{align*}
where $I_S$ is the indicator function of the set $S$.
Here we regard $\theta (x)+I_S(x)$ in the objective as a single convex function $(\theta+I_S)(x)$; under a standard constraint qualification, $\partial(\theta+I_S)=\partial \theta + \partial I_S$. Repeating the argument of the first part, the optimization problem becomes the fixed-point problem:
$$x^* =\left( I +\beta ( \partial \theta + \partial I_S) \right)^{-1}\left( I -\beta \nabla f\right) (x^*)$$
which gives the iteration:
$$x^{k+1} =\left( I +\beta ( \partial \theta + \partial I_S) \right)^{-1}\left( I -\beta \nabla f\right) (x^k)$$
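A remark on terminology (standard, not from the text): when $S=\mathbb{R}^n$, so that $I_S \equiv 0$, the resolvent appearing here is exactly the proximal operator of $\beta\theta$,
$$\left( I +\beta \partial \theta \right)^{-1}(z) = \operatorname{prox}_{\beta\theta}(z) = \mathop{argmin}\limits_{x} \{\theta(x)+\frac{1}{2\beta}\Vert x-z \Vert^2\},$$
which is where the name proximal gradient method comes from.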
Unfolding this iteration:
\begin{align*}
& x^{k+1} =\left( I +\beta ( \partial \theta + \partial I_S) \right)^{-1}\left( I -\beta \nabla f\right) (x^k)\\
\iff & \left( I +\beta ( \partial \theta + \partial I_S)\right)(x^{k+1}) =\left( I -\beta \nabla f\right) (x^k)\\
\iff & x^k-\beta \nabla f(x^k)-x^{k+1}\in \beta \partial( \theta + I_S)(x^{k+1})\\
\iff & \theta(x) + I_S(x) -\left(\theta(x^{k+1}) + I_S(x^{k+1}) \right) -(x-x^{k+1})^\top \frac{1}{\beta}\left( x^k-\beta \nabla f(x^k)-x^{k+1} \right) \geq 0 , \forall x
\end{align*}
From the last inequality, take $x \in S$. If $x^{k+1} \notin S$, then $I_S(x^{k+1})=+\infty$ and the left-hand side would be $-\infty$, contradicting the inequality; hence $x^{k+1} \in S$ (so $I_S(x^{k+1})=0$), and therefore:
\begin{align*}
& \theta(x) + I_S(x) -\left(\theta(x^{k+1}) + I_S(x^{k+1}) \right) -(x-x^{k+1})^\top \frac{1}{\beta}\left( x^k-\beta \nabla f(x^k)-x^{k+1} \right) \geq 0 , \forall x \\
\iff & \theta(x) + I_S(x) -\theta(x^{k+1}) +(x-x^{k+1})^\top \frac{1}{\beta}\left( x^{k+1} -\left( x^k-\beta \nabla f(x^k)\right) \right) \geq 0 , \forall x \\
\iff & \theta(x) -\theta(x^{k+1}) +(x-x^{k+1})^\top \frac{1}{\beta}\left( x^{k+1} -\left( x^k-\beta \nabla f(x^k)\right) \right) \geq 0 , \forall x \in S \\
\iff &x^{k+1}=\mathop{argmin}\limits_{x} \{\theta(x)+\frac{1}{2\beta}\Vert x-\left( x^k-\beta \nabla f(x^k)\right) \Vert^2 | x\in S\}
\end{align*}
Again, the last inequality is the first-order optimality condition of the convex subproblem below, which yields the iteration of the proximal gradient method:
$$x^{k+1}=\mathop{argmin}\limits_{x} \{\theta(x)+\frac{1}{2\beta}\Vert x-\left( x^k-\beta \nabla f(x^k)\right) \Vert^2 | x\in S\}$$
Further, again adding and dropping constants that do not depend on $x$:
\begin{align*}
x^{k+1}=&\mathop{argmin}\limits_{x} \{\theta(x)+\frac{1}{2\beta}\Vert x-\left( x^k-\beta \nabla f(x^k)\right) \Vert^2 | x\in S\}\\
=&\mathop{argmin}\limits_{x} \{\theta(x)+ \frac{1}{2\beta}\Vert x-x^k\Vert^2+\nabla f(x^k)^\top (x-x^k)+\frac{1}{2} \beta \Vert \nabla f(x^k) \Vert^2 | x\in S\}\\
=&\mathop{argmin}\limits_{x} \{\theta(x)+f(x^k)+\nabla f(x^k)^\top (x-x^k) +\frac{1}{2\beta}\Vert x-x^k \Vert^2 | x\in S\}
\end{align*}
This shows that the proximal gradient method obtains the next iterate by replacing the differentiable part of the objective with a quadratic approximation at the current iterate, while keeping $\theta$ and the constraint intact.
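As a concrete illustration (an assumed instance, not from the text), take $\theta(x)=\lambda\Vert x\Vert_1$ and $S=\mathbb{R}^n$. The subproblem then separates over coordinates and has the closed-form soft-thresholding solution
$$x^{k+1}_i=\operatorname{sign}(z_i)\max(|z_i|-\beta\lambda,0),\qquad z=x^k-\beta \nabla f(x^k).$$
A minimal NumPy sketch of the resulting iteration for the lasso instance $f(x)=\frac{1}{2}\Vert Ax-b\Vert^2$ follows; the data, $\lambda$, step size, and names are assumptions made here for illustration.

```python
import numpy as np

# Minimal sketch of x^{k+1} = argmin_x { theta(x) + 1/(2*beta)*||x - (x^k - beta*grad f(x^k))||^2 }
# for the assumed instance f(x) = 0.5*||Ax - b||^2, theta(x) = lam*||x||_1, S = R^n,
# where the subproblem is solved in closed form by soft-thresholding.

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
b = rng.standard_normal(40)
lam = 0.1                                                          # l1 regularization weight

grad = lambda x: A.T @ (A @ x - b)                                 # gradient of the smooth part f
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)    # prox of t*||.||_1

beta = 1.0 / np.linalg.norm(A, 2) ** 2                             # beta = 1/L

x = np.zeros(20)
for _ in range(500):
    x = soft(x - beta * grad(x), beta * lam)                       # one proximal gradient step
```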