Differentiation
Def. Gradient
Let \(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) be differentiable. Then the gradient of \(f\) at \({\bf x}\in\cal{X}\), denoted by \(\nabla f({\bf x})\), is defined by
\[\nabla f({\bf x})=\left(\frac{\partial f}{\partial x _1}({\bf x}),\dotsc,\frac{\partial f}{\partial x _N}({\bf x})\right)' \]
Def. Hessian
Let \(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) be twice differentiable. Then the Hessian of \(f\) at \({\bf x}\in\cal{X}\), denoted by \(\nabla ^2 f({\bf x})\), is defined by
\[\nabla ^2 f({\bf x})=\left[\frac{\partial ^2 f}{\partial x _i\partial x _j}({\bf x})\right] _{i,j\in[N]} \]
Th. Fermat's theorem
Let \(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) be differentiable. If \(f\) admits a local extremum at an interior point \({\bf x} ^*\), then \(\nabla f({\bf x} ^*)=0\)
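A quick numerical sanity check of these definitions (my own sketch; the test function \(f({\bf x})=\Vert{\bf x-c}\Vert ^2\) is just an illustrative choice): the finite-difference gradient vanishes at the minimizer, as Fermat's theorem predicts.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# f(x) = ||x - c||^2 has gradient 2(x - c), which vanishes exactly at the
# (global, hence local) minimizer c, as Fermat's theorem predicts.
c = np.array([1.0, -2.0])
f = lambda x: np.sum((x - c) ** 2)

grad_at_min = num_grad(f, c)             # ~ 0 at the extremum
grad_at_zero = num_grad(f, np.zeros(2))  # ~ 2(0 - c) = (-2.0, 4.0)
```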
Convexity
Def. Convex set
A set \({\cal X}\sube \mathbb{R} ^N\) is said to be convex if, for any \({\bf x, y}\in \cal X\), the segment \([{\bf x, y}]\) lies in \(\cal X\), that is, \(\{\alpha{\bf x} + (1-\alpha){\bf y}:0\le\alpha\le 1 \}\sube\cal X\)
Th. Operations that preserve convexity
- If \({\cal C} _i\) is convex for each \(i\in I\), then \(\bigcap _{i\in I} {\cal C} _i\) is also convex
- If \({\cal C} _1, {\cal C} _2\) are convex, then \({\cal C _1+C _2}=\{x _1+x _2:x _1\in {\cal C} _1, x _2\in {\cal C} _2 \}\) is also convex
- If \({\cal C} _1, {\cal C} _2\) are convex, then \({\cal C _1\times C _2}=\{(x _1, x _2):x _1\in {\cal C} _1, x _2\in {\cal C} _2 \}\) is also convex
- Any projection of a convex set is also convex
Def. Convex hull
The convex hull \(\text{conv}(\cal X)\) of a set \({\cal X}\sube \mathbb{R} ^N\) is the minimal convex set containing \(\cal X\), that is
\[\text{conv}({\cal X})=\left\{\sum _{i=1} ^{m}\alpha _i {\bf x} _i : m\ge 1,\ {\bf x} _i\in {\cal X},\ \alpha _i\ge 0,\ \sum _{i=1} ^{m}\alpha _i=1 \right\} \]
Def. Epigraph
The epigraph of \(f:{\cal X}\to \mathbb{R}\), denoted by \(\text{Epi } f\), is defined by \(\{(x,y):x\in {\cal X}, y\ge f(x)\}\)
Def. Convex function
Let \(\cal X\) be a convex set. A function \(f:{\cal X}\to \mathbb{R}\) is said to be convex iff \(\text{Epi }f\) is convex, or equivalently, for all \({\bf x, y}\in {\cal X},\alpha \in [0,1]\)
\[f(\alpha{\bf x}+(1-\alpha){\bf y})\le \alpha f({\bf x})+(1-\alpha)f({\bf y}) \]
Moreover, \(f\) is said to be strictly convex if the inequality is strict when \(\bf x\ne y\) and \(\alpha\in (0,1)\). \(f\) is said to be (strictly) concave if \(-f\) is (strictly) convex.
Th. Convex function characterized by first-order differential
Let \(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) be differentiable. Then \(f\) is convex iff \(\text{dom}(f)\) is convex and, for all \({\bf x, y}\in\text{dom}(f)\),
\[f({\bf y})\ge f({\bf x})+\nabla f({\bf x})\cdot({\bf y-x}) \]
Th. Convex function characterized by second-order differential
Let \(f:{\cal X}\sube\mathbb{R} ^N\to \mathbb{R}\) be twice differentiable. Then \(f\) is convex iff \(\text{dom}(f)\) is convex and its Hessian \(\nabla ^2 f({\bf x})\) is positive semidefinite for all \({\bf x}\in\text{dom}(f)\)
- A symmetric matrix is positive semidefinite iff all of its eigenvalues are non-negative
- If \(f\) is scalar (e.g. \(x\mapsto x ^2\)), then \(f\) is convex iff \(\forall x\in \text{dom}(f), f''(x)\ge 0\)
- For example
- Linear functions are both convex and concave
- Any norm \(\Vert\cdot\Vert\) over convex set \(\cal X\) is a convex function
\(\Vert\alpha{\bf x}+(1-\alpha){\bf y} \Vert\le \Vert\alpha{\bf x}\Vert+\Vert(1-\alpha){\bf y} \Vert\le \alpha\Vert{\bf x}\Vert+(1-\alpha)\Vert{\bf y}\Vert\)
- Using composition rules to prove convexity
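The eigenvalue criterion above is easy to check numerically. A minimal sketch (my own example quadratics, not from the notes): compute the Hessians of two quadratic forms and test whether every eigenvalue is non-negative.

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """A symmetric matrix is positive semidefinite iff all eigenvalues >= 0."""
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

# Hessian of the convex quadratic f(x) = x1^2 + x1*x2 + x2^2:
H_convex = np.array([[2.0, 1.0], [1.0, 2.0]])    # eigenvalues 1 and 3
# Hessian of the non-convex quadratic f(x) = x1^2 - 3*x1*x2 + x2^2:
H_saddle = np.array([[2.0, -3.0], [-3.0, 2.0]])  # eigenvalues -1 and 5
```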
Th. Composition of convex/concave functions
Assume \(h:\mathbb{R}\to\mathbb{R}\) and \(g:\mathbb{R} ^N\to\mathbb{R}\) are twice differentiable. Define \(f({\bf x})=h(g({\bf x})), \forall {\bf x}\in \mathbb{R} ^N\), then
- \(h\) is convex & non-decreasing, \(g\) is convex \(\implies\) \(f\) is convex
- \(h\) is convex & non-increasing, \(g\) is concave \(\implies\) \(f\) is convex
- \(h\) is concave & non-decreasing, \(g\) is concave \(\implies\) \(f\) is concave
- \(h\) is concave & non-increasing, \(g\) is convex \(\implies\) \(f\) is concave
Proof: it suffices to prove the case \(N=1\), since \(f\) is convex (concave) iff it is convex (concave) along every line that intersects the domain.
Example: \(g\) could be any norm \(\Vert\cdot\Vert\)
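As a numerical illustration of the first composition rule (a sketch with my own choice of functions): \(h=\exp\) is convex and non-decreasing and \(g=\Vert\cdot\Vert _2\) is convex, so \(f({\bf x})=e ^{\Vert{\bf x}\Vert _2}\) should satisfy the convexity inequality at every random triple \(({\bf x}, {\bf y}, \alpha)\).

```python
import numpy as np

rng = np.random.default_rng(0)
# h = exp (convex, non-decreasing) composed with g = ||.||_2 (convex)
# is convex by the composition rule.
f = lambda x: np.exp(np.linalg.norm(x))

violations = 0
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    a = rng.uniform()
    # convexity: f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
    if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-9:
        violations += 1
```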
Th. Pointwise maximum of convex functions
If \(f _i\) is a convex function defined over a convex set \(\cal C\) for each \(i\in I\), then \(f(x)=\sup _{i\in I}f _i(x), x\in \cal C\), is a convex function.
Proof: \(\text{Epi } f = \bigcap _{i\in I} \text{Epi } f _i\) is convex
- \(f({\bf x})=\max _{i\in I}{\bf w} _i\cdot {\bf x}+b _i\) over a convex set, is a convex function
- The maximum eigenvalue \(\lambda _{\max}({\bf M})\) over the set of symmetric matrices is a convex function, since \(\lambda _{\max}({\bf M})=\sup _{\Vert\bf x\Vert _2= 1}{\bf x}'{\bf Mx}\) is a supremum of linear functions \({\bf M}\mapsto{\bf x}'{\bf Mx}\)
More generally, let \(\lambda _{i}({\bf M})\) denote the \(i\)-th largest eigenvalue. Then the sum of the top \(k\) eigenvalues \({\bf M}\mapsto \sum _{i=1} ^{k}\lambda _{i}({\bf M})\) is convex, while the sum of the bottom \(k\) eigenvalues \({\bf M}\mapsto \sum _{i=n-k+1} ^{n}\lambda _{i}({\bf M})=-\sum _{i=1} ^{k}\lambda _i(-{\bf M})\) is concave
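The convexity of \(\lambda _{\max}\) can be sanity-checked by sampling (my own sketch): draw random symmetric matrices and verify the convexity inequality \(\lambda _{\max}(t{\bf A}+(1-t){\bf B})\le t\lambda _{\max}({\bf A})+(1-t)\lambda _{\max}({\bf B})\).

```python
import numpy as np

rng = np.random.default_rng(1)

def lam_max(M):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(M)[-1]

def random_sym(n):
    A = rng.normal(size=(n, n))
    return (A + A.T) / 2

violations = 0
for _ in range(200):
    A, B = random_sym(4), random_sym(4)
    t = rng.uniform()
    # convexity of M -> lambda_max(M) over symmetric matrices:
    if lam_max(t * A + (1 - t) * B) > t * lam_max(A) + (1 - t) * lam_max(B) + 1e-9:
        violations += 1
```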
Th. Partial infimum
Let \(f\) be a convex function defined over a convex set \({\cal C}\sube {\cal X}\times {\cal Y}\), and let \({\cal B}\sube {\cal Y}\) be a convex set. Then \({\cal A}=\{x\in {\cal X}:\exists y\in{\cal B}, (x, y)\in {\cal C} \}\) is a convex set if non-empty, and \(g(x)=\inf _{y\in\cal B} f(x, y)\), \(x\in \cal A\), is a convex function.
For example, the distance to a convex set \(\cal B\), \(d(x)=\inf _{y\in \cal B}\Vert x-y\Vert\), is a convex function
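For a concrete check of this (my own sketch): take \(\cal B\) to be the closed Euclidean unit ball, whose distance function has the closed form \(d(x)=\max(\Vert x\Vert _2-1, 0)\), and test the convexity inequality on random pairs.

```python
import numpy as np

rng = np.random.default_rng(2)

def dist_to_unit_ball(x):
    """Euclidean distance from x to the closed unit ball (a convex set)."""
    return max(np.linalg.norm(x) - 1.0, 0.0)

violations = 0
for _ in range(1000):
    x, y = 3 * rng.normal(size=2), 3 * rng.normal(size=2)
    a = rng.uniform()
    lhs = dist_to_unit_ball(a * x + (1 - a) * y)
    rhs = a * dist_to_unit_ball(x) + (1 - a) * dist_to_unit_ball(y)
    if lhs > rhs + 1e-9:
        violations += 1
```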
Th. Jensen's inequality
Let \(X\) be a random variable taking values in a convex set \({\cal C}\sube \mathbb{R} ^N\), and let \(f\) be a convex function defined over \(\cal C\). Then \(\mathbb{E}[X]\in {\cal C}\), \(\mathbb{E}[f(X)]\) is finite, and
\[f(\mathbb{E}[X])\le \mathbb{E}[f(X)] \]
Sketch of proof: extend \(f(\sum \alpha _i x _i)\le\sum \alpha _i f(x _i)\) with \(\sum \alpha _i=1\), where the \(\alpha _i\) can be interpreted as probabilities, to arbitrary distributions.
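A Monte Carlo illustration of Jensen's inequality (my own sketch; the choice \(f(x)=x ^2\) and \(X\sim{\cal N}(1, 2 ^2)\) is arbitrary): empirically \(f(\mathbb{E}[X])\approx 1\) while \(\mathbb{E}[f(X)]=\sigma ^2+\mu ^2=5\).

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: x ** 2  # a convex function

# X ~ N(1, 2^2): f(E[X]) = 1, while E[f(X)] = Var(X) + E[X]^2 = 5
X = rng.normal(loc=1.0, scale=2.0, size=100_000)
lhs = f(X.mean())    # estimate of f(E[X])
rhs = f(X).mean()    # estimate of E[f(X)]
```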
Constrained optimization
Def. Constrained optimization problem
Let \({\cal X}\sube \mathbb{R} ^N\) and \(f, g _i:{\cal X}\to\mathbb{R}, i\in [m]\). The constrained optimization problem (also called the primal problem) takes the form
\[\min _{\bf x\in\cal X} f({\bf x})\quad \text{subject to } g _i({\bf x})\le 0,\ \forall i\in[m] \]
Write \(p ^*=\inf _{{\bf x}\in{\cal X}:{\bf g}({\bf x})\le 0} f({\bf x})\) for the optimal value. Note that no convexity is assumed here, and that an equality constraint \(g=0\) can be expressed by the pair \(g\le 0, -g\le 0\)
To solve such problems, we first introduce the Lagrange function, which incorporates the constraints as non-positive terms
Def. Lagrange function
The Lagrange function associated with the constrained optimization problem is defined by
\[{\cal L}({\bf x}, {\boldsymbol{\alpha}})=f({\bf x}) + \sum _{i=1} ^{m}\alpha _i g _i({\bf x})=f({\bf x}) + {\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}),\quad \alpha _i\ge 0 \]
where \({\boldsymbol{\alpha}}=(\alpha _1,\dotsc, \alpha _m)'\) is called the dual variable
For an equality constraint \(g=0\), the coefficient \(\alpha=\alpha _+ - \alpha _-\) need not be non-negative (but the theorems below require \(g\) and \(-g\) to both be convex, so \(g\) must be affine, i.e. of the form \({\bf w\cdot x+b}\))
Note that \(p ^* = \inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}})\): whenever \(\bf x\) violates some constraint, \(\sup _{\boldsymbol{\alpha}}\) reaches \(+\infty\), so the supremum exactly encodes the constraints
Here is the interesting part: we can construct a concave function, called the dual function
Def. Dual function
The dual function associated with the constrained optimization problem is defined by
\[F({\boldsymbol{\alpha}})=\inf _{\bf x\in\cal X} {\cal L}({\bf x}, {\boldsymbol{\alpha}}),\quad \forall {\boldsymbol{\alpha}}\ge 0 \]
It is concave, because \(\cal L\) is a linear function of \(\boldsymbol{\alpha}\) and a pointwise infimum preserves concavity
Also note that for any \({\boldsymbol{\alpha}}\ge 0\), \(F({\boldsymbol{\alpha}})\le \inf _{{\bf x}\in{\cal X}:{\bf g}({\bf x})\le 0}f({\bf x})=p ^*\)
We can now define the dual problem
Def. Dual problem
The dual problem associated with the constrained optimization problem is
\[\max _{\boldsymbol{\alpha}\ge 0} F({\boldsymbol{\alpha}}) \]
The dual problem is always a convex optimization problem (maximizing a concave function); denote its optimal value by \(d ^*\). From the above, \(d ^*\le p ^*\), that is:
\[d ^* = \sup _{\boldsymbol{\alpha}} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}})\le \inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}) = p ^* \]This is called weak duality; the case of equality is called strong duality
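A worked toy example of the primal/dual pair (my own example, not from the notes): minimize \(f(x)=x ^2\) subject to \(g(x)=1-x\le 0\). The feasible set is \(x\ge 1\), so \(p ^*=1\); the dual function is \(F(\alpha)=\alpha-\alpha ^2/4\), maximized at \(\alpha=2\), and here \(d ^*=p ^*\) (strong duality, since Slater's condition holds, e.g. at \(x=2\)).

```python
import numpy as np

# Primal: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0, so p* = f(1) = 1.
# Lagrangian L(x, a) = x^2 + a*(1 - x); minimizing over x (at x = a/2)
# gives the concave dual function F(a) = a - a^2 / 4.
alphas = np.linspace(0.0, 10.0, 100_001)
F = alphas - alphas ** 2 / 4

d_star = F.max()  # maximum of the dual function, attained near a = 2
p_star = 1.0      # optimal primal value
```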
Next we will show:
when a convex optimization problem satisfies a constraint qualification (Slater's condition), which is a sufficient condition, we have \(d ^*=p ^*\), and a point is a solution iff it is a saddle point of the Lagrange function
Def. Constraint qualification (Slater's condition)
Assume the set \(\cal X\) has non-empty interior, \(\text{int}({\cal X})\ne \emptyset\):
- The strong constraint qualification (Slater's condition): \(\exists \bar{\bf x}\in\text{int}({\cal X}): {\bf g}(\bar{\bf x})<0\)
- The weak constraint qualification (weak Slater's condition): \(\exists \bar{\bf x}\in\text{int}({\cal X}):\forall i\in[m],\ g _i(\bar{\bf x})<0 \ \vee\ (g _i(\bar{\bf x})=0 \wedge g _i \text{ affine})\)
This condition feels a bit like saying that the constraint region is a convex set?
Based on Slater's condition, we now state that a saddle point of the Lagrange function is a necessary and sufficient condition for a solution of the constrained optimization problem
Th. Saddle point - sufficient condition
For the constrained optimization problem, if its Lagrange function admits a saddle point \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\), i.e.
\[\forall {\bf x}\in{\cal X},\forall {\boldsymbol{\alpha}}\ge 0,\quad {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)\le {\cal L}({\bf x}, {\boldsymbol{\alpha}} ^*) \]
then \({\bf x} ^*\) is a solution of the problem, \(f({\bf x} ^*)=\inf f({\bf x})\)
Th. Saddle point - necessary condition
Assume \(f, g _i, i\in [m]\) are convex functions:
- If Slater's condition holds, then for any solution \(\bf x ^*\) of the constrained optimization problem there exists \(\boldsymbol{\alpha} ^*\ge 0\) such that \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\) is a saddle point of the Lagrange function
- If the weak Slater's condition holds and \(f, g _i\) are differentiable, then for any solution \(\bf x ^*\) there exists \(\boldsymbol{\alpha} ^*\ge 0\) such that \(({\bf x} ^*, \boldsymbol{\alpha} ^*)\) is a saddle point of the Lagrange function
Proof of both directions. The book does not provide the proof of necessity, and the proof of sufficiency is easy but not pretty, so I will not copy it; instead I give my own line of reasoning (which may have flaws; the accompanying figure is only a sketch):
\[d ^* = \sup _{\boldsymbol{\alpha}} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}})\le\inf _{\bf x} \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}) = p ^* \]
Return to the initial inequality. Let \({\bf x} ^*\) attain \(p ^*=\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x}, {\boldsymbol{\alpha}}),\ \forall {\bf x}\),
and let \({\boldsymbol{\alpha}} ^*\) attain \(d ^*=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\ge \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha}}),\ \forall {\boldsymbol{\alpha}}\).

Sufficiency: suppose a saddle point exists; argue by contradiction (see the top right of the figure). Assume \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is not a saddle point, and denote a saddle point by \(({\bf x}' , {\boldsymbol{\alpha}}')\). Then \({\cal L}({\bf x}', {\boldsymbol{\alpha}} ^*){\color{green}<} {\cal L}({\bf x}', {\boldsymbol{\alpha}}')=\inf _{\bf x}{\cal L}({\bf x}, {\boldsymbol{\alpha}}'){\color{green}\le} \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\); comparing the two ends of this chain yields a contradiction, so \({\boldsymbol{\alpha} ^*}={\boldsymbol{\alpha}}'\); the same argument applies to \(\bf x ^*,x'\).
Hence \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is a saddle point, i.e. \(\forall {\bf x},\forall {\boldsymbol{\alpha}},\ {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)\le {\cal L}({\bf x}, {\boldsymbol{\alpha}} ^*)\), which gives \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}}) = {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*) = \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), and therefore \(p ^* = d ^*\).

Necessity: if \(p ^* =d ^*\), i.e. \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), then combined with \(\sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\ge {\cal L}({\bf x} ^*, {\boldsymbol{\alpha} ^*})\ge \inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), all three quantities are equal. Hence \(\forall {\bf x},\forall {\boldsymbol{\alpha}},\ {\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})\le \sup _{\boldsymbol{\alpha}}{\cal L}({\bf x} ^*, {\boldsymbol{\alpha}})={\cal L}({\bf x} ^*, {\boldsymbol{\alpha}} ^*)=\inf _{\bf x} {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\le {\cal L}({\bf x}, {\boldsymbol{\alpha} ^*})\), so \(({\bf x} ^* , {\boldsymbol{\alpha}} ^*)\) is a saddle point. Sweet!
To summarize all of the above in one theorem: KKT
Th. Karush-Kuhn-Tucker's theorem
For the constrained optimization problem, assume \(f, g _i:{\cal X}\to\mathbb{R},\forall i\in[m]\), are convex and differentiable, and that Slater's condition holds
Its Lagrange function is \({\cal L}({\bf x}, {\boldsymbol{\alpha}})=f({\bf x}) + {\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}), {\boldsymbol{\alpha}}\ge 0\)
Then \(\bar{\bf x}\) is a solution of the problem iff there exists \(\bar{\boldsymbol{\alpha}}\ge 0\) satisfying:
\[\begin{aligned} \nabla _{\bf x}{\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}})=\nabla f(\bar{\bf x}) + \bar{\boldsymbol{\alpha}}\cdot \nabla _{\bf x}{\bf g}(\bar{\bf x}) &= 0 \\ {\bf g}(\bar{\bf x}) &\le 0 \\ \bar{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x}) &= 0 \end{aligned} \]
The last two are called the complementarity conditions: for each \(i\in[m]\), \(\bar{\alpha} _i\ge 0\), \(g _i(\bar{\bf x})\le 0\), and \(\bar{\alpha} _i g _i(\bar{\bf x})=0\)
Proof of both directions:
Necessity: if \(\bar{\bf x}\) is a solution, then there exists \(\bar{\boldsymbol{\alpha}}\) such that \((\bar{\bf x}, \bar{\boldsymbol{\alpha}})\) is a saddle point, from which the KKT conditions follow. The first is the stationarity part of the saddle-point definition; for the second and third:
\[\begin{aligned} \forall {{\boldsymbol{\alpha}}},{\cal L}(\bar{\bf x}, {\boldsymbol{\alpha}})\le{\cal L}(\bar{\bf x}, \bar{\boldsymbol{\alpha}})&\implies{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x})\le \bar{\boldsymbol{\alpha}}\cdot {\bf g}(\bar{\bf x})\\{\boldsymbol{\alpha}}\to +\infty&\implies {\bf g}(\bar{\bf x})\le 0 \\ {\boldsymbol{\alpha}}\to 0 &\implies \bar{\boldsymbol{\alpha}}\cdot{\bf g}(\bar{\bf x})=0 \end{aligned} \]Sufficiency: if the KKT conditions hold, then for any \(\bf x\) satisfying \(\bf g({\bf x})\le 0\):
\[\begin{aligned} f({\bf x}) - f(\bar{\bf x}) &\ge \nabla _{\bf x} f(\bar{\bf x})\cdot ({\bf x-\bar{x}}) & ;\text{convexity of $f$} \\ &= -\bar{\boldsymbol{\alpha}}\cdot \nabla _{\bf x} {\bf g}(\bar{\bf x})\cdot ({\bf x-\bar{x}}) &; \text{first cond}\\ &\ge -\bar{\boldsymbol{\alpha}}\cdot ({\bf g}({\bf x})-{\bf g}(\bar{\bf x})) &; \text{convexity of $g$}\\ &= -\bar{\boldsymbol{\alpha}}\cdot {\bf g}({\bf x}) \ge 0 &;\text{third cond} \end{aligned} \]
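The KKT conditions can be verified by hand on a toy problem (my own, hypothetical example): minimize \(f(x)=x ^2\) subject to \(g(x)=1-x\le 0\), whose solution is \(\bar x=1\) with dual variable \(\bar\alpha=2\).

```python
# KKT check for: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0,
# with candidate solution x_bar = 1 and dual variable a_bar = 2.
x_bar, a_bar = 1.0, 2.0

grad_f = 2 * x_bar                       # f'(x) = 2x
grad_g = -1.0                            # g'(x) = -1
stationarity = grad_f + a_bar * grad_g   # gradient of L in x, should be 0
feasibility = 1.0 - x_bar                # g(x_bar), should be <= 0
complementarity = a_bar * (1.0 - x_bar)  # a_bar * g(x_bar), should be 0
```

All three conditions hold exactly, and indeed \(f(\bar x)=1\) is the optimal value over the feasible set \(x\ge 1\).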
Fenchel duality
For convex optimization problems, we next analyze the case where \(f\) is not differentiable or takes infinite values
From: https://www.cnblogs.com/zhyh/p/17642673.html