Markov Chain & Stationary Distribution
Def(Finite Markov Chain). Let \(\Omega\) be a finite set of states and let \(P(x,y)\ge 0\), \(x,y\in\Omega\), be a transition function, i.e., \(\sum\limits_{y\in\Omega} P(x,y)=1\) for every \(x\in\Omega\). A finite Markov chain is a sequence of random variables
\[(X_0,X_1,X_2,...) \]such that for all \(t\) and all \(x_0,x_1,...,x_t\in\Omega\) we have
\[\mathbb{P}(X_{t+1}=x_{t+1}\mid X_0=x_0,X_1=x_1,...,X_t=x_t)=\mathbb{P}(X_{t+1}=x_{t+1}\mid X_t=x_t)=P(x_t,x_{t+1}). \]Given the initial distribution \(\mu_0\) over \(\Omega\) and the transition matrix \(P\), we can calculate the distribution after \(t\) transitions
\[\mu_t=\mu_{t-1}P=\mu_0P^t. \]Def. A distribution \(\pi\) is called stationary if \(\pi P=\pi\).
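A minimal numerical sketch of the evolution \(\mu_t=\mu_0P^t\); the 3-state transition matrix below is an arbitrary toy example, not taken from the notes:

```python
import numpy as np

# Toy 3-state transition matrix (rows sum to 1); an arbitrary example.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

mu0 = np.array([1.0, 0.0, 0.0])    # start deterministically in state 0

# Evolve the distribution: mu_t = mu_{t-1} P = mu_0 P^t (row-vector convention).
mu = mu0
for t in range(50):
    mu = mu @ P

print(mu)                          # approaches a stationary distribution
print(np.allclose(mu @ P, mu))     # pi P = pi (approximately)
```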
Lem(Brouwer's fixed point theorem). Every continuous map \(f\) from a nonempty convex, closed, bounded set \(K\subseteq\mathbb{R}^n\) to itself has a fixed point \(x\in K\) with \(f(x)=x\).
Thm(Perron-Frobenius). Any Markov chain over a finite state space has a stationary distribution.
Pf. Let \(|\Omega|=n\) and \(\Delta^{n-1}=\{\mu\in\mathbb{R}^n\mid \mu_i\ge 0,\sum_i\mu_i=1\}\). The simplex \(\Delta^{n-1}\) is convex, closed, and bounded. The map \(\mu\mapsto\mu P\) is continuous (linear) and maps \(\Delta^{n-1}\) to itself, since the entries of \(\mu P\) are nonnegative and sum to \(\sum_x\mu(x)\sum_y P(x,y)=1\). By Brouwer's fixed point theorem there is a stationary distribution \(\pi\in\Delta^{n-1}\) s.t. \(\pi P=\pi\).
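One way to compute such a \(\pi\) numerically is to treat \(\pi P=\pi\) as a left-eigenvector problem; a sketch with the same arbitrary toy matrix as above:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# pi P = pi  <=>  P^T pi^T = pi^T: take the eigenvector of P^T for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = v / v.sum()                    # normalize to a probability distribution

print(pi)
print(np.allclose(pi @ P, pi))      # True: pi is stationary
```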
Def. A finite Markov chain is irreducible (connected) if \(\forall x,y\in\Omega,\exists t>0\) s.t. \(P^t(x,y)>0\).
Thm(Fundamental Theorem of Markov Chains). For a connected Markov chain there is a unique stationary distribution \(\pi\). Moreover, if the chain is in addition aperiodic (e.g., \(P(x,x)>0\) for some \(x\)), any initial distribution converges to \(\pi\).
Pf. Suppose \(Ph=h\) for some \(h\in\mathbb{R}^n\). Let \(i_0=\arg\max\limits_i h(i)\) and \(M=h(i_0)\). Since \(h(i_0)=\sum_j P(i_0,j)h(j)\) is a convex combination of values that are all at most \(M\), every \(j\in\Omega\) with \(P(i_0,j)>0\) must satisfy \(h(j)=M\). Repeating this argument and using irreducibility, \(h(i)=M\) for all \(i\). Thus \(Ph=h\) implies \(h=M\vec{1}\), so the right null space of \(P-I\) is one-dimensional and \(\text{rank}(P-I)=n-1\). Hence the left null space is also one-dimensional, which gives the uniqueness of \(\pi\).
Def(Reversed Markov Chain). Given a Markov chain \((P,\Omega)\), let \(\pi\) be a stationary distribution (with \(\pi(x)>0\) for all \(x\)). Let $$\hat{P}(x,y)=\frac{\pi(y)P(y,x)}{\pi(x)}.$$ Then \((\hat{P},\Omega)\) is the reversed Markov chain.
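A small sketch of this construction, using the same toy \(P\) as before, checking that \(\hat{P}\) is a valid transition matrix and that \(\pi\) is also stationary for it:

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution of P (left eigenvector for eigenvalue 1).
vals, vecs = np.linalg.eig(P.T)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = v / v.sum()

# Reversed chain: P_hat(x, y) = pi(y) P(y, x) / pi(x).
P_hat = (P.T * pi[None, :]) / pi[:, None]

print(np.allclose(P_hat.sum(axis=1), 1.0))   # rows sum to 1
print(np.allclose(pi @ P_hat, pi))           # pi is stationary for P_hat too
```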
Markov Chain Monte Carlo
Given a distribution \(\pi\) and a function \(f\) over \(\Omega\) (usually \(\Omega=\{0,1,...,n-1\}^d\)), estimate
\[\underset{\pi}{\mathbb{E}}(f)=\sum\limits_{x\in\Omega}f(x)\pi(x). \]MCMC method:
- Design \(P\) s.t. \(\pi P=\pi\).
- Generate \(X_0,X_1,...,X_T\), where \(X_t\) is sampled from \(X_{t-1}\) according to \(P\).
- Calculate \(\frac{1}{T}\sum\limits_{t=1}^Tf(X_t)\) as an estimate of \(\underset{\pi}{\mathbb{E}}(f)\).
Let \(\hat{\mu}=\frac{1}{T}\sum\limits_{t=1}^T\mu_t\). Since \(X_t\sim \mu_t\), the estimator \(\frac{1}{T}\sum_{t=1}^T f(X_t)\) has expectation \(\underset{\hat{\mu}}{\mathbb{E}}(f)\). If \(\lim\limits_{t\rightarrow\infty}\mu_t=\pi\), then \(\lim\limits_{T\rightarrow\infty}\underset{\hat{\mu}}{\mathbb{E}}(f)=\underset{\pi}{\mathbb{E}}(f)\), so the estimate is asymptotically unbiased.
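A generic sketch of this loop; the transition matrix `P`, the function `f`, the starting state, and the run length `T` are placeholder toy choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain on Omega = {0, 1, 2} and a toy function f; both are placeholders.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
f = np.array([0.0, 1.0, 2.0])

T = 100_000
x = 0                                   # X_0
total = 0.0
for _ in range(T):
    x = rng.choice(3, p=P[x])           # X_t sampled from P(X_{t-1}, .)
    total += f[x]

print(total / T)                        # estimate of E_pi[f]
```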
Consider a graph \(G=(\Omega, E)\), where \(E=\{(x,y)\mid P(x,y)>0\}.\)
To keep each transition cheap to simulate, the degree of vertices in \(G\) should be small.
To ensure fast convergence, the diameter of \(G\) should be small (or \(G\) should satisfy some other structural property that guarantees rapid mixing).
For example, if \(\Omega=\{0,1\}^d\), a possible choice is to let \(G\) be the hypercube, whose vertices have degree \(d\) and whose diameter is \(d\).
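As a tiny illustration of moving along hypercube edges (the dimension \(d\) below is an arbitrary assumption), a step to a uniformly random neighbor just flips one coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x = np.zeros(d, dtype=int)            # a state in {0,1}^d

def hypercube_step(x):
    """Move to a uniformly random neighbor on the hypercube: flip one bit."""
    y = x.copy()
    y[rng.integers(d)] ^= 1
    return y

print(x, hypercube_step(x))
```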
Metropolis-Hastings Algorithm
\[P(x,y)= \begin{cases} \Phi(x,y)(1\wedge\frac{\pi(y)}{\pi(x)}) & x\neq y\\ 1-\sum\limits_{z\neq x}\Phi(x,z)(1\wedge\frac{\pi(z)}{\pi(x)}) & x=y \end{cases}\]where \(\Phi\) is a symmetric proposal transition matrix (\(\Phi(x,y)=\Phi(y,x)\)) and \(a\wedge b=\min(a,b)\).
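A sketch of one Metropolis-Hastings step for an unnormalized target on \(\{0,1\}^d\), with the symmetric single-bit-flip proposal playing the role of \(\Phi\); the target `pi_unnorm` and the dimension are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

def pi_unnorm(x):
    """Unnormalized target weight pi(x); placeholder: favors states with many 1s."""
    return 2.0 ** x.sum()

def mh_step(x):
    """One Metropolis-Hastings step with the symmetric flip-one-bit proposal."""
    y = x.copy()
    y[rng.integers(d)] ^= 1                     # propose y ~ Phi(x, .)
    # Accept with probability 1 ∧ pi(y)/pi(x); the normalizing constant cancels.
    if rng.random() < min(1.0, pi_unnorm(y) / pi_unnorm(x)):
        return y
    return x

x = np.zeros(d, dtype=int)
for _ in range(1000):
    x = mh_step(x)
print(x)
```

Note that only the ratio \(\pi(y)/\pi(x)\) is ever needed, so \(\pi\) may be known only up to a normalizing constant.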
Lem(Detailed balance equation). For a given transition matrix \(P\), if there exists a distribution \(\pi\) s.t.
\[\pi (x) P(x,y)=\pi(y) P(y,x),\quad\forall x,y \in \Omega , \]then \(\pi\) is a stationary distribution and \(P\) is reversible (i.e., \(\hat{P}=P\)).
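For the Metropolis-Hastings chain above, detailed balance can be checked directly: for \(x\neq y\) (and \(\pi>0\)),
\[\pi(x)P(x,y)=\pi(x)\,\Phi(x,y)\Big(1\wedge\frac{\pi(y)}{\pi(x)}\Big)=\Phi(x,y)\big(\pi(x)\wedge\pi(y)\big),\]
which is symmetric in \(x\) and \(y\) since \(\Phi\) is symmetric; hence \(\pi\) is stationary for \(P\).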
Ex. Given a graph \(G=(V,E)\), let \(f:\{0,1\}^{|V|}\rightarrow\mathbb{Z}^{+}\) map each vertex bipartition \(x\) to the size of the corresponding cut. Find \(\max\limits_x f(x)\) (the MAX-CUT problem).
Solution: Define \(\pi_{\lambda}(x)\propto\lambda^{f(x)}\) for some \(\lambda>1\); the larger \(\lambda\) is, the more \(\pi_\lambda\) concentrates on maximum cuts. Apply the MH algorithm with the symmetric proposal \(\Phi(x,y)=\frac{1}{|V|}\left[\sum\limits_i[x(i)\neq y(i)]=1\right]\), i.e., move a uniformly random vertex to the other side of the cut.
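A sketch of this chain on a small random graph; the graph, \(\lambda\), and the number of steps are placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder random graph on n vertices.
n = 20
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.2]

def cut_size(x):
    """f(x): number of edges crossing the cut defined by the 0/1 labeling x."""
    return sum(x[i] != x[j] for i, j in edges)

lam = 2.0                                   # lambda > 1; larger favors bigger cuts
x = rng.integers(0, 2, size=n)
best = cut_size(x)

for _ in range(20_000):
    i = rng.integers(n)                     # proposal: move one random vertex
    y = x.copy()
    y[i] ^= 1
    # MH acceptance: 1 ∧ pi(y)/pi(x) = 1 ∧ lambda^(f(y)-f(x)).
    if rng.random() < min(1.0, lam ** (cut_size(y) - cut_size(x))):
        x = y
        best = max(best, cut_size(x))

print(best, "out of", len(edges), "edges")
```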
Gibbs Sampling
\[P(x,y)= \begin{cases} \frac{1}{d}\cdot\frac{\pi(y)}{\sum\limits_a\pi(x_1,...,x_{i-1},a,x_{i+1},...,x_d)} & \{j\mid x_j\neq y_j\}=\{i\}\\ 0 & otherwise \end{cases}\]Intuitively, choose a coordinate \(i\in\{1,...,d\}\) uniformly at random and resample \(x_i\) according to the conditional distribution of \(\pi\) given the other coordinates (if the resampled value equals \(x_i\), the chain stays at \(x\)). An alternative scheme is to choose \(x_i\) by sequentially scanning from \(x_1\) to \(x_d\).
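A sketch of the random-scan update for a target on \(\{0,1\}^d\); the unnormalized target `pi_unnorm` and the dimension are placeholder assumptions, and for a larger alphabet one would sum the weights over all values \(a\):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

def pi_unnorm(x):
    """Unnormalized target pi(x); placeholder: neighboring coordinates like to agree."""
    return np.exp(np.sum(x[:-1] == x[1:]))

def gibbs_step(x):
    """Random-scan Gibbs: pick coordinate i, resample x_i from pi(. | x_{-i})."""
    i = rng.integers(d)
    weights = []
    for a in (0, 1):                      # conditional over the alphabet {0, 1}
        y = x.copy()
        y[i] = a
        weights.append(pi_unnorm(y))
    p = np.array(weights) / sum(weights)  # conditional distribution of x_i
    x = x.copy()
    x[i] = rng.choice(2, p=p)
    return x

x = rng.integers(0, 2, size=d)
for _ in range(1000):
    x = gibbs_step(x)
print(x)
```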