Statistics Final Review
Covariance
Definition 4.2. The covariance of \(X\) and \(Y\) is
\[\operatorname{Cov}(X, Y)=\mathbb{E}(X Y)-\mathbb{E} X \cdot \mathbb{E} Y \]If \(X\) and \(Y\) are independent then \(\operatorname{Cov}(X, Y)=0\). But the converse is not true: we can have \(X, Y\) with \(\operatorname{Cov}(X, Y)=0\) and \(X\) and \(Y\) not independent.
Covariance can be positive, negative, or \(0\).
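As a quick numerical illustration that zero covariance does not imply independence, here is a minimal sketch (assuming Python with NumPy; the pair \(X\) uniform on \(\{-1,0,1\}\) and \(Y=X^{2}\) is chosen just for this example):

```python
import numpy as np

rng = np.random.default_rng(0)

# X is uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X,
# so X and Y are certainly not independent.
x = rng.choice([-1, 0, 1], size=1_000_000)
y = x ** 2

# Cov(X, Y) = E[XY] - E[X] E[Y]; here E[XY] = E[X^3] = 0 and E[X] = 0,
# so the covariance is 0 even though Y is determined by X.
print(np.mean(x * y) - np.mean(x) * np.mean(y))   # close to 0

# Dependence shows up in the joint distribution:
# P(X = 0, Y = 0) = 1/3, but P(X = 0) P(Y = 0) = 1/9.
print(np.mean((x == 0) & (y == 0)), np.mean(x == 0) * np.mean(y == 0))
```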
Lemma 4.3. For any random variables \(X\) and \(Y\),
\[\operatorname{Var}(X+Y)=\operatorname{Var}(X)+\operatorname{Var}(Y)+2 \operatorname{Cov}(X, Y) . \]
The formula generalizes:
Lemma 4.4. For any random variables \(X_{1}, \ldots, X_{n}\),
\[\operatorname{Var}\left(\sum_{i=1}^{n} X_{i}\right)=\sum_{i=1}^{n} \operatorname{Var}\left(X_{i}\right)+2 \sum_{i<j} \operatorname{Cov}\left(X_{i}, X_{j}\right) . \]
(Remember that the sum runs over \(i<j\), so each unordered pair is counted once: we add \(2 \operatorname{Cov}\left(X_{1}, X_{2}\right)\), for example, not \(1\) or \(4\) times that covariance.)
Properties of Covariance
If \(X, Y, W\), and \(V\) are real-valued random variables and \(a, b, c, d\) are real-valued constants, then the following facts are a consequence of the definition of covariance:
\[\begin{aligned} \operatorname{cov}(X, a) &=0 \\ \operatorname{cov}(X, X) &=\operatorname{var}(X) \\ \operatorname{cov}(X, Y) &=\operatorname{cov}(Y, X) \\ \operatorname{cov}(a X, b Y) &=a b \operatorname{cov}(X, Y) \\ \operatorname{cov}(X+a, Y+b) &=\operatorname{cov}(X, Y) \\ \operatorname{cov}(a X+b Y, c W+d V) &=a c \operatorname{cov}(X, W)+a d \operatorname{cov}(X, V)+b c \operatorname{cov}(Y, W)+b d \operatorname{cov}(Y, V) \end{aligned} \]For a sequence \(X_{1}, \ldots, X_{n}\) of real-valued random variables and constants \(a_{1}, \ldots, a_{n}\), we have
\[\operatorname{var}\left(\sum_{i=1}^{n} a_{i} X_{i}\right)=\sum_{i=1}^{n} a_{i}^{2} \operatorname{var}\left(X_{i}\right)+2 \sum_{i, j: i<j} a_{i} a_{j} \operatorname{cov}\left(X_{i}, X_{j}\right)=\sum_{i, j} a_{i} a_{j} \operatorname{cov}\left(X_{i}, X_{j}\right) \]
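A minimal numerical check of this variance formula (a sketch assuming Python with NumPy; the correlated pair and the weights below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated variables: X1 standard normal, X2 = X1 + independent noise.
n = 1_000_000
x1 = rng.standard_normal(n)
x2 = x1 + 0.5 * rng.standard_normal(n)
a1, a2 = 2.0, -3.0  # arbitrary constants

# Left-hand side: sample variance of a1*X1 + a2*X2.
lhs = np.var(a1 * x1 + a2 * x2)

# Right-hand side: a1^2 var(X1) + a2^2 var(X2) + 2 a1 a2 cov(X1, X2).
cov = np.cov(x1, x2)  # 2x2 sample covariance matrix
rhs = a1**2 * cov[0, 0] + a2**2 * cov[1, 1] + 2 * a1 * a2 * cov[0, 1]

print(lhs, rhs)  # agree up to sampling error
```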
Hoeffding's covariance identity
A useful identity for computing the covariance between two random variables \(X, Y\) is Hoeffding's covariance identity: [7]
\[\operatorname{cov}(X, Y)=\int_{\mathbb{R}} \int_{\mathbb{R}}\left(F_{(X, Y)}(x, y)-F_{X}(x) F_{Y}(y)\right) d x d y \]where \(F_{(X, Y)}(x, y)\) is the joint cumulative distribution function of the random vector \((X, Y)\) and \(F_{X}(x), F_{Y}(y)\) are the marginals.
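A rough Monte Carlo check of the identity (a sketch assuming Python with NumPy; the pair \(X\) uniform on \([0,1]\) and \(Y=(X+Z)/2\) with \(Z\) an independent uniform is chosen so that both variables live on \([0,1]\), which lets us truncate the double integral to a finite grid):

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy pair supported on [0,1]^2: X uniform, Y = (X + Z)/2 with Z an
# independent uniform, so Cov(X, Y) = Var(X)/2 = 1/24.
n = 50_000
x = rng.uniform(size=n)
y = (x + rng.uniform(size=n)) / 2

# Left-hand side of the identity: the usual covariance estimate.
lhs = np.mean(x * y) - np.mean(x) * np.mean(y)

# Right-hand side: Riemann sum of F_{(X,Y)}(s,t) - F_X(s) F_Y(t) over a grid,
# with the CDFs replaced by empirical CDFs from the sample.
grid = np.linspace(0, 1, 51)
ds = grid[1] - grid[0]
f_x = np.array([np.mean(x <= s) for s in grid])
f_y = np.array([np.mean(y <= t) for t in grid])
rhs = 0.0
for i, s in enumerate(grid):
    for j, t in enumerate(grid):
        f_joint = np.mean((x <= s) & (y <= t))
        rhs += (f_joint - f_x[i] * f_y[j]) * ds * ds

print(lhs, rhs)  # both roughly 1/24 ~ 0.0417
```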
Covariance Matrix
Definition 4.5. The covariance matrix \(\Sigma\) of an \(m\)-dimensional random vector \(\vec{X}\) is the \(m \times m\) matrix where
\[\Sigma_{i j}=\operatorname{Cov}\left(X_{i}, X_{j}\right) . \]This means the diagonal entries \(\Sigma_{i i}, i=1, \ldots, m\), are the variances of the components: \(\Sigma_{i i}=\operatorname{Var}\left(X_{i}\right)\).
We can also write the covariance matrix in terms of vectors:
\[\Sigma=\mathbb{E}\left[(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^{T}\right] \]This is the expectation of the outer product of the vector \(\vec{X}-\vec{\mu}\) with itself. (Here we assume \(\vec{X}\) and \(\vec{\mu}=\mathbb{E} \vec{X}\) are column vectors.)
Properties of covariance matrices
Let \(\mathbb{E} \vec{X}=\vec{\mu}\) and \(\Sigma=\mathbb{E}\left[(\vec{X}-\vec{\mu})(\vec{X}-\vec{\mu})^{T}\right]\). Then
- \(\Sigma=\mathbb{E}\left[\vec{X} \vec{X}^{T}\right]-\vec{\mu} \vec{\mu}^{T}\).
- \(\Sigma\) is a symmetric matrix. That is, \(\Sigma_{i j}=\Sigma_{j i}\).
- \(\Sigma\) is a positive semi-definite matrix. That is, for every vector \(\vec{y} \in \mathbb{R}^{m}\), \(\vec{y}^{T} \Sigma \vec{y} \geq 0\).
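These properties are easy to check numerically; a small sketch (assuming Python with NumPy, with an arbitrary 3-dimensional example built from a linear map of independent standard normals):

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample a 3-dimensional random vector with correlated components
# (an arbitrary linear map of independent standard normals).
n = 100_000
a = np.array([[1.0, 0.0, 0.0],
              [0.5, 2.0, 0.0],
              [-1.0, 0.3, 0.5]])
x = rng.standard_normal((n, 3)) @ a.T   # rows are samples of the vector X

sigma = np.cov(x, rowvar=False)          # sample covariance matrix

# Symmetry: Sigma_ij = Sigma_ji.
print(np.allclose(sigma, sigma.T))

# Positive semi-definiteness: all eigenvalues are >= 0
# (equivalently, y^T Sigma y >= 0 for every y).
print(np.linalg.eigvalsh(sigma))

# Diagonal entries are the component variances.
print(np.diag(sigma), x.var(axis=0, ddof=1))
```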
Sums of independent random variables
Suppose \(X\) and \(Y\) are independent, continuous random variables, and let \(Z=X+Y\). What is the distribution of \(Z\) ?
We can calculate
\[F_{Z}(t)=\int_{-\infty}^{\infty} \int_{-\infty}^{t-x} f_{X}(x) f_{Y}(y) \, d y \, d x \]or
\[f_{Z}(t)=\int_{-\infty}^{\infty} f_{X}(x) f_{Y}(t-x) d x \]We can do the same for discrete random variables. Suppose \(X\) and \(Y\) are independent, discrete random variables, and let \(Z=X+Y\). What is the distribution of \(Z\) ?
\[F_{Z}(t)=\sum_{x} \sum_{y: x+y \leq t} f_{X}(x) f_{Y}(y) . \]or
\[f_{Z}(t)=\sum_{k} f_{X}(k) f_{Y}(t-k) . \]
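The discrete convolution formula is easy to check numerically, e.g. for the sum of two fair dice (a sketch assuming Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)

# pmf of a fair die on {1, ..., 6}, stored with index = value.
pmf = np.zeros(7)
pmf[1:] = 1 / 6

# Convolution formula: f_Z(t) = sum_k f_X(k) f_Y(t - k), for Z = X + Y.
f_z = np.convolve(pmf, pmf)          # f_z[t] = P(X + Y = t), t = 0, ..., 12

# Compare with a direct simulation of two independent dice.
rolls = rng.integers(1, 7, size=(1_000_000, 2)).sum(axis=1)
empirical = np.bincount(rolls, minlength=13) / len(rolls)

for t in range(2, 13):
    print(t, round(f_z[t], 4), round(empirical[t], 4))
```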
The first and second moment methods
Markov's Inequality & first moment method
Theorem \(5.1\) (Markov's inequality). Suppose \(X\) is a non-negative random variable. Then for every \(t>0\),
\[P[X \geq t] \leq \frac{\mathbb{E} X}{t} \]Say we have a counting (or any non-negative) random variable \(X\) and its expectation is small. Then Markov's inequality tells us that \(X\) cannot be too big too often.
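A quick numerical comparison of the tail probability with the Markov bound (a sketch assuming Python with NumPy; the Exponential(1) distribution is just an example of a non-negative variable):

```python
import numpy as np

rng = np.random.default_rng(5)

# X ~ Exponential with mean 1 (any non-negative variable would do).
x = rng.exponential(scale=1.0, size=1_000_000)

for t in [1, 2, 5, 10]:
    tail = np.mean(x >= t)         # estimate of P[X >= t]
    bound = x.mean() / t           # Markov bound E[X] / t
    print(t, round(tail, 5), round(bound, 5))
```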
The First moment method
Lemma \(5.2\) (The first moment method for counting random variables). Let \(B_{1}(n), B_{2}(n), \ldots, B_{m}(n)\) be a sequence of (bad) events that may depend on some hidden parameter \(n\). If
\[\lim _{n \rightarrow \infty} \sum_{i=1}^{m} P\left(B_{i}(n)\right)=0 \]then \(\lim _{n \rightarrow \infty} P[\) At least one bad event occurs \(]=0\).
This is the first moment method.
Example: Let \(M_{n}\) be an \(n \times n\) matrix in which each entry is 0 or 1 with probability \(1 / 2\) each, independently of all other entries. Let \(B\) be the event that there is a row of all 0's. Show that \(P(B) \rightarrow 0\) as \(n \rightarrow \infty\). (By the union bound, \(P(B) \leq n \cdot P[\text{a given row is all 0's}]=n \cdot 2^{-n} \rightarrow 0\).)
The bad events do not occur, because the sum of the probabilities of all the bad events tends to \(0\).
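A small simulation makes the union-bound computation concrete (a sketch assuming Python with NumPy; the values of \(n\) and the number of trials are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

# Estimate P(some row of an n x n 0/1 matrix is all zeros) by simulation,
# and compare with the union bound  sum_i P(row i all zero) = n * 2^(-n).
for n in [4, 8, 12, 16]:
    trials = 20_000
    matrices = rng.integers(0, 2, size=(trials, n, n))
    has_zero_row = np.any(np.all(matrices == 0, axis=2), axis=1)
    print(n, round(has_zero_row.mean(), 4), round(n * 2.0 ** (-n), 4))
```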
Chebyshev's Inequality & second moment method
Theorem \(5.3\) (Chebyshev's Inequality). Let \(X\) be any random variable with finite mean and variance. Then for every \(t>0\),
\[P[|X-\mathbb{E} X| \geq t] \leq \frac{\operatorname{Var}(X)}{t^{2}} \]In statistics we will be most interested in using Chebyshev's Inequality with a sequence of random variables and asking about the behavior in the limit.
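As with Markov's inequality, we can compare the true tail with the Chebyshev bound numerically (a sketch assuming Python with NumPy; the \(N(5, 2^{2})\) distribution is just an example):

```python
import numpy as np

rng = np.random.default_rng(7)

# X ~ N(5, 2^2); compare P[|X - EX| >= t] with Var(X) / t^2.
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)

for t in [2, 4, 6]:
    tail = np.mean(np.abs(x - x.mean()) >= t)
    bound = x.var() / t ** 2
    print(t, round(tail, 5), round(bound, 5))
```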
The second moment method
Corollary 5.4. Suppose \(X\) is a non-negative, integer-valued random variable with \(\mathbb{E} X>0\). Then
\[P(X \geq 1) \geq 1-\frac{\operatorname{Var}(X)}{(\mathbb{E} X)^{2}} \]Or in other words, let \(B_{1}(n), B_{2}(n), \ldots, B_{m(n)}(n)\) be a sequence of (bad) events that may depend on some hidden parameter \(n\). Let \(X=\sum_{i=1}^{m(n)} 1_{B_{i}}\) be the number of these events that occur. If
\[\lim _{n \rightarrow \infty} \frac{\operatorname{Var}(X)}{(\mathbb{E} X)^{2}}=0 \]then \(\lim _{n \rightarrow \infty} P[\) At least one bad event occurs \(]=1\).
The bad events do occur, because the number of bad events that occur is stable (concentrated around its mean): the expected count grows faster than its spread.
Convergence of RV
3 types of convergence:
- Convergence in probability
- Convergence in distribution
- Almost sure convergence \(^*\) (non-examinable)
Convergence in Probability
Definition 6.2. Let \(X_{1}, X_{2}, \ldots\) be an infinite sequence of random variables, and let \(X\) be a single random variable, all defined on the same probability space.
Then we say \(X_{n}\) converges to \(X\) in probability if for every \(\epsilon>0\),
\[\lim _{n \rightarrow \infty} P\left[\left|X_{n}-X\right|>\epsilon\right]=0 . \]
Notice that this is a statement about the limit of probabilities.
Convergence in distribution
Example 6.3. Let \(X_{1}, X_{2}, \ldots\) be a sequence of random variables with \(X_{n} \sim \operatorname{Bin}(n, \lambda / n)\). Then for every fixed \(k\), \(\lim _{n \rightarrow \infty} P\left[X_{n}=k\right]=P[\operatorname{Pois}(\lambda)=k]\).
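A quick way to see this convergence of pmfs (a sketch assuming Python with SciPy available; \(\lambda=3\) is an arbitrary choice):

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0
ks = np.arange(8)

# P[X_n = k] for X_n ~ Bin(n, lambda/n) approaches the Pois(lambda) pmf.
for n in [10, 100, 1000]:
    print(n, np.round(binom.pmf(ks, n, lam / n), 4))
print("Pois", np.round(poisson.pmf(ks, lam), 4))
```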
Definition 6.4. We say a sequence of random variables \(X_{1}, X_{2}, \ldots\) converges in distribution to a random variable \(X\) if
\[\lim _{n \rightarrow \infty} F_{X_{n}}(t)=F_{X}(t) \]for every \(t \in \mathbb{R}\) at which \(F_{X}\) is continuous (i.e. convergence holds at all continuity points of \(F_{X}\)). We write this with a double arrow:
\[X_{n} \Rightarrow X \]Lemma 6.6. Let \(X_{1}, X_{2}, \ldots\) be a sequence of discrete random variables and \(X\) another discrete random variable, all of them non-negative integer-valued. Then,
\[X_{n} \Rightarrow X \]if and only if
\[\lim _{n \rightarrow \infty} f_{X_{n}}(t)=f_{X}(t) \]for all \(t \in \mathbb{R}\).
The Law of Large Numbers
Weak Law of Large Numbers (WLLN)
Theorem 6.12 (Weak Law of Large Numbers). Let \(X_{1}, X_{2}, \ldots\) be a sequence of i.i.d. random variables with \(\mathbb{E} X_{j}=\mu\) and \(\operatorname{Var}\left(X_{j}\right)=\sigma^{2}<\infty\). Let \(\bar{S}_{n}=\frac{X_{1}+\cdots+X_{n}}{n}\). Then for every \(\epsilon>0\),
\[\lim _{n \rightarrow \infty} P\left[\left|\bar{S}_{n}-\mu\right|>\epsilon\right]=0 \]The Strong Law of Large Numbers is non-examinable.
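A simulation of the statement (a sketch assuming Python with NumPy; Exponential(1) increments, \(\epsilon=0.1\), and the sample sizes are arbitrary choices): the probability that the sample mean is more than \(\epsilon\) away from \(\mu\) shrinks as \(n\) grows.

```python
import numpy as np

rng = np.random.default_rng(8)

# X_i ~ Exponential with mean 1, so mu = 1.  Estimate P[|S_n/n - mu| > eps]
# for growing n; by the WLLN this probability should tend to 0.
mu, eps, trials = 1.0, 0.1, 2_000
for n in [10, 100, 1_000, 5_000]:
    sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    print(n, np.mean(np.abs(sample_means - mu) > eps))
```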
Characteristic functions
Definition 6.14. The characteristic function \(\phi_{X}(t)\) of a random variable \(X\) is the function
\[\phi_{X}(t)=\mathbb{E}\left[e^{i t X}\right] \](This is related to the Fourier transform of the distribution of \(X\) ).
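A Monte Carlo estimate of the characteristic function (a sketch assuming Python with NumPy; for a standard normal we can compare with the known closed form \(\phi_{X}(t)=e^{-t^{2} / 2}\)):

```python
import numpy as np

rng = np.random.default_rng(9)

# Monte Carlo estimate of phi_X(t) = E[exp(itX)] for X ~ N(0,1),
# compared with the known closed form exp(-t^2 / 2).
x = rng.standard_normal(1_000_000)
for t in [0.0, 0.5, 1.0, 2.0]:
    est = np.mean(np.exp(1j * t * x))
    print(t, np.round(est, 4), round(np.exp(-t ** 2 / 2), 4))
```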
Theorem \(6.15\) (Lévy's Continuity Theorem). Let \(X_{1}, X_{2}, \ldots\) be a sequence of random variables and \(X\) another random variable. Then the following are equivalent:
- \(X_{n} \Rightarrow X\) (convergence in distribution).
- \(\lim _{n \rightarrow \infty} \phi_{X_{n}}(t)=\phi_{X}(t)\) for all \(t\).
Limit theorems via characteristic functions
A proof of the WLLN
Here is the Weak Law of Large Numbers, assuming only the existence of a finite mean.
Theorem 6.16. Let \(X_{1}, X_{2}, \ldots\) be i.i.d. random variables with \(\mathbb{E} X_{k}=\mu\). Let \(\bar{S}_{n}=\frac{X_{1}+\cdots+X_{n}}{n}\). Then \(\bar{S}_{n} \Rightarrow \mu\).
(Notice the conditions are less restrictive than those when we proved the WLLN using Chebyshev's inequality).
The Central Limit Theorem
Theorem 6.17. Let \(X_{1}, X_{2}, \ldots\) be i.i.d. random variables with \(\mathbb{E} X_{i}=\mu\) and \(\operatorname{Var}\left(X_{i}\right)=\sigma^{2}\). Let
\[\tilde{S}_{n}=\frac{X_{1}+\cdots+X_{n}-n \mu}{\sigma \sqrt{n}} \]Then \(\tilde{S}_{n} \Rightarrow N(0,1)\).
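A simulation of the theorem (a sketch assuming Python with NumPy; Uniform(0,1) increments and the comparison point \(x=1\) are arbitrary choices): the standardized sums should satisfy \(P[\tilde{S}_{n} \leq 1] \approx \Phi(1) \approx 0.8413\) for large \(n\).

```python
import numpy as np

rng = np.random.default_rng(10)

# X_i ~ Uniform(0,1): mu = 1/2, sigma^2 = 1/12.  The standardized sums are
# approximately N(0,1), so P[S~_n <= 1] should approach Phi(1) ~ 0.8413.
mu, sigma = 0.5, np.sqrt(1 / 12)
trials = 50_000
for n in [1, 5, 30, 200]:
    s = rng.uniform(size=(trials, n)).sum(axis=1)
    s_tilde = (s - n * mu) / (sigma * np.sqrt(n))
    print(n, round(np.mean(s_tilde <= 1.0), 4))
```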
Moment generating function
Definition 6.18. The moment generating function of a random variable \(X\) is the function
\[M_{X}(t)=\mathbb{E}\left[e^{t X}\right] \]Beware: the function may not be defined for all values of \(t\), i.e. the expectation might be \(\infty\).
Probability generating function
Definition 6.19. The probability generating function of a counting random variable \(X\) is the function
\[G_{X}(t)=\sum_{k=0}^{\infty} t^{k} P[X=k]=\mathbb{E}\left[t^{X}\right] \]Beware: the function may not be defined for all values of \(t\), i.e. the sum might be \(\infty\).
Random walks
The simple symmetric random walk
Definition 7.2. The simple symmetric random walk is the discrete-time, discrete-space stochastic process \(S_{0}, S_{1}, S_{2}, \ldots\) with \(S_{0}=0\) and \(S_{n}=X_{1}+\cdots+X_{n}\) where the \(X_{j}\) 's are i.i.d. \(\pm 1\) with probability \(1 / 2\) each.
'Simple' refers to the fact that the increments are just \(+1\) and \(-1\). 'Symmetric' refers to the fact that the probability of \(+1\) is \(1 / 2\).
We can define other random walks: given any distribution, let \(X_{1}, X_{2}, \ldots\) be i.i.d. random variables with that distribution. Then \(S_{0}=0\) and \(S_{n}=X_{1}+\cdots+X_{n}\) defines a random walk.
Exercise: prove that any random walk is a Markov chain.
Basic properties of the Simple Symmetric Random Walk (SSRW)
- \(\mathbb{E} S_{n}=0\)
- \(\operatorname{Var}\left(S_{n}\right)=n\)
- \(S_{n} / n \rightarrow 0\) in probability and almost surely. (Law of Large Numbers)
- \(S_{n} / \sqrt{n} \Rightarrow N(0,1)\). (Central Limit Theorem)
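A short simulation checking these properties of the SSRW (a sketch assuming Python with NumPy; \(n\) and the number of paths are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulate many independent paths of the simple symmetric random walk.
n, paths = 1_000, 10_000
steps = rng.choice([-1, 1], size=(paths, n))
s_n = steps.sum(axis=1)                   # S_n for each path

print(s_n.mean())                         # ~ 0       (E S_n = 0)
print(s_n.var())                          # ~ n       (Var S_n = n)
print(np.mean(np.abs(s_n / n) > 0.1))     # ~ 0       (LLN: S_n / n -> 0)
print(np.mean(s_n / np.sqrt(n) <= 1.0))   # ~ 0.84    (CLT: S_n / sqrt(n) ~ N(0,1))
```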