Generalized Linear Models III: Iteratively Reweighted Least Squares
The Exponential Family
Components
A GLM is a regression model that relates a function of the conditional expectation of the response variable Y to a set of covariates x:
$$g\left(\mu_{i}\right)=\eta_{i}=\boldsymbol{x}_{i}^{T} \boldsymbol{\beta}$$
where $\mu_{i}=E\left(Y_{i} \mid \boldsymbol{x}_{i}\right)$.
A GLM has three components: a random component, a systematic component, and a link function.
Random component
The response variable Y follows an exponential family distribution whose pdf is in canonical form, i.e. $\alpha(y)=y$.
The joint probability density function is:
$$\begin{aligned} f\left(y_{1}, \ldots, y_{n} ; \theta_{1}, \ldots, \theta_{n}\right) &=\prod_{i=1}^{n} \exp \left[y_{i} b\left(\theta_{i}\right)+c\left(\theta_{i}\right)+d\left(y_{i}\right)\right] \\ &=\exp \left\{\sum_{i=1}^{n}\left[y_{i} b\left(\theta_{i}\right)+c\left(\theta_{i}\right)+d\left(y_{i}\right)\right]\right\} \end{aligned}$$
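As a concrete instance (a standard identification, spelled out here rather than taken from the text above), the Bernoulli distribution used in e.g. 1 below fits this canonical form with $\theta=\pi$:
$$f(y ; \pi)=\pi^{y}(1-\pi)^{1-y}=\exp \left[y \log \frac{\pi}{1-\pi}+\log (1-\pi)\right]$$
so $b(\theta)=\log \frac{\pi}{1-\pi}$, $c(\theta)=\log (1-\pi)$, and $d(y)=0$.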
Systematic component
Also called the linear predictor:
$$\eta_{i}=\boldsymbol{x}_{i}^{T} \boldsymbol{\beta}=\sum_{j=0}^{p} \beta_{j} x_{j i}$$
Link function
A smooth, one-to-one transformation $g(\mu)$ that connects the random and systematic components, linking the conditional expectation $E(Y \mid x)$ of Y to the linear predictor $\eta$:
$$g\left(\mu_{i}\right)=\eta_{i}=\boldsymbol{x}_{i}^{T} \boldsymbol{\beta}$$
If $g(\mu)=b(\theta)$, then $g(\mu)$ is called the canonical link.
e.g. 1 (logistic regression)
Random component: $Y_i\mid x_i\sim B(\pi_i),\quad E(Y_i\mid x_i)=\pi_i$
Systematic component: $\eta_{i}=\beta_{0}+\beta_{1} x_{1 i}+\cdots+\beta_{p} x_{p i}$
Link function: $g\left(\pi_{i}\right)=\log \left(\frac{\pi_{i}}{1-\pi_{i}}\right)=b\left(\pi_{i}\right)$
Putting these together:
$$\log \left(\frac{\pi_{i}}{1-\pi_{i}}\right)=\beta_{0}+\beta_{1} x_{1 i}+\cdots+\beta_{p} x_{p i}$$
e.g. 2 (other binary responses)
Random component: $Y_i\mid x_i\sim B(\pi_i),\quad E(Y_i\mid x_i)=\pi_i$
Systematic component: $\eta_{i}=\beta_{0}+\beta_{1} x_{1 i}+\cdots+\beta_{p} x_{p i}$
Link functions:
Linear probability: $g(\mu_i)=\pi_i$
Probit regression: $g\left(\mu_{i}\right)=\Phi^{-1}\left(\pi_{i}\right)$, where $\Phi(z)=P(Z \leq z)$, $Z \sim N(0,1)$
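For illustration, both links can be fit with statsmodels' GLM interface; a minimal sketch on simulated data (the data and names here are hypothetical, and the `Probit` link class name may vary slightly across statsmodels versions):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical simulated binary data
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))   # intercept + 2 covariates
eta = X @ np.array([-0.5, 1.0, 2.0])             # linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))      # Bernoulli responses

# Same random/systematic components, two different links
logit_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()  # canonical logit
probit_fit = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Probit())).fit()
print(logit_fit.params)
print(probit_fit.params)
```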
Likelihood estimation
In a GLM, $\beta$ is estimated by maximizing the log-likelihood:
$$g\left(\mu_{i}\right)=\eta_{i}=\beta_{0}+\beta_{1} x_{1 i}+\cdots+\beta_{p} x_{p i}$$
When Y follows an exponential family distribution in canonical form, the log-likelihood is:
$$l\left(\theta_{1}, \ldots, \theta_{n} ; \boldsymbol{y}\right)=\sum_{i=1}^{n} l_{i}=\sum_{i=1}^{n}\left[y_{i} b\left(\theta_{i}\right)+c\left(\theta_{i}\right)+d\left(y_{i}\right)\right]$$
The log-likelihood depends on $\beta$ through the chain $\beta \rightarrow \eta \rightarrow \mu \rightarrow \theta$. Differentiating along this chain gives the score function:
$$\frac{\partial l}{\partial \beta_{j}}=u\left(\beta_{j}\right)=u_{j}=\sum_{i=1}^{n}\left(\frac{\partial l_{i}}{\partial \theta_{i}} \frac{\partial \theta_{i}}{\partial \mu_{i}} \frac{\partial \mu_{i}}{\partial \eta_{i}} \frac{\partial \eta_{i}}{\partial \beta_{j}}\right)$$
Breaking this derivative into its four factors:
$$\begin{aligned} \frac{\partial l_{i}}{\partial \theta_{i}}=y_{i} b^{\prime}\left(\theta_{i}\right)+c^{\prime}\left(\theta_{i}\right)&=b^{\prime}\left(\theta_{i}\right)\left[y_{i}+\frac{c^{\prime}\left(\theta_{i}\right)}{b^{\prime}\left(\theta_{i}\right)}\right]\\&=b^{\prime}\left(\theta_{i}\right)\left(y_{i}-\mu_{i}\right) \end{aligned}$$
where $E\left(Y_{i} \mid \boldsymbol{x}_{i}\right)=\mu_{i}=-\frac{c^{\prime}\left(\theta_{i}\right)}{b^{\prime}\left(\theta_{i}\right)}$.
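As a quick sanity check (using the Bernoulli identification sketched earlier, with $\theta=\pi$):
$$b^{\prime}(\pi)=\frac{1}{\pi(1-\pi)}, \quad c^{\prime}(\pi)=-\frac{1}{1-\pi}, \quad -\frac{c^{\prime}(\pi)}{b^{\prime}(\pi)}=\frac{\pi(1-\pi)}{1-\pi}=\pi=E\left(Y_{i} \mid \boldsymbol{x}_{i}\right)$$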
Differentiating $\mu_{i}$ with respect to $\theta_{i}$ gives
$$\frac{\partial \mu_{i}}{\partial \theta_{i}}=\frac{-c^{\prime \prime}\left(\theta_{i}\right) b^{\prime}\left(\theta_{i}\right)+c^{\prime}\left(\theta_{i}\right) b^{\prime \prime}\left(\theta_{i}\right)}{\left[b^{\prime}\left(\theta_{i}\right)\right]^{2}}=b^{\prime}\left(\theta_{i}\right) \operatorname{Var}\left(Y_{i}\right)$$
Therefore:
$$\frac{\partial \theta_{i}}{\partial \mu_{i}}=\left(\frac{\partial \mu_{i}}{\partial \theta_{i}}\right)^{-1}=\frac{1}{b^{\prime}\left(\theta_{i}\right) \operatorname{Var}\left(Y_{i}\right)}$$
The factor $\frac{\partial \mu_{i}}{\partial \eta_{i}}$ cannot be written explicitly until a specific link function is chosen.
Finally, from
$$\eta_{i}=\beta_{0}+\beta_{1} x_{1 i}+\cdots+\beta_{p} x_{p i}$$
we get:
$$\frac{\partial \eta_{i}}{\partial \beta_{j}}=x_{j i}$$
Combining these pieces yields the score function:
$$\begin{aligned} u_{j} & =\sum_{i=1}^{n}\left(\frac{\partial l_{i}}{\partial \theta_{i}} \frac{\partial \theta_{i}}{\partial \mu_{i}} \frac{\partial \mu_{i}}{\partial \eta_{i}} \frac{\partial \eta_{i}}{\partial \beta_{j}}\right) \\ & =\sum_{i=1}^{n} b^{\prime}\left(\theta_{i}\right)\left(y_{i}-\mu_{i}\right) \frac{1}{b^{\prime}\left(\theta_{i}\right) \operatorname{Var}\left(Y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}} x_{j i} \\ & =\sum_{i=1}^{n} \frac{\left(y_{i}-\mu_{i}\right) x_{j i}}{\operatorname{Var}\left(Y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}} \end{aligned}$$
Since $\boldsymbol{u}=\left(u_{0}, u_{1}, \ldots, u_{p}\right)^{T}$ generally has no closed-form root, $\boldsymbol{u}=\boldsymbol{0}$ must be solved iteratively, for example with the Fisher scoring algorithm:
$$\theta^{(m+1)}=\theta^{(m)}+\left[I^{(m)}\right]^{-1} u^{(m)}$$
Applying Fisher scoring also requires the Fisher information. Since the Fisher information equals the variance of the score function, and the score has expectation zero, it can be computed directly from the score:
$$I(\theta)=E\left[U(\theta ; Y)^{2}\right]$$
The entry in row j, column k of the Fisher information matrix is:
$$I_{j k}=I_{\beta_{j} \beta_{k}}=E\left(U_{j} U_{k}\right)$$
Therefore:
$$\begin{aligned} I_{j k} &= E\left(U_{j} U_{k}\right)\\ &=E\left[\left(\sum_{i=1}^{n} \frac{\left(Y_{i}-\mu_{i}\right) x_{j i}}{\operatorname{Var}\left(Y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}}\right)\left(\sum_{l=1}^{n} \frac{\left(Y_{l}-\mu_{l}\right) x_{k l}}{\operatorname{Var}\left(Y_{l}\right)} \frac{\partial \mu_{l}}{\partial \eta_{l}}\right)\right] \\ &= \sum_{i=1}^{n} \sum_{l=1}^{n} \frac{x_{j i} x_{k l}}{\operatorname{Var}\left(Y_{i}\right) \operatorname{Var}\left(Y_{l}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}} \frac{\partial \mu_{l}}{\partial \eta_{l}} E\left[\left(Y_{i}-\mu_{i}\right)\left(Y_{l}-\mu_{l}\right)\right] \end{aligned}$$
The last factor is $\operatorname{Cov}(Y_i,Y_l)$. When the $Y_i$ are mutually independent, $\operatorname{Cov}(Y_i,Y_l)=0$ for $i\ne l$, and $\operatorname{Cov}(Y_i,Y_l)=\operatorname{Var}(Y_i)$ for $i=l$. Therefore:
$$I_{j k}=\sum_{i=1}^{n}\left[\frac{x_{j i} x_{k i}}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}}{\partial \eta_{i}}\right)^{2}\right]$$
Putting everything together, the Fisher scoring algorithm for GLM parameter estimation is:
$$\widehat{\boldsymbol{\beta}}^{(m+1)}=\widehat{\boldsymbol{\beta}}^{(m)}+\left[\boldsymbol{I}^{(m)}\right]^{-1} \boldsymbol{u}^{(m)}$$
where $\boldsymbol{u}$ and $\boldsymbol{I}$ are built from the elements:
$$u_{j}=\sum_{i=1}^{n} \frac{\left(y_{i}-\mu_{i}\right) x_{j i}}{\operatorname{Var}\left(Y_{i}\right)} \frac{\partial \mu_{i}}{\partial \eta_{i}}$$
$$I_{j k}=\sum_{i=1}^{n}\left[\frac{x_{j i} x_{k i}}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}}{\partial \eta_{i}}\right)^{2}\right]$$
For a GLM, $\color{blue}{\text{iteratively reweighted least squares (IWLS)}}$ provides a more efficient implementation of the Fisher scoring algorithm. It works as follows:
At iteration m, linearize $g(y_i)$ around the fitted mean $\mu_{i}^{(m)}$:
$$\begin{aligned} g\left(y_{i}\right) &\approx g\left(\mu_{i}^{(m)}\right)+g^{\prime}\left(\mu_{i}^{(m)}\right)\left(y_{i}-\mu_{i}^{(m)}\right)\\&= \eta_{i}^{(m)}+\frac{\partial \eta_{i}^{(m)}}{\partial \mu_{i}^{(m)}}\left(y_{i}-\mu_{i}^{(m)}\right)\equiv \mathbf{z}_i^{(m)} \end{aligned}$$
where g is the link function. The model can then be written in "linear" form:
$$\mathbf{z}^{(m)}=\boldsymbol{X} \boldsymbol{\beta}^{(m)}+\boldsymbol{\varepsilon}^{(m)}$$
with $\boldsymbol{X}$ the $n\times(p+1)$ design matrix whose i-th row is $\boldsymbol{x}_{i}^{T}$,
and where $E\left(\varepsilon_{i}^{(m)}\right)=0$ and
$$\begin{aligned} \operatorname{Var}\left(\varepsilon_{i}^{(m)}\right)=\operatorname{Var}\left[Z_{i}^{(m)} \mid \boldsymbol{x}_{i}\right] & =\operatorname{Var}\left[\eta_{i}^{(m)}+\frac{\partial \eta_{i}^{(m)}}{\partial \mu_{i}^{(m)}}\left(Y_{i}-\mu_{i}^{(m)}\right)\right] \\ & =\left[\frac{\partial \eta_{i}^{(m)}}{\partial \mu_{i}^{(m)}}\right]^{2} \operatorname{Var}\left(Y_{i}\right) \end{aligned}$$
The model can therefore be estimated by weighted least squares (WLS):
$$\widetilde{\boldsymbol{\beta}}^{(m+1)}=\left[\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \boldsymbol{X}\right]^{-1} \boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \mathbf{z}^{(m)}$$
where $\boldsymbol{W}^{(m)}$ is an $n\times n$ diagonal matrix whose i-th diagonal element is:
$$W_{i i}^{(m)}=\frac{1}{\operatorname{Var}\left(\varepsilon_{i}^{(m)}\right)}$$
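For concreteness (a standard special case, not worked in the original): with the logit link of e.g. 1, $\partial \eta_{i} / \partial \mu_{i}=1 /\left[\pi_{i}\left(1-\pi_{i}\right)\right]$ and $\operatorname{Var}\left(Y_{i}\right)=\pi_{i}\left(1-\pi_{i}\right)$, so
$$z_{i}^{(m)}=\eta_{i}^{(m)}+\frac{y_{i}-\pi_{i}^{(m)}}{\pi_{i}^{(m)}\left(1-\pi_{i}^{(m)}\right)}, \qquad W_{i i}^{(m)}=\pi_{i}^{(m)}\left(1-\pi_{i}^{(m)}\right)$$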
Each new estimate of $\beta$ yields a new working response $\mathbf{z}^{(m+1)}$ and a new weight matrix $\boldsymbol{W}^{(m+1)}$, and the process is iterated until convergence.
It can be shown that the Fisher scoring estimate $\hat{\beta}$ and the IWLS estimate $\tilde{\beta}$ of the parameter $\beta$ are identical.
Recall the Fisher information matrix $\boldsymbol{I}$ for the exponential family, whose entry in row j, column k is:
$$I_{j k}=\sum_{i=1}^{n}\left[\frac{x_{j i} x_{k i}}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}}{\partial \eta_{i}}\right)^{2}\right]$$
At iteration m this can therefore be written as:
$$\boldsymbol{I}^{(m)}=\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \boldsymbol{X}$$
with $\boldsymbol{W}^{(m)}$ the matrix defined above.
Now take the equation describing the Fisher scoring update:
$$\widehat{\boldsymbol{\beta}}^{(m+1)}=\widehat{\boldsymbol{\beta}}^{(m)}+\left[\boldsymbol{I}^{(m)}\right]^{-1} \boldsymbol{u}^{(m)}$$
Left-multiplying both sides by $\boldsymbol{I}^{(m)}$ gives:
$$\boldsymbol{I}^{(m)} \widehat{\boldsymbol{\beta}}^{(m+1)}=\boldsymbol{I}^{(m)} \widehat{\boldsymbol{\beta}}^{(m)}+\boldsymbol{u}^{(m)} \tag{1}$$
The left-hand side of (1) can be rewritten as:
$$\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \boldsymbol{X} \widehat{\boldsymbol{\beta}}^{(m+1)}$$
The right-hand side of (1) is a vector whose j-th element is:
$$\begin{aligned} &\sum_{k=0}^{p} \sum_{i=1}^{n} \frac{x_{j i} x_{k i}}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}^{(m)}}{\partial \eta_{i}^{(m)}}\right)^{2} \hat{\beta}_{k}^{(m)}+\sum_{i=1}^{n} \frac{\left(y_{i}-\mu_{i}^{(m)}\right) x_{j i}}{\operatorname{Var}\left(Y_{i}\right)} \frac{\partial \mu_{i}^{(m)}}{\partial \eta_{i}^{(m)}} \\ &=\sum_{i=1}^{n} \frac{x_{j i}}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}^{(m)}}{\partial \eta_{i}^{(m)}}\right)^{2}\left[\left(y_{i}-\mu_{i}^{(m)}\right) \frac{\partial \eta_{i}^{(m)}}{\partial \mu_{i}^{(m)}}+\sum_{k=0}^{p} x_{k i} \hat{\beta}_{k}^{(m)}\right] \\ &=\sum_{i=1}^{n} \frac{x_{j i}}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}^{(m)}}{\partial \eta_{i}^{(m)}}\right)^{2}\left[\left(y_{i}-\mu_{i}^{(m)}\right) \frac{\partial \eta_{i}^{(m)}}{\partial \mu_{i}^{(m)}}+\eta_{i}^{(m)}\right] \end{aligned}$$
Thus the right-hand side of (1) can be written as $\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \mathbf{z}^{(m)}$, with $\mathbf{z}^{(m)}$ the working response defined above. Hence (1) becomes:
$$\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \boldsymbol{X} \widehat{\boldsymbol{\beta}}^{(m+1)}=\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \mathbf{z}^{(m)}$$
which is exactly the IWLS estimate.
$\color{blue}{\text{Algorithm:}}$
Choose an initial value $\widehat{\boldsymbol{\beta}}^{(0)}$.
At iteration m, compute $\mathbf{z}^{(m)}$ and $\boldsymbol{W}^{(m)}$, then obtain the next estimate $\widehat{\boldsymbol{\beta}}^{(m+1)}$ by WLS (see the sketch below).
Iterate until the estimate of $\boldsymbol{\beta}$ converges.
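A minimal numpy sketch of this loop for logistic regression (canonical logit link, so $\operatorname{Var}(Y_i)=\mu_i(1-\mu_i)$ and $\partial \mu_i/\partial \eta_i=\mu_i(1-\mu_i)$; the function name and simulated data are hypothetical):

```python
import numpy as np

def iwls_logistic(X, y, n_iter=25, tol=1e-8):
    """IWLS for logistic regression; X includes the intercept column."""
    beta = np.zeros(X.shape[1])                  # beta^(0)
    for _ in range(n_iter):
        eta = X @ beta                           # linear predictor eta^(m)
        mu = 1 / (1 + np.exp(-eta))              # fitted means mu^(m)
        w = mu * (1 - mu)                        # W_ii^(m) for the logit link
        z = eta + (y - mu) / w                   # working response z^(m)
        # WLS step: solve (X^T W X) beta^(m+1) = X^T W z
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new

# Usage on simulated data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
beta_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
print(iwls_logistic(X, y))                       # close to beta_true
```

In practice one would also guard against fitted probabilities reaching 0 or 1, which makes the weights degenerate.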
e.g. 1 (linear regression): suppose $Y_{i} \mid x_{i} \sim N\left(\mu_{i}, \sigma^{2}\right)$ and $\mu_{i}=\eta_{i}=\beta_{0}+\beta_{1} x_{1 i}+\cdots+\beta_{p} x_{p i}$.
Since the link is the identity, $\frac{\partial \mu_{i}}{\partial \eta_{i}}=\frac{\partial \eta_{i}}{\partial \mu_{i}}=1$, so
$$W_{i i}=\frac{1}{\operatorname{Var}\left(Y_{i}\right)}\left(\frac{\partial \mu_{i}}{\partial \eta_{i}}\right)^{2}=\frac{1}{\sigma^{2}}$$
$$z_{i}^{(m)}=\left(y_{i}-\mu_{i}^{(m)}\right) \frac{\partial \eta_{i}^{(m)}}{\partial \mu_{i}^{(m)}}+\eta_{i}^{(m)}=\left(y_{i}-\eta_{i}^{(m)}\right)+\eta_{i}^{(m)}=y_{i}$$
Substituting into the IWLS equation
$$\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \boldsymbol{X} \widehat{\boldsymbol{\beta}}^{(m+1)}=\boldsymbol{X}^{T} \boldsymbol{W}^{(m)} \mathbf{z}^{(m)}$$
gives:
$$\frac{1}{\sigma^{2}} \boldsymbol{X}^{T} \boldsymbol{X} \widehat{\boldsymbol{\beta}}^{(m+1)}=\frac{1}{\sigma^{2}} \boldsymbol{X}^{T} \boldsymbol{y}$$
Therefore:
$$\widehat{\boldsymbol{\beta}}=\left(\boldsymbol{X}^{T} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{T} \boldsymbol{y}$$
which is the classical least-squares solution; with the identity link, IWLS converges in a single step.
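A quick numerical confirmation of this fact on hypothetical data: with the identity link, $\mathbf{z}=\boldsymbol{y}$ and the constant weight $1/\sigma^{2}$ cancels, so a single WLS step already solves the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

# One IWLS step with identity link reduces to the normal equations
beta_iwls = np.linalg.solve(X.T @ X, X.T @ y)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_iwls, beta_ols))  # True
```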
$\color{blue}{\text{The canonical link simplifies the computation}}$
Recall the original score function:
$$\frac{\partial l}{\partial \beta_{j}}=u\left(\beta_{j}\right)=u_{j}=\sum_{i=1}^{n}\left(\frac{\partial l_{i}}{\partial \theta_{i}} \frac{\partial \theta_{i}}{\partial \mu_{i}} \frac{\partial \mu_{i}}{\partial \eta_{i}} \frac{\partial \eta_{i}}{\partial \beta_{j}}\right)$$
$$\begin{aligned} \frac{\partial l_{i}}{\partial \theta_{i}}=y_{i} b^{\prime}\left(\theta_{i}\right)+c^{\prime}\left(\theta_{i}\right)&=b^{\prime}\left(\theta_{i}\right)\left[y_{i}+\frac{c^{\prime}\left(\theta_{i}\right)}{b^{\prime}\left(\theta_{i}\right)}\right]\\&=b^{\prime}\left(\theta_{i}\right)\left(y_{i}-\mu_{i}\right) \end{aligned}$$
$$\frac{\partial \eta_{i}}{\partial \beta_{j}}=x_{j i}$$
With the canonical link, $g\left(\mu_{i}\right)=\eta_{i}=b\left(\theta_{i}\right)$, so:
$$\frac{\partial \eta_{i}}{\partial \theta_{i}}=b^{\prime}\left(\theta_{i}\right) \Leftrightarrow \frac{\partial \theta_{i}}{\partial \eta_{i}}=\frac{1}{b^{\prime}\left(\theta_{i}\right)}$$
Substituting into the score function:
$$\begin{aligned} u_{j}=\sum_{i=1}^{n}\left(\frac{\partial l_{i}}{\partial \theta_{i}} \frac{\partial \theta_{i}}{\partial \eta_{i}} \frac{\partial \eta_{i}}{\partial \beta_{j}}\right) & =\sum_{i=1}^{n} b^{\prime}\left(\theta_{i}\right)\left(y_{i}-\mu_{i}\right) \frac{1}{b^{\prime}\left(\theta_{i}\right)} x_{j i} \\ & =\sum_{i=1}^{n} x_{j i}\left(y_{i}-\mu_{i}\right) \\ & =\sum_{i=1}^{n} x_{j i}\left[y_{i}+\frac{c^{\prime}\left(\theta_{i}\right)}{b^{\prime}\left(\theta_{i}\right)}\right] \equiv \sum_{i=1}^{n} u_{i j} \end{aligned}$$
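In matrix form (a restatement for emphasis, not part of the original derivation), the canonical-link score is simply
$$\boldsymbol{u}=\boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{\mu}),$$
which for logistic regression is the familiar log-likelihood gradient $\boldsymbol{X}^{T}(\boldsymbol{y}-\boldsymbol{\pi})$.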
The Hessian matrix is:
$$\begin{aligned} H_{j k} & =H_{\beta_{j} \beta_{k}}=\frac{\partial u_{j}}{\partial \beta_{k}}=\sum_{i=1}^{n}\left(\frac{\partial u_{i j}}{\partial \theta_{i}} \frac{\partial \theta_{i}}{\partial \eta_{i}} \frac{\partial \eta_{i}}{\partial \beta_{k}}\right) \\ & =\sum_{i=1}^{n} x_{j i}\left[\frac{c^{\prime \prime}\left(\theta_{i}\right) b^{\prime}\left(\theta_{i}\right)-b^{\prime \prime}\left(\theta_{i}\right) c^{\prime}\left(\theta_{i}\right)}{b^{\prime}\left(\theta_{i}\right)^{2}}\right] \frac{1}{b^{\prime}\left(\theta_{i}\right)} x_{k i} \\ & =-\sum_{i=1}^{n} \operatorname{Var}\left(Y_{i}\right) x_{j i} x_{k i} \end{aligned}$$
This expression does not involve the random variable Y, so:
$$\boldsymbol{I}=E[-\boldsymbol{H}]=-\boldsymbol{H}$$
$\color{blue}{\text{That is, for a GLM with a canonical link function, Fisher scoring and Newton-Raphson produce identical iterates at every step.}}$