Given m samples, classify them according to the n features each sample has.
The most basic classification problem is binary classification, where the output is Yes or No.
Logistic regression
Whereas linear regression assumes the dependent variable y follows a Gaussian distribution, logistic regression assumes that y follows a Bernoulli distribution.
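Stated in symbols (a restatement of the same assumption, using the notation defined below):
\[ y \mid \mathbf{x};\mathbf{w} \;\sim\; \mathrm{Bernoulli}(p) ,\quad p = P(y=1\mid\mathbf{x};\mathbf{w}) \]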
\[z=\mathbf{w}^T \mathbf{x} \in (-\infty,+\infty) \]
Sigmoid/Logistic function:
\[g(z)=\frac{1}{1+e^{-z}} \in(0,1) ,\, z\in(-\infty,+\infty) \]
The sigmoid function is used to map the output into \((0,1)\):
\[h(\mathbf{x};\mathbf{w}) = g(\mathbf{w}^T \mathbf{x}) = \frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}} \]
Hypothesis:
\[\begin{aligned} p &= P(y=1|\mathbf{x};\mathbf{w}) = \frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}} \\ 1-p &= P(y=0|\mathbf{x};\mathbf{w}) = \frac{1}{1+e^{\mathbf{w}^T \mathbf{x}}}\end{aligned} \]
From this it follows that:
\[\text{odds} = \frac{p}{1-p} \in (0,+\infty) \]
\[\ln(\text{odds}) = \ln\frac{p}{1-p}=\mathbf{w}^T \mathbf{x} \]
The probabilities of the two outcomes sum to one:
\[P(y=1|\mathbf{x};\mathbf{w}) + P(y=0|\mathbf{x};\mathbf{w}) = 1 \]
Linear decision boundary:
\[\mathbf{w}^T\mathbf{x} = \sum_{j=0}^{n} w_j x_j = 0 \]
Non-linear decision boundary: use polynomial features (e.g. terms such as \(x_1^2\), \(x_2^2\), \(x_1 x_2\)).
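A minimal Octave sketch of the prediction rule implied by the decision boundary; the function and variable names (sigmoid, predict, w, X) are illustrative, not from the original post:

```octave
% sigmoid.m
function g = sigmoid(z)
  % Elementwise logistic function g(z) = 1 / (1 + exp(-z))
  g = 1 ./ (1 + exp(-z));
end

% predict.m
function p = predict(w, X)
  % X: m x (n+1) design matrix whose first column is all ones (x_{i,0} = 1)
  % w: (n+1) x 1 weight vector
  % Predict y = 1 when h(x) >= 0.5, which is the same as w' * x >= 0.
  p = sigmoid(X * w) >= 0.5;
end
```

For a non-linear boundary the same predict applies unchanged once the polynomial terms have been appended as extra columns of X.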
Likelihood:
\[\mathcal{L}(\mathbf{w} \mid \mathbf{x}, y=1) = P(y=1|\mathbf{x};\mathbf{w}) \]
Maximum likelihood estimation:
\[\begin{aligned} P(y|\mathbf{x};\mathbf{w}) &= P(y=1|\mathbf{x};\mathbf{w})^y P(y=0|\mathbf{x};\mathbf{w})^{1-y} \\ &= (\frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}})^y (1- \frac{1}{1+e^{-\mathbf{w}^T \mathbf{x}}})^{1-y} \end{aligned} \]
Maximize:
\[L(\mathbf{w}) = \prod_{i=1}^{m} P(y_i|\mathbf{x}_i;\mathbf{w}) \]
Take the logarithm:
\[\begin{aligned} \ln L(\mathbf{w}) &= \sum_{i=1}^{m} ( y_i \ln P(y=1|\mathbf{x}_i;\mathbf{w}) + (1-y_i) \ln (1- P(y=1|\mathbf{x}_i;\mathbf{w})) ) \\ &= \sum_{i=1}^{m} ( y_i \ln \frac{P(y=1|\mathbf{x}_i;\mathbf{w})}{1- P(y=1|\mathbf{x}_i;\mathbf{w})} + \ln (1- P(y=1|\mathbf{x}_i;\mathbf{w})) ) \\ &= \sum_{i=1}^{m} ( y_i \mathbf{w}^T \mathbf{x}_i - \ln (1+e^{\mathbf{w}^T\mathbf{x}_i}) ) \end{aligned}\]
Training set:
\[\{(\mathbf{x}_1,y_1), (\mathbf{x}_2,y_2), \cdots, (\mathbf{x}_m,y_m)\} \]
For each example,
\[\mathbf{x}_i = \begin{bmatrix} x_{i,0} \\ x_{i,1} \\ \vdots \\ x_{i,n} \end{bmatrix} ,\, x_{i,0}=1 ,\, y_i \in\{0,1\} \]
With the sigmoid hypothesis, the squared-error cost of linear regression would make \(J(\mathbf{w})\) non-convex, so the loss for a single example is instead defined as:
\[\text{Loss}(h(\mathbf{x}),y) = \begin{cases} -\ln(h(\mathbf{x})) ,&\quad y=1 \\ -\ln(1-h(\mathbf{x})) ,&\quad y=0 \end{cases} \]
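A quick check of the two branches (limits read off from the definition above): a confident correct prediction costs nothing, while a confident wrong prediction is penalized without bound.
\[\begin{aligned} y=1:&\quad h(\mathbf{x})\to 1 \Rightarrow \text{Loss}\to 0 ,\qquad h(\mathbf{x})\to 0 \Rightarrow \text{Loss}\to +\infty \\ y=0:&\quad h(\mathbf{x})\to 0 \Rightarrow \text{Loss}\to 0 ,\qquad h(\mathbf{x})\to 1 \Rightarrow \text{Loss}\to +\infty \end{aligned}\]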
Compress to:
\[\begin{aligned} \text{Loss} (h(\mathbf{x}),y) &= -y\ln(h(\mathbf{x})) - (1-y)\ln(1-h(\mathbf{x})) \\ &= - (y\ln(h(\mathbf{x})) + (1-y)\ln(1-h(\mathbf{x})))\end{aligned}\]
Logistic regression cost function:
\[\begin{aligned} J(\mathbf{w}) &= \frac{1}{m} \sum_{i=1}^{m} \text{Loss}(h(\mathbf{x}_i),y_i) \\ &= - \frac{1}{m} \sum_{i=1}^{m} (y_i\ln(h(\mathbf{x}_i)) + (1-y_i)\ln(1-h(\mathbf{x}_i))) \\ &= - \frac{1}{m} \ln L(\mathbf{w}) \\ &= - \frac{1}{m} \sum_{i=1}^{m} ( y_i \mathbf{w}^T \mathbf{x}_i - \ln (1+e^{\mathbf{w}^T\mathbf{x}_i}) ) \end{aligned}\]
Its partial derivatives take the same form as in linear regression:
\[\frac{\partial}{\partial w_j} J(\mathbf{w}) = \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i) x_{i,j} \]
Repeat (simultaneously update all \(w_j\)):
\[w_j := w_j -\alpha \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i) x_{i,j} \]
Cross Entropy: the per-example loss above is the cross entropy between the true label distribution \((y, 1-y)\) and the predicted distribution \((h(\mathbf{x}), 1-h(\mathbf{x}))\).
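A minimal batch gradient descent sketch of the update rule above; the names gradientDescent, alpha and num_iters are assumptions for illustration:

```octave
function w = gradientDescent(X, y, w, alpha, num_iters)
  % X: m x (n+1) design matrix (first column all ones), y: m x 1 labels in {0,1}
  % alpha: learning rate, num_iters: number of passes over the full batch
  m = length(y);
  for iter = 1:num_iters
    h = 1 ./ (1 + exp(-X * w));       % h(x_i) for every example, m x 1
    grad = (1 / m) * (X' * (h - y));  % gradient of J(w), (n+1) x 1
    w = w - alpha * grad;             % simultaneous update of all w_j
  end
end
```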
Besides gradient descent, more advanced optimization algorithms can be used to minimize \(J(\mathbf{w})\):
Conjugate gradient
BFGS
L-BFGS
These algorithms only require a routine that returns the cost and its gradient, of the form [jVal, gradient] = costFunction(theta); in Octave, the options are set with optimset and the minimization is run with fminunc.
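A usage sketch under those conventions; the concrete file layout and names (costFunction.m, X, y, optTheta) are assumptions for illustration:

```octave
%% costFunction.m -- cost J(w) and its gradient for logistic regression
function [jVal, gradient] = costFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                              % hypothesis values
  jVal = -(1 / m) * sum(y .* log(h) + (1 - y) .* log(1 - h));  % cost J(w)
  gradient = (1 / m) * (X' * (h - y));                         % dJ/dw_j, column vector
end

%% driver script -- X (m x (n+1), first column ones) and y (m x 1) assumed already in memory
options = optimset('GradObj', 'on', 'MaxIter', 400);
initialTheta = zeros(size(X, 2), 1);
[optTheta, jMin, exitFlag] = fminunc(@(t) costFunction(t, X, y), initialTheta, options);
```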
Cost function of Regularized Logistic Regression:
\[J(\mathbf{w}) = - \frac{1}{m} \sum_{i=1}^{m} ( y_i \mathbf{w}^T \mathbf{x}_i - \ln (1+e^{\mathbf{w}^T\mathbf{x}_i}) ) + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2 \]
Gradient Descent
Repeat simultaneously:
\[w_0 := w_0 -\alpha \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i)x_{i,0} \]
\[\begin{aligned} w_j &:= w_j - \alpha \frac{\partial}{\partial w_j} J(\mathbf{w}) \\ &:= w_j -\alpha \left[ \frac{1}{m} \sum_{i=1}^{m} (h(\mathbf{x}_i)-y_i) x_{i,j} + \frac{\lambda}{m} w_j \right] ,\quad j=1,\dots,n \end{aligned}\]
The bias term \(w_0\) is not regularized.
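A matching Octave sketch of the regularized cost and gradient (costFunctionReg and lambda are illustrative names, not from the original post); it can be plugged into the same fminunc call as above:

```octave
function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));
  % theta(1) is w_0 (the bias); it is excluded from the regularization term.
  jVal = -(1 / m) * sum(y .* log(h) + (1 - y) .* log(1 - h)) ...
         + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);
  gradient = (1 / m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end
```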