The sigmoid function (a smooth, S-shaped alternative to the step function):
\[g(z)=\frac{1}{1+e^{-z}} \]
Applying the sigmoid to a linear function of the input gives the hypothesis:
\[h_{\theta}(x)=g\left(\theta^{T} x\right)=\frac{1}{1+e^{-\theta^{T} x}} \]
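As a concrete sketch, the sigmoid and the hypothesis above can be written in a few lines of NumPy (the function names here are my own, chosen for illustration):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for each row x of the design matrix X."""
    return sigmoid(X @ theta)
```

Note that at `theta = 0` the hypothesis is exactly 0.5 for every sample, i.e. the model starts out maximally uncertain.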
For a binary classification task, the output is interpreted as the two class probabilities:
\[\begin{array}{l}P(y=1 \mid x ; \theta)=h_{\theta}(x) \\ P(y=0 \mid x ; \theta)=1-h_{\theta}(x)\end{array} \]
The two cases combine into a single expression:
\[P(y \mid x ; \theta)=\left(h_{\theta}(x)\right)^{y}\left(1-h_{\theta}(x)\right)^{1-y} \]
Over \(m\) independent training samples this gives the likelihood function:
\[L(\theta)=\prod_{i=1}^{m} P\left(y_{i} \mid x_{i} ; \theta\right)=\prod_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)\right)^{y_{i}}\left(1-h_{\theta}\left(x_{i}\right)\right)^{1-y_{i}} \]
Taking the logarithm gives the log-likelihood:
\[l(\theta)=\log L(\theta)=\sum_{i=1}^{m}\left(y_{i} \log h_{\theta}\left(x_{i}\right)+\left(1-y_{i}\right) \log \left(1-h_{\theta}\left(x_{i}\right)\right)\right) \]
Maximum likelihood asks us to make \(l(\theta)\) as large as possible, which would call for gradient ascent; negating and averaging turns it into an equivalent gradient-descent problem:
\[J(\theta)=-\frac{1}{m} l(\theta) \]
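This cost is straightforward to evaluate numerically. A minimal NumPy sketch (the small `eps` guard against `log(0)` is my own addition, not part of the derivation):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))  # h_theta(x) for every sample
    eps = 1e-12                             # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```

A useful sanity check: at `theta = 0` every prediction is 0.5, so the cost is \(\log 2 \approx 0.693\) regardless of the labels.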
Differentiating the cost with respect to \(\theta_{j}\), using the identity \(g'(z)=g(z)(1-g(z))\):
\[\begin{array}{l}\frac{\partial}{\partial \theta_{j}} J(\theta)=-\frac{1}{m} \sum_{i=1}^{m}\left(y_{i} \frac{1}{h_{\theta}\left(x_{i}\right)} \frac{\partial}{\partial \theta_{j}} h_{\theta}\left(x_{i}\right)-\left(1-y_{i}\right) \frac{1}{1-h_{\theta}\left(x_{i}\right)} \frac{\partial}{\partial \theta_{j}} h_{\theta}\left(x_{i}\right)\right) \\ =-\frac{1}{m} \sum_{i=1}^{m}\left(y_{i} \frac{1}{g\left(\theta^{T} x_{i}\right)}-\left(1-y_{i}\right) \frac{1}{1-g\left(\theta^{T} x_{i}\right)}\right) \frac{\partial}{\partial \theta_{j}} g\left(\theta^{T} x_{i}\right) \\ =-\frac{1}{m} \sum_{i=1}^{m}\left(y_{i} \frac{1}{g\left(\theta^{T} x_{i}\right)}-\left(1-y_{i}\right) \frac{1}{1-g\left(\theta^{T} x_{i}\right)}\right) g\left(\theta^{T} x_{i}\right)\left(1-g\left(\theta^{T} x_{i}\right)\right) \frac{\partial}{\partial \theta_{j}} \theta^{T} x_{i} \\ =-\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}\left(1-g\left(\theta^{T} x_{i}\right)\right)-\left(1-y_{i}\right) g\left(\theta^{T} x_{i}\right)\right) x_{i}^{j} \\ =-\frac{1}{m} \sum_{i=1}^{m}\left(y_{i}-g\left(\theta^{T} x_{i}\right)\right) x_{i}^{j}\end{array} \]
The gradient-descent parameter update, with learning rate \(\alpha\), is therefore:
\[\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x_{i}\right)-y_{i}\right) x_{i}^{j} \]
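The update rule above is all that is needed for a batch training loop. The following is a minimal sketch, assuming a design matrix `X` whose first column is all ones (for the intercept); the function name and the default hyperparameters are my own choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent on J(theta).

    Implements theta_j := theta_j - alpha * (1/m) * sum_i (h - y_i) * x_i^j
    for all j at once via the vectorized form X^T (h - y) / m.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        theta -= alpha * grad
    return theta
```

Vectorizing the sum over samples as `X.T @ (h - y)` computes every partial derivative \(\partial J / \partial \theta_j\) in one matrix product, which is both shorter and much faster than looping over `j`.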
For multi-class problems with \(K\) classes, softmax regression generalizes the hypothesis:
\[h_{\theta}\left(x^{(i)}\right)=\left[\begin{array}{c}p\left(y^{(i)}=1 \mid x^{(i)} ; \theta\right) \\ p\left(y^{(i)}=2 \mid x^{(i)} ; \theta\right) \\ \vdots \\ p\left(y^{(i)}=k \mid x^{(i)} ; \theta\right)\end{array}\right]=\frac{1}{\sum_{j=1}^{k} e^{\theta_{j}^{T} x^{(i)}}}\left[\begin{array}{c}e^{\theta_{1}^{T} x^{(i)}} \\ e^{\theta_{2}^{T} x^{(i)}} \\ \vdots \\ e^{\theta_{k}^{T} x^{(i)}}\end{array}\right] \]
Equivalently, writing \(s_{k}(\mathbf{x})=\theta_{k}^{T} \mathbf{x}\) for the score of class \(k\), the predicted probability of class \(k\) is:
\[\hat{p}_{k}=\sigma(\mathbf{s}(\mathbf{x}))_{k}=\frac{\exp \left(s_{k}(\mathbf{x})\right)}{\sum_{j=1}^{K} \exp \left(s_{j}(\mathbf{x})\right)} \]
The loss function is the cross-entropy, where \(y_{k}^{(i)}\) is 1 if sample \(i\) belongs to class \(k\) and 0 otherwise:
\[J(\Theta)=-\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_{k}^{(i)} \log \left(\hat{p}_{k}^{(i)}\right) \]
From: https://www.cnblogs.com/CallMeRoot/p/18038686
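The softmax and cross-entropy formulas above translate directly to NumPy. A minimal sketch (the row-max shift for numerical stability and the `1e-12` log guard are my own additions; `Y` is assumed to be one-hot):

```python
import numpy as np

def softmax(S):
    """Row-wise softmax of a score matrix S (m samples x K classes)."""
    S = S - S.max(axis=1, keepdims=True)  # shift scores so exp() cannot overflow
    e = np.exp(S)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(P_hat, Y):
    """J = -(1/m) * sum_i sum_k y_k^(i) * log(p_hat_k^(i)), Y one-hot."""
    return -np.mean(np.sum(Y * np.log(P_hat + 1e-12), axis=1))
```

Subtracting the row maximum leaves the softmax output unchanged (the factor cancels between numerator and denominator) but prevents `exp` from overflowing on large scores. With all-zero scores the prediction is uniform, so the loss equals \(\log K\).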