Basics of Neural Network Programming
Logistic Regression
Given \(x\), we want \(\hat{y}=P(y=1|x)\), where \(x\in\R^{n_x}\).
Parameters: \(w\in\R^{n_x}, b\in\R\)
Output: \(\hat{y}=\sigma(w^Tx+b),\ \ \ \hat{y}\in(0,1)\), where \(\sigma(z)=\cfrac{1}{1+e^{-z}}\) is the sigmoid function.
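The output formula can be sketched in plain Python as follows. This is only an illustration: the names `sigmoid` and `predict` and the sample numbers are assumptions, not part of the notes, and the linear term is taken to be \(w^Tx+b\).

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    # Branch on the sign of z so math.exp never gets a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, b, x):
    """Return y_hat = sigma(w^T x + b) for one example x (w and x have n_x entries)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters and input with n_x = 2.
y_hat = predict([0.5, -0.25], 0.1, [2.0, 4.0])
print(y_hat)
```

Here \(z=0.1\), so `y_hat` comes out slightly above 0.5.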
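Loss (error) function:
\(\ell(\hat{y},y)=\cfrac{1}{2}(\hat{y}-y)^2\) (squared error), or \(\ell(\hat{y},y)=-(y\log\hat{y}+(1-y)\log(1-\hat{y}))\) (cross-entropy loss, the one normally used for logistic regression).
Here \(\log\) denotes the natural logarithm \(\ln\).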
For the second function, if \(y=1\) then \(\ell(\hat{y},y)=-\log \hat{y}\). To make the loss close to 0, \(\hat{y}\) must be as large as possible, and since \(\hat{y}\in(0,1)\), \(\ell(\hat{y},y)\rarr 0\) only as \(\hat{y}\rarr 1\). Symmetrically, if \(y=0\) then \(\ell(\hat{y},y)=-\log(1-\hat{y})\), which is close to 0 only when \(\hat{y}\) is close to 0.
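A small numeric check of this behaviour, written as a sketch in plain Python. The function names are illustrative assumptions; `y_hat` must lie strictly inside \((0,1)\), otherwise `math.log` raises an error.

```python
import math

def cross_entropy_loss(y_hat, y):
    """l(y_hat, y) = -(y*log(y_hat) + (1-y)*log(1-y_hat)), using the natural log."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def squared_error_loss(y_hat, y):
    """l(y_hat, y) = (1/2) * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

# With y = 1 the cross-entropy loss shrinks toward 0 as y_hat approaches 1.
for y_hat in (0.1, 0.5, 0.9, 0.99):
    print(y_hat, cross_entropy_loss(y_hat, 1))
```

The printed losses decrease monotonically along the loop, matching the argument above for \(y=1\).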
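Cost function: \(J(w,b)=\cfrac{1}{m}\sum_{i=1}^{m}\ell(\hat{y}^{(i)},y^{(i)})\), the average loss over the \(m\) training examples.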
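As a sketch, assuming the usual definition in which the cost \(J(w,b)\) is the average of the per-example cross-entropy loss over \(m\) training examples, the computation might look like this in plain Python. The function name `cost` and the toy data are hypothetical.

```python
import math

def cost(w, b, X, Y):
    """Average cross-entropy loss over the examples (X[i], Y[i])."""
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        y_hat = 1.0 / (1.0 + math.exp(-z))  # sigmoid; fine for moderate z
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / len(X)

# Hypothetical toy data: m = 2 examples with n_x = 1.
X = [[1.0], [-1.0]]
Y = [1, 0]
J_zero = cost([0.0], 0.0, X, Y)    # every y_hat is 0.5 here
J_better = cost([2.0], 0.0, X, Y)  # predictions lean toward the correct labels
print(J_zero, J_better)
```

With all-zero parameters every prediction is 0.5, so the cost equals \(\ln 2\). Parameters that push each \(\hat{y}\) toward its label give a smaller cost.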
Gradient descent algorithm:
\(\begin{aligned}&\text{Repeat}\ \{\\&\quad w := w-\alpha\cfrac{dJ(w)}{dw}\\&\}\end{aligned}\).
To simplify, ignore parameter \(b\): \(J(w,b)\rarr J(w)\).
\(\alpha\): the learning rate, which controls the size of each update step.
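The update rule can be sketched in plain Python for a single parameter \(w\). The cost \(J(w)=(w-3)^2\), the starting point, and the fixed iteration count are all assumptions chosen for illustration; the notes do not give a stopping criterion.

```python
def gradient_descent(dJ_dw, w0, alpha, num_iters):
    """Repeat w := w - alpha * dJ(w)/dw for a fixed number of iterations."""
    w = w0
    for _ in range(num_iters):
        w = w - alpha * dJ_dw(w)
    return w

# Hypothetical convex cost J(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0, alpha=0.1, num_iters=100)
print(w_final)
```

With these values `w_final` ends up very close to 3, the minimizer of the hypothetical cost. For this particular cost, a learning rate above 1.0 would make the iterates diverge instead of converging.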