
Machine Learning 【note_02】


Keywords: Classification, Logistic Regression, Overfitting, Regularization


1 Motivation

(figure: examples of binary classification problems)

Classification:

  • "binary classification": \(y\) can only be one of two values
  • class / category

Try using linear regression for this task:

(figure: a straight-line fit to binary data, thresholded at 0.5)

It seems to work.

However, when another sample point is added:

(figure: the fit after adding one more sample point far to the right)

The \(x\) value corresponding to the threshold \(0.5\) moves to the right, which is worse because some points are now misclassified.

Logistic regression

  • It handles this situation.
  • Despite its name, it is used for binary classification problems.

2 Logistic Regression

2.1 Conception

sigmoid / logistic function: \(g(z)=\frac{1}{1+e^{-z}}\), \(0<g(z)<1\)

(figure: the sigmoid curve, rising from 0 to 1 with \(g(0)=0.5\))
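
A minimal NumPy sketch of this function (the name `sigmoid` is my own):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); the output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                            # 0.5, the midpoint
print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # tails approach 0 and 1
```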

Logistic regression:

\[f_{\vec{w},b}(\vec{x})=g(\vec{w}\cdot\vec{x}+b)=\frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}} \]

(figure: the logistic regression model)

Understanding logistic regression: the output is the probability that the class or label \(y\) equals \(1\) given a certain input \(x\).

(figure: interpreting the output as \(P(y=1\mid\vec{x})\))
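
As a quick sketch of the model under these definitions (the helper name and toy values are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """f_{w,b}(x) = g(w . x + b), read as the estimated P(y = 1 | x)."""
    return sigmoid(np.dot(w, x) + b)

w, b = np.array([1.5, -0.5]), -1.0   # hypothetical parameters
x = np.array([2.0, 1.0])
p = predict_proba(x, w, b)           # e.g. 0.7 would mean a 70% chance that y = 1
print(p)
```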

2.2 Decision Boundary

2.2.1 Threshold

(figure: thresholding the model output at 0.5)

2.2.2 Linear

(figure: a linear decision boundary \(\vec{w}\cdot\vec{x}+b=0\))

2.2.3 Non-linear

(figures: non-linear decision boundaries from polynomial features)
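
Thresholding \(f\) at \(0.5\) is equivalent to checking the sign of \(z=\vec{w}\cdot\vec{x}+b\), since \(g(z)\ge 0.5\) exactly when \(z\ge 0\). A small sketch of this, with a non-linear boundary built from squared features (toy values of my own):

```python
import numpy as np

def predict(features, w, b):
    """Predict 1 when z = w . features + b >= 0, i.e. when sigmoid(z) >= 0.5."""
    return int(np.dot(w, features) + b >= 0)

# With features (x1^2, x2^2), w = (1, 1), b = -1, the decision boundary
# is the unit circle x1^2 + x2^2 = 1.
w, b = np.array([1.0, 1.0]), -1.0
print(predict(np.array([0.5**2, 0.5**2]), w, b))  # 0: inside the circle
print(predict(np.array([1.5**2, 0.5**2]), w, b))  # 1: outside the circle
```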

2.3 Cost Function

2.3.1 Squared Error Cost

How to choose \(\vec{w}\) and \(b\) for logistic regression:

(figure: choosing \(\vec{w}\) and \(b\) from the training set)

The squared error cost function is not a good choice:

(figure: the squared error cost applied to logistic regression)

This is because it has many local minima and is not nearly as smooth as the "soup bowl" from linear regression:

(figure: the non-convex cost surface, compared with linear regression's convex bowl)

2.3.2 Logistic Loss

Define the cost function in terms of the logistic loss \(L\):

\[J(\vec{w},b)=\frac{1}{m}\sum_{i=1}^m L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)}) \\ L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)}) = \begin{cases} -\log(f_{\vec{w},b}(\vec{x}^{(i)})), & y^{(i)}=1 \\ -\log(1-f_{\vec{w},b}(\vec{x}^{(i)})), & y^{(i)}=0 \end{cases} \]

To understand it:

(figures: the loss curves for \(y^{(i)}=1\) and \(y^{(i)}=0\))

We can see the cost curve is much better:

(figure: the resulting cost curve, smooth and convex)
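
A direct sketch of the piecewise loss (toy probabilities of my own):

```python
import numpy as np

def logistic_loss(f, y):
    """-log(f) when y = 1, -log(1 - f) when y = 0."""
    return -np.log(f) if y == 1 else -np.log(1.0 - f)

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(logistic_loss(0.99, 1))  # ~0.01
print(logistic_loss(0.01, 1))  # ~4.6
print(logistic_loss(0.99, 0))  # ~4.6
```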

2.3.3 Simplified Cost Function

We can write the loss function as:

\[L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)}) = - y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)})) - (1-y^{(i)})\log(1-f_{\vec{w},b}(\vec{x}^{(i)})) \]

The cost function can then be simplified as:

\[\begin{aligned} J(\vec{w},b) &= \frac{1}{m}\sum_{i=1}^m L(f_{\vec{w},b}(\vec{x}^{(i)}),y^{(i)}) \\ &= -\frac{1}{m}\sum_{i=1}^m\left[ y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)})) + (1-y^{(i)})\log(1-f_{\vec{w},b}(\vec{x}^{(i)})) \right] \end{aligned} \]

(figure: the simplified cost function)
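
A vectorized sketch of this simplified cost over a whole training set (the toy data is made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(X, y, w, b):
    """J(w, b) = -(1/m) * sum[ y*log(f) + (1 - y)*log(1 - f) ]."""
    f = sigmoid(X @ w + b)   # predictions for all m examples at once
    return -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))

X = np.array([[0.5, 1.5], [1.0, 1.0], [1.5, 0.5], [3.0, 0.5]])  # hypothetical
y = np.array([0.0, 0.0, 1.0, 1.0])
print(compute_cost(X, y, w=np.array([1.0, -1.0]), b=0.0))
```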

2.4 Gradient Descent

Find \(\vec{w}\) and \(b\) to minimize the cost function.

Given a new \(\vec{x}\), output

\[P(y=1|\vec{x};\vec{w},b)=f_{\vec{w},b}(\vec{x})=\frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}} \]

2.4.1 Implementation

Remarkably, the partial derivatives of this cost function take the same form as those of linear regression (only \(f_{\vec{w},b}\) differs).

(figure: the gradient descent updates for logistic regression)

Here is the derivation. Using the sigmoid derivative \(g'(z)=g(z)(1-g(z))\) together with \(\frac{\partial z}{\partial{w_j}}=x_j^{(i)}\), we have \(\frac{\partial f_{\vec{w},b}(\vec{x}^{(i)})}{\partial{w_j}}=f_{\vec{w},b}(\vec{x}^{(i)})(1-f_{\vec{w},b}(\vec{x}^{(i)}))\,x_j^{(i)}\), so:

\[\begin{aligned} \frac{\partial}{\partial{w_j}}J(\vec{w},b) &= \frac{\partial}{\partial{w_j}} \left\{-\frac{1}{m}\sum_{i=1}^m\left[ y^{(i)}\log(f_{\vec{w},b}(\vec{x}^{(i)})) + (1-y^{(i)})\log(1-f_{\vec{w},b}(\vec{x}^{(i)})) \right]\right\}\\ &= -\frac{1}{m}\sum_{i=1}^m\left[ \frac{y^{(i)}}{f_{\vec{w},b}(\vec{x}^{(i)})} - \frac{1-y^{(i)}}{1-f_{\vec{w},b}(\vec{x}^{(i)})} \right] \frac{\partial f_{\vec{w},b}(\vec{x}^{(i)})}{\partial{w_j}}\\ &= -\frac{1}{m}\sum_{i=1}^m \frac{y^{(i)}-f_{\vec{w},b}(\vec{x}^{(i)})}{f_{\vec{w},b}(\vec{x}^{(i)})(1-f_{\vec{w},b}(\vec{x}^{(i)}))} \cdot f_{\vec{w},b}(\vec{x}^{(i)})(1-f_{\vec{w},b}(\vec{x}^{(i)}))\,x_j^{(i)}\\ &= \frac{1}{m}\sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})\,x_j^{(i)} \end{aligned} \]

The derivative with respect to \(b\) is derived similarly:

(figure: the corresponding derivation for \(\frac{\partial}{\partial{b}}J(\vec{w},b)\))
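
Putting the two derivatives together, a gradient sketch under the same conventions (the function name is my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient(X, y, w, b):
    """dJ/dw_j = (1/m) sum (f - y) x_j ;  dJ/db = (1/m) sum (f - y)."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y   # (f_{w,b}(x^(i)) - y^(i)) for every example
    return (X.T @ err) / m, np.sum(err) / m
```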

3 Overfitting

3.1 Conception

underfitting

  • doesn't fit the training set well
  • high bias

just right

  • fits the training set pretty well
  • generalizes well to new examples

overfitting

  • fits the training set extremely well
  • high variance
(figures: examples of underfitting, a good fit, and overfitting on the same data)

3.2 Addressing Overfitting

Method 1: Collect more training examples

(figure: collecting more training examples)

Method 2: Feature Selection - Select features to include / exclude

(figure: selecting a subset of the available features)

Method 3: Regularization

  • a gentler approach
  • keeps all features while preventing any one of them from having an overly large effect

(figure: regularization shrinking the parameter values)

Whether or not \(b\) is regularized makes little practical difference.

4 Regularization

4.1 Conception

Modify the cost function by adding a penalty on the parameters. Intuitively, the modified cost is smaller when the \(w_j\) are small, so minimizing it pushes the parameters toward small values.

(figure: intuition for penalizing large parameters)

So, set the cost function as:

\[J(\vec{w},b)=\frac{1}{2m}\sum_{i=1}^{m}{(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})^2} + \frac{\lambda}{2m}{\sum_{j=1}^{n}{w_j^2}} \]

  • \(\lambda\): regularization parameter, \(\lambda > 0\)
  • \(n\): the number of features
  • \(b\) can be included in or excluded from the penalty
(figure: the regularized cost function)

Here you can see that if \(\lambda\) is too small (say \(0\)), the penalty vanishes and the model overfits; if \(\lambda\) is too big (say \(10^{10}\)), all the \(w_j\) are driven toward zero and the model underfits.

(figure: fits with \(\lambda\) too small, about right, and too large)
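
A sketch of this regularized squared-error cost (names are my own):

```python
import numpy as np

def compute_cost_linear_reg(X, y, w, b, lambda_):
    """(1/2m) * sum (f - y)^2 + (lambda/2m) * sum_j w_j^2, with f = w . x + b."""
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err ** 2) / (2 * m) + (lambda_ / (2 * m)) * np.sum(w ** 2)
```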

4.2 Regularized Linear Regression

Gradient descent

repeat{

\[w_j := w_j-\alpha\frac{\partial}{\partial{w_j}}J(\vec{w},b) = w_j-\alpha\left[ \frac{1}{m} \sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}w_j\right] \\ b := b-\alpha\frac{\partial}{\partial{b}}J(\vec{w},b) = b-\alpha\left[\frac{1}{m}\sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})\right] \]

}

(figure: the simultaneous update of \(w_j\) and \(b\))

Here, rewrite the \(w_j\) update:

\[\begin{aligned} w_j &:= w_j-\alpha\left[\frac{1}{m}\sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}w_j\right] \\ &= w_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)})x_j^{(i)} \end{aligned} \]

The second term is the usual update from unregularized gradient descent, while the factor \(\left(1-\alpha\frac{\lambda}{m}\right)\) shrinks \(w_j\) a little on every iteration.
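
One update step written in this shrink-then-update form, as a sketch (names are my own):

```python
import numpy as np

def gd_step_linear_reg(X, y, w, b, alpha, lambda_):
    """w_j <- w_j*(1 - alpha*lambda/m) - alpha*(1/m)*sum(err * x_j); no shrinkage on b."""
    m = X.shape[0]
    err = X @ w + b - y                                    # residuals f - y
    w = w * (1 - alpha * lambda_ / m) - alpha * (X.T @ err) / m
    b = b - alpha * np.sum(err) / m
    return w, b
```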

4.3 Regularized Logistic Regression

(figures: the regularized logistic regression cost function and its gradient descent updates)
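
Compared with the unregularized version, the cost gains the same \(\frac{\lambda}{2m}\sum_{j}w_j^2\) penalty and each \(\frac{\partial J}{\partial{w_j}}\) gains a \(\frac{\lambda}{m}w_j\) term, while \(f\) is now the sigmoid model. A sketch (the function name is my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    """Gradient of the regularized logistic cost: add (lambda/m) * w_j to each dJ/dw_j."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y
    return (X.T @ err) / m + (lambda_ / m) * w, np.sum(err) / m
```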
