[Machine Learning] Regularization

Date: 2023-07-31 15:56:25

Regularized

Cost function for regularized linear regression

Mathematical expression

\[J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \]

\[ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \]

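When \(\lambda\) is very large, minimizing the cost forces every \(w_j\) to be very small, which counteracts overfitting. The bias \(b\) does not appear in the penalty term, so it is not regularized.

If you do not know which features to penalize, regularize all of them.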
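The shrinking effect of \(\lambda\) can be illustrated with a minimal sketch. It assumes the linear cost defined above and solves for the minimizing \(\mathbf{w}\) in closed form instead of by gradient descent, which is a swapped-in technique and not something the original post does. The helper name `ridge_weights` and the random data are hypothetical.

```python
import numpy as np

def ridge_weights(X, y, lambda_):
    """Closed-form minimizer of (1/2m)*||Xw + b - y||^2 + (lambda_/2m)*||w||^2.

    Centering X and y accounts for the unregularized bias b, so only w is
    penalized. Hypothetical helper, for illustration only.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = X.shape[1]
    # Setting the gradient w.r.t. w to zero gives (Xc^T Xc + lambda_*I) w = Xc^T yc
    return np.linalg.solve(Xc.T @ Xc + lambda_ * np.eye(n), Xc.T @ yc)

# Made-up data: 20 examples, 5 features, small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.1, size=20)

lambdas = (0.0, 1.0, 10.0, 100.0, 1000.0)
norms = [np.linalg.norm(ridge_weights(X, y, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda_={lam:7.1f}  ||w||={norm:.4f}")
```

The printed norm of \(\mathbf{w}\) falls as `lambda_` grows. Pushed far enough, the weights approach zero and the model underfits, so \(\lambda\) has to be tuned rather than simply maximized.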

Code

import numpy as np

def compute_cost_linear_reg(X, y, w, b, lambda_=1):
    """
    Computes the regularized cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar):  cost 
    """

    m  = X.shape[0]
    n  = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b                                   #(n,)(n,)=scalar, see np.dot
        cost = cost + (f_wb_i - y[i])**2                               #scalar             
    cost = cost / (2 * m)                                              #scalar  
 
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                          #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar
    
    total_cost = cost + reg_cost                                       #scalar
    return total_cost                                                  #scalar
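The same cost can also be written without explicit loops. The following is a sketch under the assumption that `X`, `y` and `w` are NumPy float arrays; the name `compute_cost_linear_reg_vec` and the tiny dataset are hypothetical and chosen so the result can be checked by hand.

```python
import numpy as np

def compute_cost_linear_reg_vec(X, y, w, b, lambda_=1.0):
    """Vectorized regularized squared-error cost (hypothetical helper name)."""
    m = X.shape[0]
    err = X @ w + b - y                                # (m,) prediction errors
    cost = np.sum(err ** 2) / (2 * m)                  # data term
    reg_cost = (lambda_ / (2 * m)) * np.sum(w ** 2)    # penalty on w only, b excluded
    return cost + reg_cost

# Made-up data, small enough to verify by hand
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])
b = 0.1

total_cost = compute_cost_linear_reg_vec(X, y, w, b, lambda_=1.0)
print(total_cost)
```

By hand: both predictions are -0.4, so the errors are -1.4 and -2.4, the data term is 7.72 / 4 = 1.93, and the penalty is (1/4) * 0.5 = 0.125.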

Cost function for regularized logistic regression

Mathematical expression

\[J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \]

\[f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \text{sigmoid}(\mathbf{w} \cdot \mathbf{x}^{(i)} + b) \]

Code

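def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z)); assumes numpy is imported as np."""
    return 1 / (1 + np.exp(-z))

def compute_cost_logistic_reg(X, y, w, b, lambda_=1):
    """
    Computes the regularized cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features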
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar):  cost 
    """

    m,n  = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b                                      #(n,)(n,)=scalar, see np.dot
        f_wb_i = sigmoid(z_i)                                          #scalar
        cost +=  -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)      #scalar
             
    cost = cost/m                                                      #scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)                                          #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost                              #scalar
    
    total_cost = cost + reg_cost                                       #scalar
    return total_cost                                                  #scalar
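A loop-free version of the logistic cost might look like the sketch below. The name `compute_cost_logistic_reg_vec` is hypothetical, and the example weights are chosen on purpose so that every \(z\) is zero and the expected value is easy to verify.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic_reg_vec(X, y, w, b, lambda_=1.0):
    """Vectorized regularized cross-entropy cost (hypothetical helper name)."""
    m = X.shape[0]
    f_wb = sigmoid(X @ w + b)                                         # (m,) probabilities
    cost = np.mean(-y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb))    # data term
    reg_cost = (lambda_ / (2 * m)) * np.sum(w ** 2)                   # penalty on w only
    return cost + reg_cost

# Made-up data: w is chosen so that X @ w + b = 0 for every row
X = np.array([[1.0, 1.0], [2.0, 2.0]])
y = np.array([0.0, 1.0])
w = np.array([1.0, -1.0])
b = 0.0

total_cost = compute_cost_logistic_reg_vec(X, y, w, b, lambda_=1.0)
print(total_cost)
```

With every prediction equal to 0.5, the data term is \(\ln 2\) and the penalty is \(\frac{1}{2 \cdot 2}(1 + 1) = 0.5\). This sketch does not guard against `log(0)`, which can occur when a prediction saturates at exactly 0 or 1.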

Computing the Gradient with regularization (both linear/logistic)

Mathematical expression

\[\begin{align*} \frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} + \frac{\lambda}{m} w_j \\ \frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \end{align*} \]

  • For a linear regression model: \(f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b\)
  • For a logistic regression model: \(z = \mathbf{w} \cdot \mathbf{x} + b\) and \(f_{\mathbf{w},b}(\mathbf{x}) = g(z)\), where \(g(z)\) is the sigmoid function \(g(z) = \frac{1}{1+e^{-z}}\)
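
The extra \(\frac{\lambda}{m} w_j\) term comes from differentiating the penalty. Because \(b\) does not appear in the penalty, the gradient with respect to \(b\) has no extra term. As a short derivation, using \(k\) as the summation index:

\[\frac{\partial}{\partial w_j} \left( \frac{\lambda}{2m} \sum_{k=0}^{n-1} w_k^2 \right) = \frac{\lambda}{2m} \cdot 2 w_j = \frac{\lambda}{m} w_j, \qquad \frac{\partial}{\partial b} \left( \frac{\lambda}{2m} \sum_{k=0}^{n-1} w_k^2 \right) = 0 \]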

Linear regression code

def compute_gradient_linear_reg(X, y, w, b, lambda_): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
      
    Returns:
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]                 
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]               
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m   
    
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw
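
For comparison, the same gradient can be sketched in vectorized form. The name `compute_gradient_linear_reg_vec` and the data are hypothetical. The return order `(dj_db, dj_dw)` follows the function above.

```python
import numpy as np

def compute_gradient_linear_reg_vec(X, y, w, b, lambda_):
    """Vectorized gradient of the regularized linear cost (hypothetical helper name)."""
    m = X.shape[0]
    err = X @ w + b - y                         # (m,) prediction errors
    dj_dw = X.T @ err / m + (lambda_ / m) * w   # (n,) data term plus penalty term
    dj_db = np.mean(err)                        # scalar, b is not regularized
    return dj_db, dj_dw

# Made-up data, small enough to verify by hand
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -0.5])
b = 0.1

dj_db, dj_dw = compute_gradient_linear_reg_vec(X, y, w, b, lambda_=1.0)
print(dj_db, dj_dw)
```

By hand: the errors are -1.4 and -2.4, so `dj_db` is -1.9. The unregularized `dj_dw` is [-4.3, -6.2], and adding \(\frac{\lambda}{m}\mathbf{w}\) = [0.25, -0.25] gives [-4.05, -6.45].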

Logistic regression code

def compute_gradient_logistic_reg(X, y, w, b, lambda_): 
    """
    Computes the gradient for regularized logistic regression 
 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters  
      b (scalar)      : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      dj_db (scalar)            : The gradient of the cost w.r.t. the parameter b. 
      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w. 
    """
    m,n = X.shape
    dj_dw = np.zeros((n,))                            #(n,)
    dj_db = 0.0                                       #scalar

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b)          #(n,)(n,)=scalar
        err_i  = f_wb_i  - y[i]                       #scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]      #scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m                                   #(n,)
    dj_db = dj_db/m                                   #scalar

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw  
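
One way to gain confidence in an analytic gradient like this is a finite-difference check against the cost. This check is not part of the original post. The sketch below assumes vectorized stand-ins for the cost and gradient formulas of this section. All function names and the random data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, w, b, lambda_):
    """Regularized logistic cost, vectorized."""
    m = X.shape[0]
    f_wb = sigmoid(X @ w + b)
    data = np.mean(-y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb))
    return data + (lambda_ / (2 * m)) * np.sum(w ** 2)

def gradient(X, y, w, b, lambda_):
    """Analytic gradient, returned as (dj_db, dj_dw)."""
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y
    return np.mean(err), X.T @ err / m + (lambda_ / m) * w

def numerical_gradient(X, y, w, b, lambda_, eps=1e-6):
    """Central-difference approximation of the same gradient."""
    num_dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        num_dw[j] = (cost(X, y, w + step, b, lambda_)
                     - cost(X, y, w - step, b, lambda_)) / (2 * eps)
    num_db = (cost(X, y, w, b + eps, lambda_)
              - cost(X, y, w, b - eps, lambda_)) / (2 * eps)
    return num_db, num_dw

# Made-up data: 8 examples, 3 features, random binary labels
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 3))
y = (rng.random(8) > 0.5).astype(float)
w = rng.normal(size=3)
b = 0.3

dj_db, dj_dw = gradient(X, y, w, b, lambda_=0.7)
num_db, num_dw = numerical_gradient(X, y, w, b, lambda_=0.7)
print(np.max(np.abs(dj_dw - num_dw)), abs(dj_db - num_db))
```

If the two gradients disagree by more than a small tolerance, the usual suspects are a missing \(\frac{\lambda}{m} w_j\) term or a penalty applied to \(b\) by mistake.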

From: https://www.cnblogs.com/MrFeng2997/p/17592017.html
