
NumPy implementations of common gradient descent algorithms


layout: post
title: Deep Learning
subtitle: Gradient descent implementations
description: Gradient descent implementations
date: 2022-10-25
categories: deeplearning
tags: code pytorch
comments: true

Excerpted from CS231N.

import numpy as np


def sgd(w, dw, config=None):
    """
    Performs vanilla stochastic gradient descent.

    config format:
    - learning_rate: Scalar learning rate.
    """
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)

    # Step directly down the gradient.
    w -= config["learning_rate"] * dw
    return w, config
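All of these update rules share the same interface: they take the current weights w, the gradient dw, and a config dict holding hyperparameters and running state, and return the updated weights together with the (possibly updated) config. A minimal sketch of a single vanilla SGD step; the toy arrays below are made up purely for illustration:

# One sgd step on toy data (values are illustrative only).
w = np.array([1.0, -2.0, 3.0])
dw = np.array([0.5, 0.5, -1.0])
w, config = sgd(w, dw, config={"learning_rate": 0.1})
print(w)  # approximately [0.95, -2.05, 3.1]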

def sgd_momentum(w, dw, config=None):
    """
    Performs stochastic gradient descent with momentum.

    config format:
    - learning_rate: Scalar learning rate.
    - momentum: Scalar between 0 and 1 giving the momentum value.
      Setting momentum = 0 reduces to sgd.
    - velocity: A numpy array of the same shape as w and dw used to store a
      moving average of the gradients.
    """
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("momentum", 0.9)
    v = config.get("velocity", np.zeros_like(w))

    # Keep an exponential moving average of the gradient as the velocity,
    # then step along the velocity.
    v = config["momentum"] * v + (1 - config["momentum"]) * dw
    next_w = w - config["learning_rate"] * v

    config["velocity"] = v
    return next_w, config
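This solution keeps an exponential moving average of the gradient as the velocity. The classic momentum update described in the CS231N course notes instead accumulates the learning-rate-scaled gradient (v = momentum * v - learning_rate * dw, then next_w = w + v). Both reduce to vanilla SGD when momentum = 0, but they scale the velocity differently and are not numerically identical. A sketch of the classic form, shown only for comparison with the function above:

# Classic (heavy-ball) momentum form, for comparison; this is not the update
# used in sgd_momentum above.
def sgd_momentum_classic(w, dw, config=None):
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("momentum", 0.9)
    v = config.get("velocity", np.zeros_like(w))

    # The velocity directly accumulates the learning-rate-scaled gradient.
    v = config["momentum"] * v - config["learning_rate"] * dw
    next_w = w + v

    config["velocity"] = v
    return next_w, config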

def rmsprop(w, dw, config=None):
    """
    Uses the RMSProp update rule, which uses a moving average of squared
    gradient values to set adaptive per-parameter learning rates.

    config format:
    - learning_rate: Scalar learning rate.
    - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared
      gradient cache.
    - epsilon: Small scalar used for smoothing to avoid dividing by zero.
    - cache: Moving average of second moments of gradients.
    """
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("decay_rate", 0.99)
    config.setdefault("epsilon", 1e-8)
    config.setdefault("cache", np.zeros_like(w))

    # Moving average of squared gradients; each parameter's step is scaled by
    # the inverse square root of its cache entry.
    config["cache"] = config["decay_rate"] * config["cache"] + (1 - config["decay_rate"]) * dw ** 2
    next_w = w - config["learning_rate"] * dw / (np.sqrt(config["cache"]) + config["epsilon"])

    return next_w, config
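The "adaptive per-parameter learning rate" wording can be made concrete with a small experiment (toy gradients, purely illustrative): feeding a constant gradient drives the cache toward dw ** 2, so the effective step size approaches learning_rate for every parameter, no matter how large or small its raw gradient is.

# Sketch: constant gradient with very different per-parameter scales.
# The cache converges toward dw ** 2, so the effective step approaches
# learning_rate (0.01 here) for every parameter. Toy values only.
w = np.zeros(2)
dw = np.array([100.0, 0.01])
config = None
for _ in range(500):
    w, config = rmsprop(w, dw, config=config)
step = config["learning_rate"] * dw / (np.sqrt(config["cache"]) + config["epsilon"])
print(step)  # both entries are roughly 0.01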

def adam(w, dw, config=None):
    """
    Uses the Adam update rule, which incorporates moving averages of both the
    gradient and its square and a bias correction term.

    config format:
    - learning_rate: Scalar learning rate.
    - beta1: Decay rate for moving average of first moment of gradient.
    - beta2: Decay rate for moving average of second moment of gradient.
    - epsilon: Small scalar used for smoothing to avoid dividing by zero.
    - m: Moving average of gradient.
    - v: Moving average of squared gradient.
    - t: Iteration number.
    """
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-3)
    config.setdefault("beta1", 0.9)
    config.setdefault("beta2", 0.999)
    config.setdefault("epsilon", 1e-8)
    config.setdefault("m", np.zeros_like(w))
    config.setdefault("v", np.zeros_like(w))
    config.setdefault("t", 0)

    # Increment the step counter before it is used in the bias correction.
    config["t"] += 1
    # First and second moment estimates of the gradient.
    config["m"] = config["beta1"] * config["m"] + (1 - config["beta1"]) * dw
    config["v"] = config["beta2"] * config["v"] + (1 - config["beta2"]) * dw ** 2
    # Bias-corrected estimates compensate for the zero initialization of m and v.
    m_hat = config["m"] / (1 - config["beta1"] ** config["t"])
    v_hat = config["v"] / (1 - config["beta2"] ** config["t"])
    next_w = w - config["learning_rate"] * m_hat / (np.sqrt(v_hat) + config["epsilon"])

    return next_w, config
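All four update rules are driven the same way: call them once per iteration and feed the returned config back in so the running state (velocity, cache, moments, step count) persists. A minimal sketch of that loop on a toy quadratic loss; the objective, starting point, and iteration count are made up for illustration, and in the CS231N assignments this loop is handled by the Solver class with gradients coming from backpropagation:

# Minimal driver loop on a toy quadratic loss 0.5 * ||w||^2, whose gradient
# with respect to w is just w. Starting point and step count are arbitrary.
for update_rule in (sgd, sgd_momentum, rmsprop, adam):
    w = np.array([5.0, -3.0])
    config = None
    for _ in range(200):
        dw = w.copy()                  # gradient of 0.5 * ||w||^2
        w, config = update_rule(w, dw, config=config)
    print(update_rule.__name__, w)     # each rule moves w toward 0 at its own pace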

From: https://www.cnblogs.com/cyinen/p/17159410.html
