Optimization: Deep Neural Network Tricks [Notes]

Date: 2023-08-14 11:36:31


Slide:http://lamda.nju.edu.cn/weixs/slide/CNNTricks_slide.pdf

Blog post: http://lamda.nju.edu.cn/weixs/project/CNNTricks/CNNTricks.html

  1) data augmentation;

  2) pre-processing on images;

  3) initializations of networks;

  4) some tips during training;

  5) selections of activation functions;

  6) diverse regularizations;

  7) some insights found from figures; and finally

  8) methods for ensembling multiple deep networks.

Sec. 1: Data Augmentation

When the training set is limited, data augmentation can be used to enlarge it.

  • (1) Simple transformations: horizontal flipping, random crops, and color jittering.
  • (2) Combinations of the simple operations in (1).
  • (3) Fancy PCA, proposed by Krizhevsky et al. [1]: alters the intensities of the RGB channels in training images.
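The augmentations listed above can be sketched with NumPy as follows. This is a minimal illustration, not the code used in [1]: the function names are hypothetical, images are assumed to be float arrays of shape H x W x 3, and the PCA is computed per image for brevity, whereas [1] computes it over the RGB pixel values of the whole training set. The standard deviation of 0.1 for the random coefficients follows [1].

```python
import numpy as np

def random_flip_and_crop(image, crop_size, rng):
    """Randomly flip an H x W x 3 image horizontally, then take a random square crop."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]                      # horizontal flip
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_size + 1)           # upper bound is exclusive
    left = rng.integers(0, w - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

def fancy_pca(image, rng, sigma=0.1):
    """Add one random RGB offset, built from the principal components of the pixels."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    cov = np.cov(pixels, rowvar=False)                 # 3 x 3 covariance of RGB values
    eigvals, eigvecs = np.linalg.eigh(cov)             # columns of eigvecs are components
    alphas = rng.normal(0.0, sigma, size=3)            # one random coefficient per component
    shift = eigvecs @ (alphas * eigvals)               # same offset for every pixel
    return image + shift
```

Because the offset is shared by all pixels of an image, fancy PCA changes overall color and intensity while leaving object identity intact.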

Sec. 2: Pre-Processing

(1) Zero-center and normalize:

Python implementation (X is the data matrix, one sample per row):


>>> X -= np.mean(X, axis = 0) # zero-center
>>> X /= np.std(X, axis = 0) # normalize


(2) PCA whitening: zero-center the data --> compute the covariance matrix (which captures the correlation structure of the data) --> decorrelate the data --> whiten it.

Python implementation:


>>> X -= np.mean(X, axis = 0) # zero-center
>>> cov = np.dot(X.T, X) / X.shape[0] # compute the covariance matrix


 Decorrelate the data: project the original (zero-centered) data onto the eigenbasis.


>>> U,S,V = np.linalg.svd(cov) # compute the SVD factorization of the data covariance matrix
>>> Xrot = np.dot(X, U) # decorrelate the data


 Whitening: divide every dimension in the eigenbasis by the square root of its eigenvalue to normalize the scale.


>>> Xwhite = Xrot / np.sqrt(S + 1e-5) # divide by the square roots of the eigenvalues (S holds the eigenvalues of cov; 1e-5 guards against division by zero)
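The steps above can be collected into one function and checked numerically. The sketch below assumes X is a float array of shape [N x D]; the function name and the eps parameter are illustrative. After whitening, the covariance matrix of the result should be close to the identity.

```python
import numpy as np

def pca_whiten(X, eps=1e-5):
    """Zero-center, decorrelate and whiten an [N x D] data matrix."""
    X = X - np.mean(X, axis=0)               # zero-center (on a copy, X is not modified)
    cov = X.T @ X / X.shape[0]               # D x D covariance matrix
    U, S, _ = np.linalg.svd(cov)             # columns of U: eigenvectors, S: eigenvalues
    X_rot = X @ U                            # decorrelate: project onto the eigenbasis
    return X_rot / np.sqrt(S + eps)          # whiten: unit variance in every dimension

# Correlated toy data: independent Gaussians mixed by a fixed matrix.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 1.5, 0.2],
                   [0.0, 0.0, 0.0, 1.0]])
X = rng.standard_normal((500, 4)) @ mixing
X_white = pca_whiten(X)
cov_white = X_white.T @ X_white / X_white.shape[0]
```

In practice the mean, U and S would be estimated on the training set only and then reused for validation and test data.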



Sec. 3: Initializations

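(1) All-zero initialization

With proper data normalization, it is reasonable to assume that roughly half of the weights will be positive and half negative, so setting all weights to zero may look like a sensible starting point.

Drawback: every neuron then computes the same output and receives the same gradient, so there is no source of asymmetry between neurons.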

(2) Initialization with small random numbers:

Advantage: symmetry breaking.

 Idea: the neurons are all random and unique in the beginning, so they compute distinct updates.

Example 1: weights drawn as a small constant times N(0, 1), where N(0, 1) is a zero-mean, unit-standard-deviation Gaussian.

Example 2: small numbers drawn from a uniform distribution.

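(3) Calibrating the variances

Idea: normalize the variance of each neuron's output to 1 by scaling its weight vector by the square root of its fan-in n (its number of inputs). This does not take ReLUs into account.

Python implementation:


>>> w = np.random.randn(n) / np.sqrt(n) # calibrating the variances with 1/sqrt(n)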


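(4) Current recommendation

 He et al. [4] derive an initialization specifically for ReLUs: the variance of the neurons in the network should be 2.0/n.

Python implementation:


>>> w = np.random.randn(n) * np.sqrt(2.0/n) # current recommendation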
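To see why the factor of 2 matters for ReLUs, the two scalings can be compared on a stack of ReLU layers. The sketch below is an illustration under stated assumptions: the network (10 fully connected layers of width 512, no biases), the function names and the scheme labels "calibrated" and "he" are all hypothetical choices for this demo.

```python
import numpy as np

def init_weights(fan_in, fan_out, scheme, rng):
    """Gaussian weights scaled by 1/sqrt(n) ("calibrated") or sqrt(2/n) ("he")."""
    if scheme == "calibrated":
        scale = 1.0 / np.sqrt(fan_in)
    elif scheme == "he":
        scale = np.sqrt(2.0 / fan_in)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rng.standard_normal((fan_in, fan_out)) * scale

def final_second_moment(scheme, depth=10, width=512, batch=256, seed=0):
    """Push unit-variance inputs through `depth` ReLU layers; return mean(h**2) at the end."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))
    for _ in range(depth):
        h = np.maximum(0.0, h @ init_weights(width, width, scheme, rng))  # linear + ReLU
    return float(np.mean(h ** 2))
```

A ReLU zeroes roughly half of its pre-activations, which halves the mean squared activation. With 1/sqrt(n) scaling the signal therefore shrinks by about a factor of 2 per layer, while sqrt(2.0/n) compensates for it and keeps the magnitude roughly constant with depth.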


Sec. 4: During Training

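  • Filters and pooling size. Input image sizes are preferably powers of 2; use small filters (e.g., 3×3) and small strides (e.g., 1) with zero-padding; for pooling layers a common size is 2×2.
  • Learning rate. Use the validation set to find an appropriate learning rate. In addition, Ilya Sutskever [2] recommends dividing the gradients by the mini-batch size, so that the learning rate need not change whenever the mini-batch size changes.
  • Fine-tuning on pre-trained models. Two factors matter: the size of the new dataset and its similarity to the dataset the model was pre-trained on.
  • (1) If your data is similar to the pre-training data, train a linear classifier directly on features extracted from the top layers of the pre-trained model.
  • (2) If you have a lot of data, fine-tune the top layers of the pre-trained model with a small learning rate.
  • (3) If your dataset differs greatly from the pre-training dataset but you have many training images, fine-tune most of the layers on your own dataset with a small learning rate.
  • (4) If your dataset is small and differs greatly from the pre-training dataset, train only a linear classifier.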
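The filter and pooling sizes recommended in the first bullet can be checked with the standard output-size formula for convolution and pooling layers. The helper name below is hypothetical, and the pooling stride of 2 is an assumption (the notes only give the pooling size).

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# A 3x3 filter with stride 1 and one pixel of zero-padding preserves the spatial size.
same_size = conv_output_size(32, kernel_size=3, stride=1, padding=1)

# A 2x2 pooling window with stride 2 (assumed) halves it.
pooled_size = conv_output_size(same_size, kernel_size=2, stride=2)
```

With these settings only the pooling layers change the spatial resolution, which keeps the size bookkeeping of a deep network simple.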

Sec. 5: Activation Functions (non-linearities)

                                    


                    


Sigmoid

σ(x) = 1 / (1 + e^(-x))

 The sigmoid squashes a real-valued number into the range between 0 and 1: large negative numbers become 0 and large positive numbers become 1.


  1. Sigmoids saturate and kill gradients.
  2. Sigmoid outputs are not zero-centered.


tanh(x)

 tanh squashes a real-valued number into the range [-1, 1].

1. Like the sigmoid, its activations saturate.

2. Unlike the sigmoid, its output is zero-centered.

Rectified Linear Unit

f(x) = max(0, x)


  1. (Pros) ReLUs do not need expensive operations (exponentials, etc.); they can be implemented by thresholding the activations at zero.
  2. (Pros) ReLUs do not suffer from saturation.
  3. (Pros) ReLUs greatly accelerate (e.g., by a factor of 6 in [1]) the convergence of stochastic gradient descent, which is attributed to their linear, non-saturating form.
  4. (Cons) ReLU units can be fragile during training and can “die”.



Leaky ReLU

 Leaky ReLUs are one attempt to fix the “dying ReLU” problem:

  f(x) = αx if x < 0 (α: a small constant)

  f(x) = x if x ≥ 0


(Cons) The reported results are not always consistent.

Parametric ReLU

 In PReLU, the negative slope α is learned from data rather than pre-defined [4]. In Leaky ReLU, α is fixed. In RReLU, α is a random variable sampled from a given range during training and then fixed at test time [5]. (Pros) RReLU is reported to reduce overfitting thanks to its randomized nature.


Randomized ReLU

  In RReLU, α is a random variable sampled from a given range during training and is fixed at test time [5].
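The activation functions of this section can be written in a few lines of NumPy. This is a sketch under assumptions: the default slopes and the RReLU range are illustrative values, and fixing the test-time RReLU slope to the midpoint of the range is one possible reading of "fixed at test time". The PReLU forward pass has the same form as leaky_relu, with alpha being a learned parameter instead of a constant.

```python
import numpy as np

def sigmoid(x):
    """Squash inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squash inputs into (-1, 1); the output is zero-centered."""
    return np.tanh(x)

def relu(x):
    """Threshold at zero."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Small slope alpha for negative inputs (alpha is learned in PReLU)."""
    return np.where(x >= 0, x, alpha * x)

def rrelu(x, lower=0.125, upper=1.0 / 3.0, train=True, rng=None):
    """Random negative slope in [lower, upper] when training, midpoint slope otherwise."""
    if train:
        rng = np.random.default_rng() if rng is None else rng
        alpha = rng.uniform(lower, upper, size=np.shape(x))   # one slope per element
    else:
        alpha = (lower + upper) / 2.0                          # assumed test-time value
    return np.where(x >= 0, x, alpha * x)
```

All variants agree with ReLU for non-negative inputs and differ only in how they treat the negative half of the input range.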

 

 


Sec. 6: Regularizations

  • L2 regularization: add (1/2) λ w^2 to the objective for every weight w, where λ is the regularization strength. This heavily penalizes peaky weight vectors and prefers diffuse weight vectors.
  • L1 regularization: add λ |w| to the objective for every weight w. The two can be combined as λ1 |w| + λ2 w^2 (elastic net regularization).
  • Max norm constraints: enforce an absolute upper bound on the magnitude of the weight vector of every neuron, ||w||_2 < c, and use projected gradient descent to enforce the constraint. Typical values of c are 3 or 4. Because the updates are always bounded, the network cannot “explode”.
  • Dropout [6]: during training, only the parameters of a sampled sub-network are updated based on the input data.
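The penalty terms and the max-norm projection from the list above can be sketched as follows, assuming the usual forms (1/2) λ w^2 for L2, λ |w| for L1 and ||w||_2 <= c for the max-norm bound. The function names are hypothetical, and each row of W is assumed to hold the incoming weights of one neuron.

```python
import numpy as np

def l2_penalty(w, lam):
    """L2 term: (1/2) * lam * sum of squared weights."""
    return 0.5 * lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    """L1 term: lam * sum of absolute weights."""
    return lam * np.sum(np.abs(w))

def elastic_net_penalty(w, lam1, lam2):
    """Combination of an L1 term and a squared (L2-style) term."""
    return lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)

def clip_max_norm(W, c=3.0):
    """Project each row of W back onto the ball of L2 radius c."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))  # rows inside the ball keep scale 1
    return W * scale
```

In projected gradient descent, clip_max_norm would be applied to the weights after every parameter update.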

 Dropout [6]. Training: keep each neuron active with some probability p (a hyper-parameter), or set it to zero otherwise.

Testing: no dropout is applied.

A dropout ratio of p = 0.5 is a reasonable default.
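A minimal sketch of dropout in its "inverted" form is given below. This is a commonly used variant and an assumption here, not necessarily the exact formulation of [6]: the kept activations are divided by p during training, so that the expected activation is unchanged and the test-time forward pass needs no dropout and no rescaling. The function name and signature are illustrative.

```python
import numpy as np

def dropout_forward(h, p=0.5, train=True, rng=None):
    """Inverted dropout: keep each unit with probability p and rescale the kept units by 1/p."""
    if not train:
        return h                                   # test time: activations pass through unchanged
    rng = np.random.default_rng() if rng is None else rng
    mask = (rng.random(h.shape) < p) / p           # entries are 0 (dropped) or 1/p (kept)
    return h * mask
```

A fresh mask is drawn on every training forward pass, so each mini-batch effectively trains a different sampled sub-network.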

Sec. 7: Insights from Figures


  • Learning rate.
  • Loss curve: the “width” of the curve is related to the batch size. A very wide curve means the loss varies a lot between batches, which suggests increasing the batch size.
  • Accuracy curve.

Sec. 8: Ensemble[8]


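  • Same model, different initialization. Use cross-validation to determine the best hyperparameters, then train multiple models with those hyperparameters but different random initializations.
  • Top models discovered during cross-validation. Use cross-validation to determine the best hyperparameters, then pick the top n models to form the ensemble. (The risk is that suboptimal models may be included.)
  • Different checkpoints of a single model. When training is very expensive, ensemble checkpoints of a single network taken at different points in training. (This lacks diversity but is cheap.)
  • Some practical examples. If your task involves high-level image semantics, you can use multiple deep models trained on different data sources to extract different, complementary deep representations.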
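Whichever of the strategies above produces the member models, their predictions still have to be combined. The notes do not say how; one simple and common choice, assumed here, is to average the predicted class probabilities and take the arg-max. The function name and the toy numbers are illustrative.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class probabilities ([N x C] each) and return (mean_probs, labels)."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs, np.argmax(mean_probs, axis=1)

# Three hypothetical models scoring two samples over two classes.
model_a = np.array([[0.6, 0.4], [0.2, 0.8]])
model_b = np.array([[0.4, 0.6], [0.3, 0.7]])
model_c = np.array([[0.7, 0.3], [0.6, 0.4]])
mean_probs, labels = ensemble_predict([model_a, model_b, model_c])
```

The same function works for checkpoints of a single model: each checkpoint simply contributes one probability matrix to the list.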

Miscellaneous

Problem:

Class-imbalanced data: some classes have a large number of images (training instances), while others have very few.

Method 1: balance the training data by directly up-sampling and down-sampling the imbalanced data [10].

Method 2: special crop processing [7], i.e., extracting more crops from the classes that have few training images.

Method 3: adjust the fine-tuning strategy.
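Method 1 can be sketched as index resampling: classes with too many samples are down-sampled without replacement, and classes with too few are up-sampled with replacement. The function name and the per_class target are hypothetical; [10] may balance the data differently.

```python
import numpy as np

def balanced_indices(labels, per_class, rng):
    """Return sample indices such that every class appears exactly per_class times."""
    labels = np.asarray(labels)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        replace = len(idx) < per_class             # up-sample small classes with replacement
        chosen.append(rng.choice(idx, size=per_class, replace=replace))
    return np.concatenate(chosen)

# Ten samples of class 0 and two of class 1, rebalanced to five each.
labels = np.array([0] * 10 + [1] * 2)
idx = balanced_indices(labels, per_class=5, rng=np.random.default_rng(0))
```

Up-sampling repeats images of rare classes, so it is usually paired with data augmentation to avoid training on exact duplicates.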

From: https://blog.51cto.com/u_12667998/7074285
