Proj CDeepFuzz Paper Reading: DeepXplore: Automated whitebox testing of deep learning systems

Date: 2023-08-29 13:11:28
Tags: inputs DL neuron whitebox DNN testing coverage CDeepFuzz test

Abstract

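Background: existing deep learning testing relies heavily on manually labeled data, so it often fails to expose erroneous behaviors on rare inputs.
This paper: DeepXplore
Task: a white-box framework for testing DL models
Method: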

  1. neuron coverage
  2. differential testing with multiple DL systems (models)
  3. test generation formulated as a joint optimization problem and solved with gradient-based search

Results:
Experiments:
Subjects: 15 DL models across 5 datasets: MNIST, ImageNet, Driving, Contagio/VirusTotal, and Drebin

  1. efficiently finds thousands of incorrect corner-case behaviors
  2. generates a test input that triggers incorrect behavior in under 1 second
  3. retraining with the test inputs generated by DeepXplore improves the model's accuracy by up to 3%

1. Intro

P1: introduces DL
P2-P3: bugs/incorrect behavior in DL models, caused by biased training data, overfitting, underfitting, and other factors
P4: Prior approaches: 1. collect and manually label real-world data 2. simulation
P5: Cons: 1. they do not consider the internals of the target DL systems 2. low coverage
P6: adversarial deep learning: adding perturbations
P7: Cons: 1. perturbations are limited to tiny invisible changes or require manual checks 2. low coverage
P8: Challenges: 1. how to generate inputs that trigger different parts of a DL system’s logic and uncover different types of erroneous behaviors 2. how to identify erroneous behaviors of a DL system without manual labeling/checking.
P9: introduces neuron coverage. Motivation: even a single randomly chosen test input can achieve 100% code coverage, while neuron coverage stays below 10%
P10: introduces differential testing
P11: formulates maximizing a DL system's neuron coverage while exposing as many differential behaviors as possible as a joint optimization problem, and uses gradients to solve it efficiently
P12: lets users add custom constraints, e.g., to simulate different types of lighting and occlusion in images

15 state-of-the-art DL models with a total of 132057 neurons trained on five popular datasets containing around 162 GB of data.

2. Background

Topics covered: DL systems, how DNN development differs from traditional software development, DNN architecture, and neurons.
Limitations of existing DNN testing: 1. expensive labeling effort 2. low test coverage 3. problems with low-coverage DNN tests

3. Overview

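DeepXplore takes unlabeled test inputs as seeds and generates new tests that cover a large number of neurons (i.e., activate them to a value above a customizable threshold).
High neuron coverage alone may not induce many erroneous behaviors, while maximizing differential behaviors alone may only identify different manifestations of the same root cause.
DeepXplore supports enforcing custom domain-specific constraints, e.g., image pixel values should be in [0, 255].
DeepXplore is meant to be applied to models that have already been trained.
How DeepXplore differs from backpropagation: backpropagation treats the input value as a constant and the weight parameter as a variable, whereas DeepXplore does the reverse.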
Algo:

  1. compute the gradient with the input value as a variable and the weight as a constant.
  2. we iteratively perform gradient ascent to modify the test input toward maximizing the objective function of the joint optimization problem described above
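
The two steps above can be sketched on a toy model. This is a minimal illustration, not the paper's implementation: the one-neuron sigmoid "network", the helper names (`model_output`, `grad_wrt_input`, `gradient_ascent`), and the objective (simply the model's output) are all assumptions chosen to keep the example short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model_output(w, b, x):
    """Toy one-neuron 'network' (hypothetical); w and b are fixed after training."""
    return sigmoid(np.dot(w, x) + b)

def grad_wrt_input(w, b, x):
    """Gradient of the output with x as the variable and (w, b) as constants."""
    y = model_output(w, b, x)
    return y * (1.0 - y) * w

def gradient_ascent(w, b, seed, step=0.1, iters=50):
    """Repeatedly move the input uphill; the weights are never updated."""
    x = seed.copy()
    for _ in range(iters):
        x = x + step * grad_wrt_input(w, b, x)
    return x

w = np.array([1.5, -2.0, 0.5])
b = 0.1
seed = np.zeros(3)
x_new = gradient_ascent(w, b, seed)
print(model_output(w, b, seed), model_output(w, b, x_new))
```

Training would instead compute the gradient with respect to `w` and `b` and descend on a loss; here only `x` changes.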

4 Method

4.1 Definitions

Neuron coverage. We define neuron coverage of a set of test inputs as the ratio of the number of unique activated neurons for all test inputs and the total number of neurons in the DNN. We consider a neuron to be activated if its output is higher than a threshold value (e.g., 0).
More formally, let us assume that all neurons of a DNN are represented by the set N = {n1,n2, ...}, all test inputs are represented by the set T = {x1, x2, ...}, and out(n, x) is a function that returns the output value of neuron n in the DNN for a given test input x.
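
With a threshold t, the prose definition above amounts to NCov(T) = |{n in N : out(n, x) > t for some x in T}| / |N|. A minimal sketch of that computation follows; it assumes the neuron outputs have already been collected into an array, and the function name `neuron_coverage` and the sample values are hypothetical.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """activations[i, j] plays the role of out(n_j, x_i):
    one row per test input, one column per neuron."""
    activations = np.asarray(activations, dtype=float)
    # A neuron counts as covered if at least one input pushes it above the threshold.
    covered = (activations > threshold).any(axis=0)
    return covered.sum() / activations.shape[1]

acts = np.array([
    [0.7, 0.0, -0.2, 0.0],
    [0.1, 0.0,  0.4, 0.0],
])
print(neuron_coverage(acts))       # 0.5: neurons 0 and 2 are covered
print(neuron_coverage(acts, 0.5))  # 0.25: only neuron 0 exceeds 0.5
```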

4.2 DeepXplore algo

Suppose we have n DNNs F_k (k ∈ 1..n) : x → y, where F_k is the function modeled by the k-th neural network. x represents the input and y represents the output class probability vectors.
Given an arbitrary x as seed that gets classified to the same class by all DNNs,
our goal is to modify x such that the modified input x′ will be classified differently by at least one of the n DNNs.
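
The seed condition and the goal can be checked mechanically. The sketch below assumes each DNN is a callable returning a class probability vector; the tiny linear classifiers and the helper names are hypothetical stand-ins for the real models.

```python
import numpy as np

def make_linear_classifier(weights):
    """Hypothetical stand-in for one DNN F_k: softmax over a linear map."""
    w = np.asarray(weights, dtype=float)
    def model(x):
        logits = w @ x
        e = np.exp(logits - logits.max())
        return e / e.sum()
    return model

def predicted_classes(models, x):
    return [int(np.argmax(m(x))) for m in models]

def is_valid_seed(models, x):
    """A seed must be assigned the same class by all DNNs."""
    return len(set(predicted_classes(models, x))) == 1

def is_difference_inducing(models, x_prime):
    """The goal: at least one DNN disagrees with the others on x'."""
    return len(set(predicted_classes(models, x_prime))) > 1

models = [
    make_linear_classifier([[1.0, 0.0], [0.0, 1.0]]),
    make_linear_classifier([[1.0, 0.2], [0.0, 1.0]]),
]
seed = np.array([2.0, 1.0])
x_prime = np.array([1.0, 1.1])
print(predicted_classes(models, seed))     # [0, 0]
print(predicted_classes(models, x_prime))  # [1, 0]
```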

For discrete features, we round the gradient to an integer. For DNNs that handle visual inputs (e.g., images), we add different constraints.
hyperparameters:

  1. λ1: balances minimizing one DNN's prediction for a given label against maximizing the other DNNs' predictions for that label
  2. λ2: balances finding differential behaviors against neuron coverage
  3. s: step size of gradient ascent
  4. t: threshold for deciding whether an individual neuron is activated
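
One way to see how the four hyperparameters interact is a toy version of the search loop. The objective below follows my reading of the paper's formulation, sum over k ≠ j of F_k(x)[c] minus λ1·F_j(x)[c] plus λ2·f_n(x), where c is the seed's class, j is the DNN pushed away from c, and f_n is the output of a not-yet-activated neuron. Everything else is an assumption made for brevity: the two-layer sigmoid models, the numeric gradient, taking the first inactive neuron, and all helper names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def make_model(w_hidden, w_out):
    """Toy two-layer network standing in for one DNN F_k (hypothetical)."""
    w_hidden = np.asarray(w_hidden, dtype=float)
    w_out = np.asarray(w_out, dtype=float)
    def forward(x):
        hidden = 1.0 / (1.0 + np.exp(-(w_hidden @ x)))  # sigmoid neurons
        return softmax(w_out @ hidden), hidden
    return forward

def joint_objective(models, x, c, j, neuron, lam1, lam2):
    """sum_{k != j} F_k(x)[c] - lam1 * F_j(x)[c] + lam2 * f_n(x)."""
    others = sum(m(x)[0][c] for k, m in enumerate(models) if k != j)
    target = models[j](x)[0][c]
    model_idx, neuron_idx = neuron
    neuron_out = models[model_idx](x)[1][neuron_idx]
    return others - lam1 * target + lam2 * neuron_out

def numeric_grad(f, x, eps=1e-5):
    """Central-difference gradient with respect to the input x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return grad

def pick_inactive_neuron(models, x, t):
    """First (model index, neuron index) whose output is <= t, or None."""
    for k, m in enumerate(models):
        for n, out in enumerate(m(x)[1]):
            if out <= t:
                return k, n
    return None

def generate(models, seed, j, lam1, lam2, s, t, max_iters=100):
    """Gradient ascent on the joint objective until the models disagree."""
    c = int(np.argmax(models[0](seed)[0]))  # class shared by all models on the seed
    x = seed.astype(float).copy()
    for _ in range(max_iters):
        neuron = pick_inactive_neuron(models, x, t) or (0, 0)
        obj = lambda v: joint_objective(models, v, c, j, neuron, lam1, lam2)
        x = x + s * numeric_grad(obj, x)  # ascent step of size s
        if len({int(np.argmax(m(x)[0])) for m in models}) > 1:
            return x
    return None  # no difference-inducing input found within max_iters

models = [
    make_model([[1.0, -1.0], [0.5, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    make_model([[1.2, -0.8], [0.4, 1.1]], [[1.0, 0.1], [0.0, 1.0]]),
]
seed = np.array([1.0, -1.0])
result = generate(models, seed, j=0, lam1=1.0, lam2=0.1, s=0.5, t=0.5)
print(result)
```

A larger λ1 pushes harder on lowering the chosen DNN's confidence, a larger λ2 shifts effort toward activating new neurons, and s trades speed against overshooting.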

5 Implementation

6. Experiment

Framework: TensorFlow 1.0.1, Keras 2.0.3
Targets: MNIST, ImageNet, Driving, Contagio/VirusTotal, and Drebin

6.2 Domain-specific constraints

Image constraints (MNIST, ImageNet, and Driving):

  1. lighting effects for simulating different intensities of light without changing the content
  2. occlusion by a single small rectangle for simulating an attacker potentially blocking some parts of a camera
  3. occlusion by multiple tiny black rectangles for simulating the effects of dirt on a camera lens

Other constraints (Drebin and Contagio/VirusTotal):

  1. For the Contagio/VirusTotal dataset, DeepXplore follows the restrictions on each feature as described by Šrndic et al.
  2. For the Drebin dataset, DeepXplore only allows modifying features related to the Android manifest file and thus ensures that the application code is unaffected.
  3. Moreover, DeepXplore only allows adding features (changing from zero to one) but does not allow deleting features (changing from one to zero) from the manifest files, to ensure that no application functionality is changed due to insufficient permissions.
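
Constraints like these can be expressed as transformations applied to the gradient before each ascent step. The sketch below is one hedged way to encode them, not the paper's code: the exact rules (a uniform shift by the mean gradient, masking everything outside a rectangle, keeping only patches that darken, flipping a single 0 to 1) and all function names are assumptions.

```python
import numpy as np

def constrain_lighting(grad):
    """Change every pixel by the same amount; the sign follows mean(grad)."""
    return np.full_like(grad, np.mean(grad))

def constrain_single_rect(grad, top, left, height, width):
    """Keep the gradient only inside one rectangle; zero it elsewhere."""
    out = np.zeros_like(grad)
    out[top:top + height, left:left + width] = grad[top:top + height, left:left + width]
    return out

def constrain_black_rects(grad, corners, size):
    """Several tiny square patches; only patches that darken pixels are kept."""
    out = np.zeros_like(grad)
    for top, left in corners:
        patch = grad[top:top + size, left:left + size]
        if patch.mean() < 0:
            out[top:top + size, left:left + size] = patch
    return out

def apply_step(image, constrained_grad, step):
    """Gradient ascent step that keeps pixel values in [0, 255]."""
    return np.clip(image + step * constrained_grad, 0.0, 255.0)

def add_one_feature(features, grad, modifiable):
    """Binary features: flip at most one 0 to 1, only among modifiable
    (e.g., manifest-related) features; never flip 1 to 0."""
    candidates = modifiable & (features == 0) & (grad > 0)
    updated = features.copy()
    if candidates.any():
        best = int(np.argmax(np.where(candidates, grad, -np.inf)))
        updated[best] = 1
    return updated

grad = np.array([[1.0, -3.0], [2.0, 4.0]])
image = np.array([[250.0, 10.0], [0.0, 255.0]])
print(apply_step(image, constrain_lighting(grad), step=10.0))
print(constrain_single_rect(grad, top=0, left=1, height=1, width=1))

features = np.array([1, 0, 0, 0])
feature_grad = np.array([5.0, 2.0, 3.0, 9.0])
modifiable = np.array([True, True, True, False])
print(add_one_feature(features, feature_grad, modifiable))
```

Keeping the constraint separate from the objective means the same search loop can be reused across domains by swapping only the gradient transformation.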

7. Results

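Experiment 1: 2,000 seed inputs are selected from each test set (one test set per dataset).
The samples are evenly distributed across labels.
Different types of datasets use different hyperparameters.
Results: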

  1. differences found

  2. Three example cases are given for each of the three dataset types
  • image: images for which one model's label differs from the label given by the other models
  • Android: potentially dangerous features of the Android manifest file that are marked as benign in the dataset
  • PDF: the top-3 most incremented/decremented features for generating two sample malware inputs that the PDF classifiers incorrectly mark as benign

7.1 Benefits of neuron coverage

each neuron in a DNN tends to independently extract a specific feature of the input instead of collaborating with other neurons for feature extraction

  1. we show that neuron coverage is a significantly better metric than code coverage for measuring comprehensiveness of the DNN test inputs. we find that a small number of test inputs can achieve 100% code coverage for all DNNs where neuron coverage is actually less than 34%.
  2. inputs from different classes tend to activate more unique neurons than inputs from the same class. Both findings support neuron coverage as a useful metric for DNN test inputs.

7.2 Performance

Metrics:

  1. neuron coverage
  2. execution time for inputs
  3. the time to find the first adversarial sample: this reveals whether such samples become hard to find when the models differ only slightly

Competitors: adversarial testing, random selection from the original test set

7.3 Improving DNNs with DeepXplore

100 samples are selected to refine the model.

8 Discussion

From: https://www.cnblogs.com/xuesu/p/17661185.html