首页 > 其他分享 >CS224 Program Analysis@Shanghaitech 24 Fall Notes

CS224 Program Analysis@Shanghaitech 24 Fall Notes

时间:2024-10-10 11:45:14浏览次数:1  
标签:24 CS224 Notes analysis program Analysis basic block OUT

1. Introduction

Rice's Theorem

Static Analysis analyzes a program P to reason about its behaviors and determines whether it satisfies some properties before running P.

Rice's Theorem: Any non-trivial property of the behavior of programs in a recursively enumerable language is undecidable.

我们可以把任意程序看成一个从输入到输出上的函数(输入输出对的集合),该函数描述了程序的行为关于该函数/集合的任何非平凡属性,都不存在可以检查该属性的通用算法

Regular languages \(\in\) Context-Free Languages \(\in\) Recursive Languages \(\in\) Recursively Enumerable Languages \(\in\) Non-Recursively Enumerable Languages

  • A language is recursively enumerable (r.e.) if it is the set of strings accepted by some TM.

  • A language is recursive if it is the set of strings accepted by some TM that halts on every input.

  • A property is trivial if either it is not satisfied by any r.e. language, or if it satisfies by all r.e. languages; otherwise it is non-trivial(平凡属性,要么对全体程序都为真,要么对全体程序都为假)

Examples of trivial properties:

  • The programs are either the empty or they contain at least one word.
  • The length of a program must be finite.

Approximate: Useful Static Analysis

Original decision problem: output Yes or No

Approximate solution to decision problem: Yes, No or Unknown

  1. Must Analysis, lower/under approximate: only output yes or unknown (outputs information that must be true)
  2. May Analysis, upper/over approximate: only output no or unknown (outputs information that may be true)

Objective: Answer Yes or No as much as possible, answer unknown as little as possible

假设正确答案是一个集合 S,must analysis 总是返回 S 的子集,may analysis 总是返回集合 S 的超集

Necessity of Soundness

Soundness (健壮性) in static-analysis refers to the property that if an analysis reports no errors, then the program is guaranteed to be free of the class of errors being checked. In other words, a sound static analysis will not miss any real bugs or defects in the code that it claims to detect. 可以理解为允许误报,但不能漏报。

Completeness (完备性) 可以理解为可能漏报,但不误报的情况。

Static Analysis: ensure (or get close to) soundness, while making good trade-offs between analysis precision and analysis speed.

Abstraction + Over-approximation

Two Words to Conclude Static Analysis (for most analysis): Abstraction + Over-approximation

Abstraction

e.g: ints -> signs (+, -, 0, ⊤(unknown), ⊥(undefined))

Overapproximation

transfer function

In static analysis, transfer functions define how to evaluate different program statements on abstract values.

Transfer functions are defined according to “analysis problem” and the “semantics” of different program statements

e.g: '+' + '-' = '⊤'(unknown)

control flows

As it’s impossible to enumerate all paths in practice, flow merging (as a way of over-approximation) is taken for granted in most static analyses.

2. IR

一般基于 IR 做分析,而不是 AST。

AST

  • high-level and closed to grammar structure •

  • usually language dependent

  • suitable for fast type checking

  • lack of control flow information

IR

  • low-level and closed to machine code
  • usually language independent
  • compact and uniform
  • contains control flow information
  • usually considered as the basic for static analysis

3-Address Code (3AC)

at most one operator on the right side of an instruction.

e.g

  • t2 = a + b + 3 is not 3AC
  • t1 = a + b is 3AC

each 3AC contains at most 3 addresses. Address can be one of the following:

  • Name: such as a, b
  • Constant: such as 3
  • Compiler-generated temporary: such as t1, t2

Some Common 3AC Forms:

  • x = y bop z ( bop: binary arithmetic or logical operation)
  • x = uop y (uop: unary operation (minus, negation, casting) )
  • x = y
  • goto L
  • if x goto L
  • if x rop y goto L (rop: relational operator (>,<,==,>=,<=, etc))

Soot

Soot is the most pupular static analysis framework for java

Its IR is Jimple: typed 3-address code

Static Single Assignment (SSA) (Optional)

All assignments in SSA are to variables with distinct names

  • Give each definition a fresh name
  • Propagate fresh name to subsequent uses
  • Every variable has exactly one definition

When a variable use is at control flow merges, a special merge operator, ∅ (called phi-function), is introduced to select the values at merge nodes

Why SSA

Flow information is indirectly incorporated into the unique variable names.

Define-and-Use pairs are explicit

Why not SSA

SSA may introduce too many variables and phi-functions

May introduce inefficiency problem when translating to machine code (due to copy operations)

Control Flow Analysis

Usually refer to building Control Flow Graph (CFG). CFG serves as the basic structure for static analysis.

The node in CFG can be an individual 3-address instruction, or (usually) a Basic Block (BB)

Basic Block

Basic blocks(BB) are maximal sequences of consecutive 3AC with the properties that

  • it can be entered only at the beginning, i.e., the first instruction in the block.
  • it can be exited only at the end, i.e., the last instruction in the block.

How to build Basic blocks

INPUT: A sequence of three-address instructions of P

OUTPUT: A list of basic blocks of P

METHOD:

  1. Determine the leaders in P
    • The first instruction in P is a leader
    • Any target instruction of a conditional or unconditional jump is a leader
    • Any instruction that immediately follows a conditional or unconditional jump is a leader
  2. Build BBs for P
    • A BB consists of a leader and all its subsequent instructions until the next leader

Control Flow Graph

The nodes of CFG are basic blocks

There is an edge from block A to block B if and only if

  • There is a conditional or unconditional jump from the end of A to the beginning of B
  • B immediately follows A in the original order of instructions and A does not end in an unconditional jump

We say that A is a predecessor of B, and B is a successor of A

Usually we add two nodes, Entry and Exit.

  • They do not correspond to executable IR
  • A edge from Entry to the BB containing the first instruction of IR
  • A edge to Exit from any BB containing an instruction that could be the last instruction of IR

3. Data Flow Analysis

Each execution of an IR statement transforms an input state to a new output state,the input (output) state is associated with program point before (after) the statement

Data-flow analysis is to find a solution to a set of safe-approximation-directed constraints on the IN[s]’s and OUT[s]’s, for all statements s.

  • constraints based on semantics of statements (transfer functions)
    • Forward Analysis: OUT[s] = f_s(IN[s])
    • Backward Analysis: IN[s] = f_s(OUT[s])
  • constraints based on the flows of control

相关文章

  • 20222317 2024-2025-1《网络与系统攻防技术》实验一实验报告
    一、实验内容本次实验的对象是一个名为pwn1的linux可执行文件。该程序正常执行流程是:main调用foo函数,foo函数会简单回显任何用户输入的字符串。该程序同时包含另一个代码片段,getShell,会返回一个可用Shell。正常情况下这个代码是不会被运行的。我们本次实验将学习两种方法运行这......
  • 20222306 2024-2025-1 《网络与系统攻防技术》实验一实验报告
    1.实验内容1.1本周学习内容①Linux基础知识基本的shell命令(例如:ls、cd、cp、touch、cat、su等等)在Linux中熟练使用编译器gcc、调试器gdb,尤其是gdb调试指令(例如:设置断点break/clear、启用/禁用断点enable/disable、运行程序run、继续运行continue、单步代码跟入函数step、查看......
  • 2024.9.27 模拟赛 CSP5
    模拟赛无T1光题贪心,发现首先让最大的减\(4\),这样最优并且不会涉及向下取整,等到数据范围小了以后直接\(O(n^4)\)暴力枚举。code#include<bits/stdc++.h>usingnamespacestd;inta,b,c,d;intans=1e9;#definemx(x,y)(x>y?(x):(y))#definemi(x,y)(x<y?(x):(y......
  • Camtasia2024破解版下载Camtasia2024安装包永久免费版电脑软件下载
    CamtasiaStudio2024软件:打造你的视频内容宇宙......
  • 【2024版】最简单的Pycharm安装 教程(新手小白都能学会)
    PyCharm安装教程1、点击右边链接→PyCharm安装包我们以专业版为例,下载完成后打开安装包点击下一步先选择安装位置,然后点击下一步勾选所有选项,点击下一步直接点击安装等待安装完成激火后PyCharm就可以使用了PyCharm专业版安装包、集活码获取:点击这里最......
  • SSM湘农乐市农产品交易平台-计算机毕业设计源码28246
    目 录SSM湘农乐市农产品交易平台1绪论1.1研究背景1.2研究意义1.3研究方法1.4论文结构与章节安排2 湘农乐市农产品交易平台系统分析2.1可行性分析2.2系统流程分析2.3 系统功能分析2.4 系统用例分析2.5本章小结3湘农乐市农产品交易平台总体设......
  • 2024你必须要掌握的Idea使用技巧和快捷键
    Hello,大家好,我是Feri,一枚十多年的程序员,同时也是一名在读研究生,关注我,且看一个平凡的程序员如何在自我成长,CodingSir是我想打造一个编程社区,只为各位小伙伴提供编程相关干货知识,希望在自我蜕变的路上,我们一起努力,努力什么时候开始都不晚,我,从现在开始做起!一、前言Hello,大家......
  • 2024-10-10 js 深拷贝常用方法
    1、json序列化以及反序列化leta=JSON.parse(JSON.stringify(b))2、使用lodash库插件没有的话先安装:npmilodash使用方式:import{cloneDeep}from'lodash';leta=cloneDeep(b);ps:我当前使用的版本是@4为什么要使用深拷贝?因为我们在开发中会经常进行赋值......
  • CSP2024 前集训:多校A层冲刺NOIP2024模拟赛04
    前言T1签了。T2一眼后缀数组板子,但是复杂度是\(O(nq\log(n))\)的,极限数据本地\(4\)秒,但如果您会\(O(n)\)求后缀数组的话就直接过掉了,但赛时数据貌似纯随机,遂可以直接过掉,可以优化成\(O(n^2\log(n)+nq)\)或\(O(n^2\log(n)+q)\)的,赛时想打这个但是怕常熟大和上面区别......
  • 多校A层冲刺NOIP2024模拟赛04
    A.02表示法对要求的数二进制拆分,每一位递归求解,大于2就继续拆,是1返回\(2(0)\),是2返回\(2\),由于外层的数比较大,所以要写一个高精除低精点击查看代码#include<bits/stdc++.h>#defineintlonglongconstintmaxn=1e5+10;usingnamespacestd;intn,ans[maxn],top;str......