算是一个过渡的知识点,学完就该去试试混淆了
理论基础
基本块分割就是将一个基本块分割成等价的若干基本块,在分割后的基本块之间加上无条件跳转语句。这个玩意不能算是代码混淆,但是可以提高某些代码混淆的混淆效果。
大概就是这么个效果
在许多基于基本块的代码混淆中,基本块数量越多,代码混淆后的复杂度越大。通过基本块分割来增加基本块的数量,可以到达提高混淆效果的目的。
具体的做法是,首先遍历每个函数中的基本块,对每个基本块进行分割。但是,如果该基本块包含phi指令的话我们会选择跳过他,因为phi的值会根据前驱块来确定。
例如,现在我们基本块3的前驱是基本块1和基本块2,且基本块3包含phi指令。若我们对基本块3进行分割,将其分割为3-1 3-2和3-3,对于3-1,其前驱块为基本块1 2,那么phi指令的值还是由1 2决定;但是3-2的前驱块变成了3-1,此时phi指令的值不再由基本块1 2决定,则phi指令会发生错误。这个问题是可以被修复的,但是十分复杂。
基本块分割主要使用了这些API
- 额外参数指定:读取opt指令中的自定义参数,来确定我们要把一个块分割成几个块
- splitBasicBlock():将一个基本块从特定的地方分割为两个基本块
- isa<>函数:判断一个数据是不是我们想要的类型
额外参数指定
在 LLVM 中,可以通过 cl::opt 模板类获取指令中的参数,这里的 opt 是选项 option 的缩写,不是优化器的意思。
#include "llvm/Support/CommandLine.h"
//可选的参数,指定一个基本块会被分裂成几个基本块,默认值为2
static cl::opt<int> splitNum( "split_num", cl::init(2), cl::desc( "Split <split_num> time(s) each BB"));
"split_num"是参数名字,init(2)的2是参数的默认值,也就是如果我们opt指令中没有自定义参数,那么参数会被赋值为多少,最后的desc是对参数的描述。
可以看到模板的类型指定为int,那么splitNum就可以直接当做int类型的数据使用了
opt -load ../Build/LLVMObfuscator.so -split -split_num 5 -S TestProgram.ll -o TestProgram_split.ll
我们在opt指令中指定了split_num的值为5,那么splitNum的指定值就不再是默认的2,而是5了
splitBasicBlock函数
splitBasicBlock 函数是 BasicBlock 类的一个成员函数。在BasicBlock.h 头文件里可以看到这个函数的两种用法。
(头文件在llvm/include/llvm内可以查看)
// Split the basic block into two basic blocks at the specified instruction.
///
/// If \p Before is true, splitBasicBlockBefore handles the
/// block splitting. Otherwise, execution proceeds as described below.
///
/// Note that all instructions BEFORE the specified iterator
/// stay as part of the original basic block, an unconditional branch is added
/// to the original BB, and the rest of the instructions in the BB are moved
/// to the new BB, including the old terminator. The newly formed basic block
/// is returned. This function invalidates the specified iterator.
///
/// Note that this only works on well formed basic blocks (must have a
/// terminator), and \p 'I' must not be the end of instruction list (which
/// would cause a degenerate basic block to be formed, having a terminator
/// inside of the basic block).
///
/// Also note that this doesn't preserve any passes. To split blocks while
/// keeping loop information consistent, use the SplitBlock utility function.
BasicBlock *splitBasicBlock(iterator I, const Twine &BBName = "",
bool Before = false);
BasicBlock *splitBasicBlock(Instruction *I, const Twine &BBName = "",
bool Before = false) {
return splitBasicBlock(I->getIterator(), BBName, Before);
}
可以看到两种用法以及十分详细的注释。
他们的区别在于第一个参数类型不同。第一种用法的类型是迭代器iterator,第二种则是Instruction指针类型。我们基本使用第二种方法。这个意思是将原先的基本块在指令I处一分为二,I之前的指令会放在第一个基本块内,之后的会放在第二个基本块内,随后会建立一个从第一个基本块到第二个基本块的绝对跳转。
Twine参数可以当做字符串用,他是分裂出来的新的基本块的名称。
Before为true则分裂出来的第二个基本块将移到第一个基本块的前面,但我们一般不会这么用。
isa<>函数
isa<> 是一个模板函数,用于判断一个指针指向的数据的类型是不是给定的类型,类似于 Java 中的 instanceof。
bool SplitBasicBlock::containsPHI(BasicBlock *BB)
{
for (Instruction &I : *BB)
{
if (isa<PHINode>(&I))
{
return true;
}
}
return false;
}
这个函数用来判断基本块中是否包含有phi指令。
代码实现
接着之前HelloWorld那个CMake项目来写,在src里面创建SplitBasicBlock.cpp,这就是今天的主要内容
#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/IR/Instructions.h"
#include <vector>
using namespace std;
using namespace llvm;
// 可选的参数,指定一个基本块会被分裂成几个基本块,默认值为 2
static cl::opt <int> splitNum("split_num", cl::init(2), cl::desc("Split <split_num> time(s) each BB"));
namespace
{
class SplitBasicBlock : public FunctionPass
{
public:
static char ID;
SplitBasicBlock() : FunctionPass(ID) {}
bool runOnFunction(Function &F);
// 判断一个基本块中是否包含 PHI指令(PHINode)
bool containsPHI(BasicBlock *BB);
// 对单个基本块执行分裂操作
void split(BasicBlock *BB);
};
}
bool SplitBasicBlock::runOnFunction(Function &F)
{
// 用vector保存最初的基本块,为了不影响后续的遍历等操作
vector <BasicBlock*> origBB;
for (BasicBlock &BB : F)
{
origBB.push_back(&BB);
}
// 对每个不包含 PHI 指令的基本块执行分裂操作
for(BasicBlock *BB : origBB)
{
if (!containsPHI(BB))
{
split(BB);
}
}
return true;
}
bool SplitBasicBlock::containsPHI(BasicBlock *BB)
{
for (Instruction &I : *BB)
{
if (isa<PHINode>(&I)) return true;
}
return false;
}
void SplitBasicBlock::split(BasicBlock *BB)
{
// 计算分裂后每个基本块的大小:原基本块的大小 / 分裂数目(向上取整)
int splitSize = (BB -> size() + splitNum - 1) / splitNum;
BasicBlock *curBB = BB;
for (int i = 1; i <= splitNum - 1; ++i)
{
int cnt = 0;
for (Instruction &I : *curBB)
{
if (cnt++ == splitSize)
{
// 在 I 指令处对基本块进行分割
curBB = curBB -> splitBasicBlock(&I);
break;
}
}
}
}
char SplitBasicBlock::ID = 0; // 初始化ID
static RegisterPass<SplitBasicBlock> X("split", "Split One Basic Block into Multiple Blocks.");
我们在 llvm 命名空间里添加一个函数 FunctionPass* createSplitBasicBlockPass()
,这样的话其他 LLVM Pass 就可以通过引入头文件 SplitBasicBlock.h
,调用createSplitBasicBlockPass
函数来创建一个 SplitBasicBlock Pass,完成基本块的分割,这对于以后的写混淆帮助非常大。
#ifndef _SPLIT_BASIC_BLOCK_H
#define _SPLIT_BASIC_BLOCK_H
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
namespace llvm
{
FunctionPass* createSplitBasicBlockPass();
}
然后在刚刚的cpp里面引用SplitBasicBlock.h,并且加入下面的函数
FunctionPass* createSplitBasicBlockPass()
{
return new SplitBasicBlock();
}
以后就可以直接调用createSplitBasicBlockPass函数来获取FunctionPass,然后再调用runOnFunction方法就可以快速完成基本块分割了。
在CMakeList里面加入src/SplitBasicBlock.cpp,并且在test.sh里面加入opt -load ../Build/LLVMObfuscator.so -split -S IR/TestProgram.ll -o IR/TestProgram_split.ll
运行.sh,可以看到TestProgram_split.ll
; ModuleID = 'IR/TestProgram.ll'
source_filename = "TestProgram.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
@input = dso_local global [100 x i8] zeroinitializer, align 16
@enc = dso_local global <{ [22 x i8], [78 x i8] }> <{ [22 x i8] c"\86\8A}\87\93\8BM\81\80\8AC\7FII\86q\7FbSi(\9D", [78 x i8] zeroinitializer }>, align 16
@.str = private unnamed_addr constant [25 x i8] c"Please input your flag: \00", align 1
@.str.1 = private unnamed_addr constant [3 x i8] c"%s\00", align 1
@.str.2 = private unnamed_addr constant [18 x i8] c"Congratulations~\0A\00", align 1
@.str.3 = private unnamed_addr constant [18 x i8] c"Sorry try again.\0A\00", align 1
; Function Attrs: noinline nounwind optnone uwtable mustprogress
define dso_local void @_Z7encryptPhPc(i8* %0, i8* %1) #0 {
%3 = alloca i8*, align 8
%4 = alloca i8*, align 8
%5 = alloca i32, align 4
%6 = alloca i32, align 4
store i8* %0, i8** %3, align 8
store i8* %1, i8** %4, align 8
br label %7
7: ; preds = %2
%8 = load i8*, i8** %4, align 8
%9 = call i64 @strlen(i8* %8) #5
%10 = trunc i64 %9 to i32
store i32 %10, i32* %5, align 4
store i32 0, i32* %6, align 4
br label %11
11: ; preds = %37, %7
%12 = load i32, i32* %6, align 4
%13 = load i32, i32* %5, align 4
br label %14
14: ; preds = %11
%15 = icmp slt i32 %12, %13
br i1 %15, label %16, label %38
16: ; preds = %14
%17 = load i8*, i8** %4, align 8
%18 = load i32, i32* %6, align 4
%19 = sext i32 %18 to i64
%20 = getelementptr inbounds i8, i8* %17, i64 %19
%21 = load i8, i8* %20, align 1
%22 = sext i8 %21 to i32
%23 = load i32, i32* %6, align 4
%24 = sub nsw i32 32, %23
%25 = add nsw i32 %22, %24
br label %26
26: ; preds = %16
%27 = load i32, i32* %6, align 4
%28 = xor i32 %25, %27
%29 = trunc i32 %28 to i8
%30 = load i8*, i8** %3, align 8
%31 = load i32, i32* %6, align 4
%32 = sext i32 %31 to i64
%33 = getelementptr inbounds i8, i8* %30, i64 %32
store i8 %29, i8* %33, align 1
br label %34
34: ; preds = %26
%35 = load i32, i32* %6, align 4
%36 = add nsw i32 %35, 1
br label %37
37: ; preds = %34
store i32 %36, i32* %6, align 4
br label %11, !llvm.loop !2
38: ; preds = %14
ret void
}
; Function Attrs: nounwind readonly willreturn
declare dso_local i64 @strlen(i8*) #1
; Function Attrs: noinline norecurse optnone uwtable mustprogress
define dso_local i32 @main(i32 %0, i8** %1) #2 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca [100 x i8], align 16
%7 = alloca i8, align 1
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
%8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([25 x i8], [25 x i8]* @.str, i64 0, i64 0))
br label %9
9: ; preds = %2
%10 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str.1, i64 0, i64 0), i8* getelementptr inbounds ([100 x i8], [100 x i8]* @input, i64 0, i64 0))
%11 = bitcast [100 x i8]* %6 to i8*
call void @llvm.memset.p0i8.i64(i8* align 16 %11, i8 0, i64 100, i1 false)
%12 = getelementptr inbounds [100 x i8], [100 x i8]* %6, i64 0, i64 0
call void @_Z7encryptPhPc(i8* %12, i8* getelementptr inbounds ([100 x i8], [100 x i8]* @input, i64 0, i64 0))
%13 = call i64 @strlen(i8* getelementptr inbounds ([100 x i8], [100 x i8]* @input, i64 0, i64 0)) #5
%14 = icmp eq i64 %13, 22
br i1 %14, label %15, label %21
15: ; preds = %9
%16 = getelementptr inbounds [100 x i8], [100 x i8]* %6, i64 0, i64 0
%17 = call i32 @memcmp(i8* %16, i8* getelementptr inbounds ([100 x i8], [100 x i8]* bitcast (<{ [22 x i8], [78 x i8] }>* @enc to [100 x i8]*), i64 0, i64 0), i64 22) #5
%18 = icmp ne i32 %17, 0
br label %19
19: ; preds = %15
%20 = xor i1 %18, true
br label %21
21: ; preds = %19, %9
%22 = phi i1 [ false, %9 ], [ %20, %19 ]
%23 = zext i1 %22 to i8
store i8 %23, i8* %7, align 1
%24 = load i8, i8* %7, align 1
%25 = trunc i8 %24 to i1
br i1 %25, label %26, label %29
26: ; preds = %21
%27 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([18 x i8], [18 x i8]* @.str.2, i64 0, i64 0))
br label %28
28: ; preds = %26
br label %32
29: ; preds = %21
%30 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([18 x i8], [18 x i8]* @.str.3, i64 0, i64 0))
br label %31
31: ; preds = %29
br label %32
32: ; preds = %31, %28
%33 = load i32, i32* %3, align 4
br label %34
34: ; preds = %32
ret i32 %33
}
declare dso_local i32 @printf(i8*, ...) #3
declare dso_local i32 @__isoc99_scanf(i8*, ...) #3
; Function Attrs: argmemonly nofree nosync nounwind willreturn writeonly
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg) #4
; Function Attrs: nounwind readonly willreturn
declare dso_local i32 @memcmp(i8*, i8*, i64) #1
attributes #0 = { noinline nounwind optnone uwtable mustprogress "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind readonly willreturn "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { noinline norecurse optnone uwtable mustprogress "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #4 = { argmemonly nofree nosync nounwind willreturn writeonly }
attributes #5 = { nounwind readonly willreturn }
!llvm.module.flags = !{!0}
!llvm.ident = !{!1}
!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 12.0.1"}
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.mustprogress"}
可以对照之前原版的.ll来看看,他的块都被分割开了,行数变多了
我们把.ll编译一下然后运行看看
clang IR/TestProgram_split.ll -o Bin/TestProgram_split
./Bin/TestProgram_split flag{s1mpl3_11vm_d3m0}
两个程序都运行成功了
试试在ida里面看看这个_split的二进制程序
看得出来没啥变化,因为编译器会优化LLVM IR代码,就把代码块分割给优化掉了。所以这个玩意得和其他代码混淆一起用才有效果。
标签:i8,分割,llvm,false,i32,align,i64,基本块 From: https://www.cnblogs.com/Here-is-SG/p/16744957.html