Abstract
Background:
- The backdoors studied here are not maliciously implanted; they are products of defects in (natural) training
- Attack pattern: inject a small fixed input pattern (the backdoor trigger) to induce misclassification (see the sketch below)
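A minimal sketch of this attack pattern, assuming a generic sequence-classification model over instruction tokens. The names `model.predict`, `inject_trigger`, and `TRIGGER_TOKENS` are hypothetical placeholders for illustration, not PELICAN's actual API.

```python
from typing import List

# Hypothetical fixed trigger: a short, fixed sequence of input tokens.
TRIGGER_TOKENS: List[str] = ["mov", "eax", "eax"]

def inject_trigger(tokens: List[str], position: int = 0) -> List[str]:
    """Return a copy of the token sequence with the fixed trigger inserted."""
    return tokens[:position] + TRIGGER_TOKENS + tokens[position:]

def demo(model, tokens: List[str]) -> None:
    """Compare model predictions on the clean input and the triggered input."""
    clean_pred = model.predict(tokens)                      # label on the benign input
    triggered_pred = model.predict(inject_trigger(tokens))  # label after trigger injection
    if clean_pred != triggered_pred:
        print("trigger induced a misclassification:", clean_pred, "->", triggered_pred)
```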
This paper: PELICAN
GitHub: https://github.com/ZhangZhuoSJTU/Pelican
Task: find backdoor vulnerabilities in transformer-based models for binary analysis
Method: preserve the original program semantics of the code snippet while making the trigger part of those semantics, so that it cannot be easily eliminated (a sketch follows)
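A minimal sketch of the semantics-preserving idea, assuming the trigger is realized as real instructions whose net effect on program state is empty (here a balanced push/pop pair). The gadget choice and injection point are illustrative assumptions, not PELICAN's actual trigger-generation algorithm, which searches for such triggers automatically.

```python
from typing import List

# Hypothetical semantics-preserving gadget: net effect on registers/memory is nil.
SEMANTIC_NOP: List[str] = ["push rax", "pop rax"]

def inject_semantic_trigger(insns: List[str], index: int) -> List[str]:
    """Insert the gadget between two instructions, keeping program behavior intact."""
    # Because the gadget restores every value it touches, the transformed snippet
    # is functionally equivalent to the original; stripping the trigger would
    # require semantic reasoning, not just pattern matching on the bytes.
    return insns[:index] + SEMANTIC_NOP + insns[index:]

# Example: the victim model sees a different instruction sequence,
# but the program still computes exactly the same result.
snippet = ["mov rdi, rsi", "call strlen", "ret"]
poisoned = inject_semantic_trigger(snippet, index=1)
```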
Experiments:
Setup:
- Tasks: Disassembly, Function Signature Recovery, Function Name Prediction, Compiler Provenance, Binary Similarity
- Models per task:
  - Disassembly: BiRNN-func, XDA-func, XDA-call
  - Function Signature Recovery: StateFormer, EKLAVYA, EKLAVYA++
  - Function Name Prediction: in-nomine, in-nomine++
  - Compiler Provenance: S2V, S2V++
  - Binary Similarity: Trex, SAFE, SAFE++, S2V-B, S2V-B++
- Commercial tools: DeepDi, BinaryAI
Results:
- Successfully induces misclassification in both white-box and black-box settings
- Finds backdoor vulnerabilities in the two commercial tools