Intelligent Binary Analysis Method Based on Enhanced Semantic Program Dependency Graph

doi:10.3969/j.issn.1671-1122.2025.09.004

Abstract

In software security analysis, binary program analysis faces two compounding challenges: increasingly complex compiler optimizations and missing structural information. Traditional toolchains commonly suffer from fragmented analysis workflows, reliance on manual operation, and insufficient semantic expressiveness, so they struggle to support structured, automated vulnerability discovery. This paper proposes an intelligent binary analysis method based on an enhanced Semantic Program Dependence Graph (SPDG). By modeling control flow, data dependencies, and symbolic path constraints in a unified graph, SPDG provides a three-dimensional structured representation of program semantics. In the experimental evaluation on the OpenSSL project at the unoptimized level, SPDG recovers 60.5% more basic blocks and 42.5% more control-flow edges than Ghidra. It also improves data dependency tracing by 287.1% over Ghidra, recovering more than 130,000 data dependency chains. In terms of symbolic execution coverage, SPDG reaches 64.7% on OpenSSL at the unoptimized level, compared with 60% for Angr. In the vulnerability detection task, SPDG correctly identifies 9 vulnerability samples with only 1 false positive, an accuracy of 90.0%, significantly higher than the other tools evaluated.

Key words: binary analysis, control flow, data flow, symbolic execution, program dependency graph

CLC number: TP309

XUE Lei, ZHANG Jican, DU Pingxin. Intelligent Binary Analysis Method Based on Enhanced Semantic Program Dependency Graph[J]. Netinfo Security, 2025, 25(9): 1357-1366.

Figures and tables (12): Figure 1, Table 1, Figure 2, Tables 2-6, Figure 3, Tables 7-9

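Dimension	Core parameters	Description
Control flow construction	cfg_type, context_sensitivity_level	Selects the fast or emulated-execution construction mode and the context sensitivity level (1-3)
Data dependency analysis	track_tmps, cross_function, def_use_threshold	Controls whether temporaries are tracked, whether interprocedural tracking is enabled, and the def-use threshold
Symbolic execution strategy	symbolic_memory, concretize_memory, max_steps	Controls whether symbolic memory is enabled and the path depth
Graph database synchronization	sync_nodes, sync_edges, sync_constraints, batch_size	Controls the synchronization strategy for Neo4j nodes, edges, and constraints
Runtime logging	log_level	Controls log verbosity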
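
The parameter groups above map naturally onto a typed configuration object. The following is only a minimal sketch of how such a configuration might be organized and validated. The class names, the `"fast"`/`"emulated"` values for `cfg_type`, and all default values are assumptions made for illustration; only the parameter names and the 1-3 range come from the table.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ControlFlowConfig:
    cfg_type: str = "fast"              # assumed values: "fast" or "emulated"
    context_sensitivity_level: int = 1  # the table gives the range 1-3


@dataclass
class DataDependencyConfig:
    track_tmps: bool = False      # whether temporaries are tracked
    cross_function: bool = True   # whether interprocedural tracking is enabled
    def_use_threshold: int = 64   # hypothetical default


@dataclass
class SymbolicExecutionConfig:
    symbolic_memory: bool = True
    concretize_memory: bool = False
    max_steps: int = 1000         # hypothetical default bounding path depth


@dataclass
class GraphSyncConfig:
    sync_nodes: bool = True
    sync_edges: bool = True
    sync_constraints: bool = True
    batch_size: int = 500         # hypothetical default


@dataclass
class SPDGConfig:
    control_flow: ControlFlowConfig = field(default_factory=ControlFlowConfig)
    data_dependency: DataDependencyConfig = field(default_factory=DataDependencyConfig)
    symbolic: SymbolicExecutionConfig = field(default_factory=SymbolicExecutionConfig)
    graph_sync: GraphSyncConfig = field(default_factory=GraphSyncConfig)
    log_level: str = "INFO"

    def validate(self) -> None:
        """Reject values outside the ranges this sketch assumes."""
        cf = self.control_flow
        if cf.cfg_type not in ("fast", "emulated"):
            raise ValueError(f"unknown cfg_type: {cf.cfg_type!r}")
        if not 1 <= cf.context_sensitivity_level <= 3:
            raise ValueError("context_sensitivity_level must be in 1..3")
        if self.symbolic.max_steps <= 0:
            raise ValueError("max_steps must be positive")
        if self.graph_sync.batch_size <= 0:
            raise ValueError("batch_size must be positive")


config = SPDGConfig(
    control_flow=ControlFlowConfig(cfg_type="emulated", context_sensitivity_level=2)
)
config.validate()
print(asdict(config)["control_flow"])
```

Grouping the parameters by dimension keeps each analysis stage's options separate, so a stage can be handed only the sub-configuration it needs.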

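Type	Description	Example uses
CYPHER	Native graph query language	Custom control flow templates, call graph extraction
SEMANTIC	Regex-based queries over instructions and constraints	Pointer propagation paths, recognition of input-check semantics
DATAFLOW	Path tracing of variable propagation	Mining sensitive source-to-sink paths
CONSTRAINED_PATH	Path constraint verification and path feasibility solving	Verifying paths driven by specific inputs
VISUALIZATION	Generation of visualization graphs for specified paths	Generating control/data dependency graphs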
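
To make the SEMANTIC query type more concrete, the sketch below matches regular expressions against a block's instructions and, optionally, its path constraints. It is an illustration under stated assumptions rather than the paper's implementation: the function name `semantic_match`, the dictionary layout of a block, and the sample instructions and constraint strings are all hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional


def semantic_match(
    blocks: Iterable[Dict],
    insn_pattern: str,
    constraint_pattern: Optional[str] = None,
) -> List[int]:
    """Return addresses of blocks in which some instruction matches
    insn_pattern and, if given, some constraint matches constraint_pattern."""
    insn_re = re.compile(insn_pattern)
    cons_re = re.compile(constraint_pattern) if constraint_pattern else None
    hits = []
    for block in blocks:
        if not any(insn_re.search(insn) for insn in block["instructions"]):
            continue
        if cons_re is not None and not any(
            cons_re.search(c) for c in block["constraints"]
        ):
            continue
        hits.append(block["address"])
    return hits


# Hypothetical sample blocks; real blocks would come from the graph database.
blocks = [
    {
        "address": 0x401000,
        "instructions": ["mov rax, qword ptr [rdi]", "cmp rsi, 0x40", "ja 0x401050"],
        "constraints": ["len <= 0x40"],
    },
    {
        "address": 0x401020,
        "instructions": ["mov rdi, rax", "call memcpy"],
        "constraints": [],
    },
]

# Input-check recognition: a comparison guarded by a length constraint.
checked = semantic_match(blocks, r"^cmp\b", r"len\s*<=")
# Copy calls that carry no length constraint in the same block.
unchecked = [
    addr
    for addr in semantic_match(blocks, r"\bcall\s+memcpy\b")
    if addr not in semantic_match(blocks, r"\bcall\s+memcpy\b", r"len")
]
print([hex(a) for a in checked], [hex(a) for a in unchecked])
```

Combining the instruction pattern with the constraint pattern is what separates this query type from a plain disassembly search: a block is reported only when both the syntactic and the symbolic condition hold.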

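Property	Type	Meaning
address	hex/int	Basic block address
function	hex/int	Entry address of the containing function
instructions	list[str]	Disassembled instruction sequence of the block
size	int	Basic block size (bytes)
constraints	list[str]	Path constraints (extracted by symbolic execution)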

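Edge type	Direction	Property fields	Meaning
CONTROL_FLOW	Directed	identifier	Identifier of the source and target addresses
DATA_DEP	Directed	var_type, var_info	Type and symbol information of the dependent variable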
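
The node properties and edge types in the two tables above can be sketched as a small schema plus parameterized Cypher statements for synchronization. This is a hedged illustration only: the node label `BasicBlock`, the helper names, and the `"src->dst"` format of `identifier` are assumptions, and the statements are built as strings so the example runs without a Neo4j instance.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Statement = Tuple[str, Dict]  # (Cypher text, query parameters)


@dataclass
class BasicBlockNode:
    address: int                 # basic block address
    function: int                # entry address of the containing function
    instructions: List[str]      # disassembled instructions
    size: int                    # size in bytes
    constraints: List[str] = field(default_factory=list)  # path constraints


def node_statement(node: BasicBlockNode) -> Statement:
    """Build an idempotent upsert for one basic-block node."""
    query = (
        "MERGE (b:BasicBlock {address: $address}) "
        "SET b.function = $function, b.instructions = $instructions, "
        "b.size = $size, b.constraints = $constraints"
    )
    return query, dict(vars(node))


def control_flow_statement(src: int, dst: int) -> Statement:
    """Build an upsert for a directed CONTROL_FLOW edge."""
    query = (
        "MATCH (s:BasicBlock {address: $src}), (d:BasicBlock {address: $dst}) "
        "MERGE (s)-[:CONTROL_FLOW {identifier: $identifier}]->(d)"
    )
    # The identifier format below is an assumption made for this sketch.
    return query, {"src": src, "dst": dst, "identifier": f"{src:#x}->{dst:#x}"}


def data_dep_statement(src: int, dst: int, var_type: str, var_info: str) -> Statement:
    """Build an upsert for a directed DATA_DEP edge."""
    query = (
        "MATCH (s:BasicBlock {address: $src}), (d:BasicBlock {address: $dst}) "
        "MERGE (s)-[:DATA_DEP {var_type: $var_type, var_info: $var_info}]->(d)"
    )
    return query, {"src": src, "dst": dst, "var_type": var_type, "var_info": var_info}


node = BasicBlockNode(0x401000, 0x401000, ["push rbp", "mov rbp, rsp"], 4)
for text, params in (
    node_statement(node),
    control_flow_statement(0x401000, 0x401010),
    data_dep_statement(0x401000, 0x401010, "register", "rax"),
):
    print(text, params)
```

Using `MERGE` rather than `CREATE` keeps repeated synchronization runs from duplicating nodes or edges, and passing values as parameters avoids building queries by string concatenation.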

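Query type	Semantic dimension addressed	Supported functions
CYPHER	Graph structure layer	General subgraph matching, pattern traversal, metadata extraction
SEMANTIC	Combined instruction + constraint layer	Regex-based semantic pattern matching over instruction sequences and symbolic constraints
DATAFLOW	Data propagation layer	Path extraction and propagation analysis for a given variable in the data dependency graph
CONSTRAINED_PATH	Path verification layer	Checking whether a control flow path satisfies a set of symbolic constraints
VISUALIZATION	Graph visualization support layer	Exporting subgraphs in DOT format for rendering by external tools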
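
As a rough illustration of the DATAFLOW and VISUALIZATION layers, the sketch below enumerates simple paths along `DATA_DEP` edges between a source and a sink block and exports an edge list as DOT text. The edge-tuple representation, the function names, the depth bound, and the sample addresses are assumptions for this example. A real deployment would run such queries against the graph database instead of an in-memory list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, str]  # (source address, target address, edge type)


def dataflow_paths(
    edges: Iterable[Edge], source: int, sink: int, max_depth: int = 16
) -> List[List[int]]:
    """Enumerate simple source-to-sink paths that follow DATA_DEP edges only."""
    succ: Dict[int, List[int]] = defaultdict(list)
    for src, dst, kind in edges:
        if kind == "DATA_DEP":
            succ[src].append(dst)

    paths: List[List[int]] = []

    def walk(node: int, path: List[int]) -> None:
        if node == sink:
            paths.append(list(path))
            return
        if len(path) > max_depth:   # bound the search depth
            return
        for nxt in succ.get(node, []):
            if nxt not in path:     # keep paths simple (no repeated blocks)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(source, [source])
    return paths


def to_dot(edges: Iterable[Edge], name: str = "spdg") -> str:
    """Render edges as DOT text; data dependencies are drawn dashed."""
    lines = [f"digraph {name} {{"]
    for src, dst, kind in edges:
        style = "dashed" if kind == "DATA_DEP" else "solid"
        lines.append(f'  "{src:#x}" -> "{dst:#x}" [label="{kind}", style={style}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample graph.
edges = [
    (0x1000, 0x1010, "CONTROL_FLOW"),
    (0x1000, 0x1010, "DATA_DEP"),
    (0x1010, 0x1020, "DATA_DEP"),
    (0x1000, 0x1020, "DATA_DEP"),
    (0x1020, 0x1030, "CONTROL_FLOW"),
]
for path in dataflow_paths(edges, 0x1000, 0x1020):
    print(" -> ".join(hex(a) for a in path))
print(to_dot(edges))
```

The DOT text can be passed to an external renderer such as Graphviz, which matches the table's note that rendering is left to external tools.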