Netinfo Security ›› 2025, Vol. 25 ›› Issue (10): 1589-1603. doi: 10.3969/j.issn.1671-1122.2025.10.010

• Theoretical Research •

Binary Code Similarity Detection Method Based on Multivariate Semantic Graph

ZHANG Lu, JIA Peng, LIU Jiayong

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610207, China
  • Received: 2024-06-05 Online: 2025-10-10 Published: 2025-11-07
  • Contact: JIA Peng E-mail: pengjia@scu.edu.cn
  • About the authors: ZHANG Lu (born 1999), female, from Chongqing, master's student; main research interest: binary security | JIA Peng (born 1988), male, from Henan, associate professor, Ph.D.; main research interests: vulnerability discovery and static and dynamic software analysis | LIU Jiayong (born 1962), male, from Sichuan, professor, Ph.D.; main research interests: network application security and information content security
  • Funding:
    National Key Research and Development Program of China (2023YFB3106600)

Abstract:

Binary code similarity detection underpins applications such as code clone detection, vulnerability search, and software theft detection. However, compilation strips binary code of the rich semantic information present in the source code, and the diversity of compilation processes makes effective feature representations hard to obtain. To address this challenge, this paper proposes SiamGGCN, a similarity detection architecture that combines gated graph neural networks with an attention mechanism and introduces a multivariate semantic graph. The multivariate semantic graph combines the control flow, sequential flow, and data flow information of assembly code, giving a more accurate and comprehensive semantic representation for binary code similarity detection. The proposed method is evaluated on multiple datasets and across a wide range of scenarios. The experimental results show that SiamGGCN significantly outperforms existing methods in both precision and recall, demonstrating its strong performance and application potential in binary code similarity detection.

Key words: code similarity, binary analysis, graph neural networks, graph embedding
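
To make the idea of a graph that carries control flow, sequential flow, and data flow at once more concrete, the following Python sketch builds three typed edge sets over a toy function. It is only an illustration under stated assumptions: the `Instr` record, the `NO_FALLTHROUGH` set, the instruction-level node granularity, and the def-use walk are hypothetical choices made for this example, since the abstract does not describe how SiamGGCN actually constructs its graph.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """One assembly instruction in a simplified, hypothetical format."""
    addr: int
    mnemonic: str
    writes: tuple          # registers defined by the instruction
    reads: tuple           # registers used by the instruction
    target: Optional[int]  # branch target address, if any


# Mnemonics that never fall through to the next instruction (assumed set).
NO_FALLTHROUGH = {"jmp", "ret"}


def build_semantic_graph(instrs):
    """Return {'sequential': ..., 'control': ..., 'data': ...} edge sets."""
    by_addr = {ins.addr: ins for ins in instrs}

    # Sequential flow: neighbours in layout order, regardless of semantics.
    sequential = {(a.addr, b.addr) for a, b in zip(instrs, instrs[1:])}

    # Control flow: possible execution successors of each instruction.
    succ = {ins.addr: [] for ins in instrs}
    for i, ins in enumerate(instrs):
        if ins.target is not None:
            succ[ins.addr].append(ins.target)
        if ins.mnemonic not in NO_FALLTHROUGH and i + 1 < len(instrs):
            succ[ins.addr].append(instrs[i + 1].addr)
    control = {(src, dst) for src, dsts in succ.items() for dst in dsts}

    # Data flow: def-use edges. From each definition, walk control successors
    # until the register is overwritten; every reader on the way is a use.
    data = set()
    for definition in instrs:
        for reg in definition.writes:
            stack, seen = list(succ[definition.addr]), set()
            while stack:
                addr = stack.pop()
                if addr in seen:
                    continue
                seen.add(addr)
                node = by_addr[addr]
                if reg in node.reads:
                    data.add((definition.addr, addr))
                if reg not in node.writes:  # definition is still live past this node
                    stack.extend(succ[addr])
    return {"sequential": sequential, "control": control, "data": data}


# Toy function with one conditional branch and one unconditional jump.
TOY_FUNCTION = [
    Instr(0, "mov", ("eax",), (), None),
    Instr(1, "cmp", (), ("eax",), None),
    Instr(2, "jle", (), (), 5),
    Instr(3, "add", ("eax",), ("eax",), None),
    Instr(4, "jmp", (), (), 6),
    Instr(5, "sub", ("eax",), ("eax",), None),
    Instr(6, "ret", (), ("eax",), None),
]

graph = build_semantic_graph(TOY_FUNCTION)
for kind, edges in graph.items():
    print(kind, sorted(edges))
```

In this toy function the three edge types carry different information: the instruction after the unconditional `jmp` is its sequential neighbour but not a control flow successor, and the data flow edges link each write of `eax` to the reads it can reach. A graph neural network could, in principle, treat each edge type with separate weights, although whether SiamGGCN does so is not stated in the abstract.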

CLC number: