Netinfo Security, 2025, Vol. 25, Issue 10: 1589-1603. doi: 10.3969/j.issn.1671-1122.2025.10.010
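ZHANG Lu, JIA Peng, LIU Jiayong
Received: 2024-06-05
Online: 2025-10-10
Published: 2025-11-07
Corresponding author: JIA Peng
E-mail: pengjia@scu.edu.cn
About the authors: ZHANG Lu (born 1999), female, from Chongqing, master's student; main research interest: binary security. JIA Peng (born 1988), male, from Henan, associate professor, Ph.D.; main research interests: vulnerability discovery and static and dynamic software analysis. LIU Jiayong (born 1962), male, from Sichuan, professor, Ph.D.; main research interests: network application security and information content security.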
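Abstract:
Binary code similarity detection underpins applications such as code clone detection, vulnerability search, and software theft detection. However, compilation strips binary code of the rich semantic information present in the source code, and because compilation settings vary widely, the resulting code usually lacks an effective feature representation. To address this challenge, this paper proposes SiamGGCN, a similarity detection architecture that combines a gated graph neural network with an attention mechanism and introduces a multivariate semantic graph. The multivariate semantic graph combines the control-flow, sequential-flow, and data-flow information of assembly language, giving a more accurate and complete semantic representation for binary code similarity detection. The method is evaluated on multiple datasets and across a wide range of scenarios. The experimental results show that SiamGGCN significantly outperforms existing methods in both precision and recall, supporting its effectiveness and practical potential for binary code similarity detection.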
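One way to picture a graph that carries control-flow, sequential-flow, and data-flow information at once is a single node set with three edge types. The sketch below is only an illustration under stated assumptions, not the paper's construction: the toy instruction tuples, the `build_semantic_graph` helper, and the simplified data-flow rule (link the most recent writer of a register, in address order, to each later reader) are all hypothetical.

```python
from collections import defaultdict

def build_semantic_graph(instructions):
    """Build typed edges over instruction indices.

    Each instruction is a hypothetical tuple:
    (mnemonic, registers_written, registers_read, jump_target_index_or_None).
    Returns {edge_type: set of (src, dst)}.
    """
    edges = defaultdict(set)
    last_writer = {}  # register -> index of the most recent instruction writing it
    n = len(instructions)
    for i, (mnemonic, writes, reads, target) in enumerate(instructions):
        # Sequential flow: each instruction links to the next one in address order.
        if i + 1 < n:
            edges["sequential"].add((i, i + 1))
        # Control flow: the jump target, plus fall-through for conditional jumps.
        if target is not None:
            edges["control"].add((i, target))
            if mnemonic != "jmp" and i + 1 < n:
                edges["control"].add((i, i + 1))
        # Data flow (simplified, ignores branching): last writer -> this reader.
        for reg in reads:
            if reg in last_writer:
                edges["data"].add((last_writer[reg], i))
        for reg in writes:
            last_writer[reg] = i
    return dict(edges)

# Toy function body, made up for the example.
toy = [
    ("mov", ["eax"], [], None),       # 0: write eax
    ("cmp", [], ["eax"], None),       # 1: read eax
    ("jne", [], [], 4),               # 2: conditional jump to instruction 4
    ("add", ["eax"], ["eax"], None),  # 3: read and write eax
    ("ret", [], ["eax"], None),       # 4: read eax
]
graph = build_semantic_graph(toy)
```

Keeping the three edge types separate lets a downstream graph network treat each relation differently instead of collapsing them into one adjacency matrix.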
ZHANG Lu, JIA Peng, LIU Jiayong. Binary Code Similarity Detection Method Based on Multivariate Semantic Graph[J]. Netinfo Security, 2025, 25(10): 1589-1603.
Table 3
Comparison across optimization levels
| Dataset | Method | Precision (-O0-O3) | Precision (-O1-O3) | Precision (-O2-O3) | Precision (mean) | Recall (-O0-O3) | Recall (-O1-O3) | Recall (-O2-O3) | Recall (mean) |
|---|---|---|---|---|---|---|---|---|---|
| Findutils | Gemini | 79.85% | 73.77% | 84.55% | 79.39% | 91.33% | 90.05% | 93.35% | 91.57% |
| Findutils | SAFE | 97.89% | 96.92% | 98.85% | 97.88% | 97.80% | 96.80% | 98.26% | 97.62% |
| Findutils | GraphEmb | 97.17% | 98.12% | 95.95% | 97.08% | 98.36% | 98.65% | 99.49% | 98.83% |
| Findutils | Palmtree | 94.02% | 95.71% | 95.85% | 95.19% | 94.97% | 96.11% | 96.23% | 95.77% |
| Findutils | SiamGGCN | 98.73% | 98.96% | 99.55% | 98.98% | 98.72% | 98.66% | 99.55% | 98.97% |
| Coreutils | Gemini | 92.63% | 98.96% | 81.66% | 91.08% | 97.22% | 98.81% | 96.11% | 97.38% |
| Coreutils | SAFE | 95.59% | 95.91% | 98.83% | 96.77% | 95.00% | 95.83% | 98.80% | 96.54% |
| Coreutils | GraphEmb | 98.81% | 99.31% | 98.70% | 98.94% | 96.61% | 99.49% | 99.32% | 98.47% |
| Coreutils | Palmtree | 92.31% | 96.25% | 96.11% | 94.89% | 93.67% | 97.26% | 97.21% | 96.05% |
| Coreutils | SiamGGCN | 99.29% | 99.66% | 99.58% | 99.51% | 99.28% | 99.65% | 99.57% | 99.50% |
| Mixdatasets | Gemini | 77.31% | 66.52% | 87.79% | 77.21% | 93.96% | 93.31% | 96.56% | 94.61% |
| Mixdatasets | SAFE | 93.55% | 94.91% | 90.31% | 92.92% | 93.51% | 94.63% | 89.08% | 92.40% |
| Mixdatasets | GraphEmb | 92.51% | 96.75% | 93.12% | 94.12% | 97.33% | 97.94% | 98.71% | 97.99% |
| Mixdatasets | Palmtree | 91.22% | 92.98% | 94.51% | 92.90% | 91.67% | 93.51% | 94.52% | 93.23% |
| Mixdatasets | SiamGGCN | 97.78% | 97.99% | 98.98% | 98.25% | 97.72% | 97.96% | 98.90% | 98.19% |
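The table above reports precision and recall. As a reminder of the standard definitions, here is a minimal sketch for a binary similar/dissimilar decision over function pairs. The 0.5 decision threshold, the `precision_recall` helper, and the sample labels and scores are assumptions for illustration; this page does not state how pairs are labeled or thresholded.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall for pair classification.

    labels: 1 if the pair is truly similar, else 0.
    scores: model similarity scores; a pair is predicted similar
            when its score is >= threshold (assumed cut-off).
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)       # true positives
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)   # false positives
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up example: three similar pairs, two dissimilar pairs.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.1]
precision, recall = precision_recall(labels, scores)
```

In this toy case one similar pair is missed and one dissimilar pair is accepted, so both metrics come out at 2/3.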
Table 4
Effect of different network structures
| Dataset | Model | Precision (CC) | Precision (CO) | Precision (CA) | Recall (CC) | Recall (CO) | Recall (CA) |
|---|---|---|---|---|---|---|---|
| Findutils | GCN | 83.96% | 67.31% | 95.88% | 82.46% | 66.79% | 95.61% |
| Findutils | No-attention | 92.59% | 82.75% | 96.32% | 92.58% | 81.52% | 96.08% |
| Findutils | Two-layer | 98.13% | 97.34% | 97.76% | 98.11% | 97.22% | 97.67% |
| Findutils | One-layer | 98.97% | 99.73% | 98.54% | 98.95% | 99.72% | 98.51% |
| Coreutils | GCN | 79.29% | 70.98% | 94.91% | 78.27% | 69.49% | 94.41% |
| Coreutils | No-attention | 78.91% | 72.36% | 92.39% | 78.19% | 71.53% | 91.19% |
| Coreutils | Two-layer | 93.73% | 99.26% | 95.05% | 93.47% | 99.25% | 94.69% |
| Coreutils | One-layer | 98.09% | 97.69% | 98.91% | 97.93% | 97.67% | 98.82% |
| Binutils | GCN | 93.42% | 64.86% | 99.92% | 93.33% | 63.93% | 99.91% |
| Binutils | No-attention | 96.65% | 66.31% | 97.62% | 96.45% | 66.14% | 97.61% |
| Binutils | Two-layer | 97.89% | 99.78% | 99.90% | 97.89% | 99.77% | 99.90% |
| Binutils | One-layer | 98.77% | 99.90% | 99.97% | 98.75% | 99.89% | 99.97% |
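Judging by their names, the variants in the table above differ in graph layer type, attention, and depth. To make those ingredients concrete, the sketch below shows one GRU-style gated propagation layer, a softmax attention readout, and a Siamese comparison in which both inputs share the same weights. It is a simplified illustration, not the paper's network: the parameter names, dimensions, random untrained weights, and the cosine similarity head are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_layer(h, adj, p):
    """One gated propagation step. h: (n, d) node states, adj: (n, n) adjacency."""
    m = adj @ h @ p["W_msg"]                          # aggregate neighbour messages
    z = sigmoid(m @ p["W_z"] + h @ p["U_z"])          # update gate
    r = sigmoid(m @ p["W_r"] + h @ p["U_r"])          # reset gate
    h_new = np.tanh(m @ p["W_h"] + (r * h) @ p["U_h"])
    return (1.0 - z) * h + z * h_new                  # gated mix of old and new state

def attention_readout(h, q):
    """Softmax-weighted sum of node states; q is a (d,) query vector."""
    logits = h @ q
    w = np.exp(logits - logits.max())                 # numerically stable softmax
    w /= w.sum()
    return w @ h

def embed(h, adj, p):
    return attention_readout(gated_layer(h, adj, p), p["q"])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
d = 8
names = ["W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
params = {k: rng.normal(scale=0.3, size=(d, d)) for k in names}
params["q"] = rng.normal(size=d)

# Two toy 3-node graphs with the same structure; the second has slightly
# perturbed node features. Both branches use the same params (Siamese).
adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
h1 = rng.normal(size=(3, d))
h2 = h1 + 0.01 * rng.normal(size=(3, d))
sim_same = cosine(embed(h1, adj, params), embed(h1, adj, params))
sim_near = cosine(embed(h1, adj, params), embed(h2, adj, params))
```

Sharing one parameter set across both branches is what makes the comparison Siamese: identical inputs always map to identical embeddings, so any difference in similarity comes from the inputs alone.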
Table 5
Vulnerability search performance (each project's scores appear on its first row)
| Project | CVE | Vulnerable function | Gemini | SAFE | GraphEmb | This paper |
|---|---|---|---|---|---|---|
| OpenSSL | 2014-0160 | tls1_process_heartbeat | 86.3% | 93.2% | 88.6% | 95.5% |
| | 2015-1791 | ssl3_get_new_session_ticket | | | | |
| | 2016-6304 | ssl_parse_clienthello_tlsext | | | | |
| | 2021-3711 | EVP_PKEY_decrypt | | | | |
| Libav | 2016-8675 | get_vlc2 | 78.9% | 84.8% | 75.8% | 87.9% |
| | 2017-9051 | nsv_read_chunk | | | | |
| | 2017-16803 | smacker_decode_tree | | | | |
| Libarchive | 2016-4302 | parse_codes | 72.7% | 81.8% | 90.9% | 90.9% |