Netinfo Security ›› 2025, Vol. 25 ›› Issue (10): 1589-1603. doi: 10.3969/j.issn.1671-1122.2025.10.010
ZHANG Lu, JIA Peng, LIU Jiayong
Received: 2024-06-05
Online: 2025-10-10
Published: 2025-11-07
Contact: JIA Peng, E-mail: pengjia@scu.edu.cn
ZHANG Lu, JIA Peng, LIU Jiayong. Binary Code Similarity Detection Method Based on Multivariate Semantic Graph[J]. Netinfo Security, 2025, 25(10): 1589-1603.
URL: http://netinfo-security.org/EN/10.3969/j.issn.1671-1122.2025.10.010

| Dataset | Method | Precision (-O0-O3) | Precision (-O1-O3) | Precision (-O2-O3) | Precision (mean) | Recall (-O0-O3) | Recall (-O1-O3) | Recall (-O2-O3) | Recall (mean) |
|---|---|---|---|---|---|---|---|---|---|
| Findutils | Gemini | 79.85% | 73.77% | 84.55% | 79.39% | 91.33% | 90.05% | 93.35% | 91.57% |
| Findutils | SAFE | 97.89% | 96.92% | 98.85% | 97.88% | 97.80% | 96.80% | 98.26% | 97.62% |
| Findutils | GraphEmb | 97.17% | 98.12% | 95.95% | 97.08% | 98.36% | 98.65% | 99.49% | 98.83% |
| Findutils | PalmTree | 94.02% | 95.71% | 95.85% | 95.19% | 94.97% | 96.11% | 96.23% | 95.77% |
| Findutils | SiamGGCN | 98.73% | 98.96% | 99.55% | 98.98% | 98.72% | 98.66% | 99.55% | 98.97% |
| Coreutils | Gemini | 92.63% | 98.96% | 81.66% | 91.08% | 97.22% | 98.81% | 96.11% | 97.38% |
| Coreutils | SAFE | 95.59% | 95.91% | 98.83% | 96.77% | 95.00% | 95.83% | 98.80% | 96.54% |
| Coreutils | GraphEmb | 98.81% | 99.31% | 98.70% | 98.94% | 96.61% | 99.49% | 99.32% | 98.47% |
| Coreutils | PalmTree | 92.31% | 96.25% | 96.11% | 94.89% | 93.67% | 97.26% | 97.21% | 96.05% |
| Coreutils | SiamGGCN | 99.29% | 99.66% | 99.58% | 99.51% | 99.28% | 99.65% | 99.57% | 99.50% |
| Mixdatasets | Gemini | 77.31% | 66.52% | 87.79% | 77.21% | 93.96% | 93.31% | 96.56% | 94.61% |
| Mixdatasets | SAFE | 93.55% | 94.91% | 90.31% | 92.92% | 93.51% | 94.63% | 89.08% | 92.40% |
| Mixdatasets | GraphEmb | 92.51% | 96.75% | 93.12% | 94.12% | 97.33% | 97.94% | 98.71% | 97.99% |
| Mixdatasets | PalmTree | 91.22% | 92.98% | 94.51% | 92.90% | 91.67% | 93.51% | 94.52% | 93.23% |
| Mixdatasets | SiamGGCN | 97.78% | 97.99% | 98.98% | 98.25% | 97.72% | 97.96% | 98.90% | 98.19% |
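
The tables on this page report precision and recall but do not show how they are computed. The sketch below assumes the standard definitions over pair-level predictions (true positives, false positives, false negatives); the function name `precision_recall` and the counts used are hypothetical and not taken from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion counts.

    Assumes the standard definitions:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
    A zero denominator yields 0.0 rather than raising an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not from the paper).
p, r = precision_recall(tp=90, fp=10, fn=5)
print(f"precision={p:.2%} recall={r:.2%}")
```

With these illustrative counts, 90 of the 100 pairs predicted as similar are correct, and 90 of the 95 truly similar pairs are found.
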

| Dataset | Model | Precision (CC) | Precision (CO) | Precision (CA) | Recall (CC) | Recall (CO) | Recall (CA) |
|---|---|---|---|---|---|---|---|
| Findutils | GCN | 83.96% | 67.31% | 95.88% | 82.46% | 66.79% | 95.61% |
| Findutils | No-attention | 92.59% | 82.75% | 96.32% | 92.58% | 81.52% | 96.08% |
| Findutils | Two-layer | 98.13% | 97.34% | 97.76% | 98.11% | 97.22% | 97.67% |
| Findutils | One-layer | 98.97% | 99.73% | 98.54% | 98.95% | 99.72% | 98.51% |
| Coreutils | GCN | 79.29% | 70.98% | 94.91% | 78.27% | 69.49% | 94.41% |
| Coreutils | No-attention | 78.91% | 72.36% | 92.39% | 78.19% | 71.53% | 91.19% |
| Coreutils | Two-layer | 93.73% | 99.26% | 95.05% | 93.47% | 99.25% | 94.69% |
| Coreutils | One-layer | 98.09% | 97.69% | 98.91% | 97.93% | 97.67% | 98.82% |
| Binutils | GCN | 93.42% | 64.86% | 99.92% | 93.33% | 63.93% | 99.91% |
| Binutils | No-attention | 96.65% | 66.31% | 97.62% | 96.45% | 66.14% | 97.61% |
| Binutils | Two-layer | 97.89% | 99.78% | 99.90% | 97.89% | 99.77% | 99.90% |
| Binutils | One-layer | 98.77% | 99.90% | 99.97% | 98.75% | 99.89% | 99.97% |

| Project | CVE | Vulnerable function | Gemini | SAFE | GraphEmb | This paper |
|---|---|---|---|---|---|---|
| OpenSSL | 2014-0160 | tls1_process_heartbeat | 86.3% | 93.2% | 88.6% | 95.5% |
| | 2015-1791 | ssl3_get_new_session_ticket | | | | |
| | 2016-6304 | ssl_parse_clienthello_tlsext | | | | |
| | 2021-3711 | EVP_PKEY_decrypt | | | | |
| Libav | 2016-8675 | get_vlc2 | 78.9% | 84.8% | 75.8% | 87.9% |
| | 2017-9051 | nsv_read_chunk | | | | |
| | 2017-16803 | smacker_decode_tree | | | | |
| Libarchive | 2016-4302 | parse_codes | 72.7% | 81.8% | 90.9% | 90.9% |