Netinfo Security, 2024, Vol. 24, Issue (8): 1231-1240. doi: 10.3969/j.issn.1671-1122.2024.08.009

• Theoretical Research •

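Endogenous Security Heterogeneous Entity Generation Method Based on Large Language Model

CHEN Haoran1, LIU Yu2, CHEN Ping3

  1. School of Software, Fudan University, Shanghai 200433, China
  2. School of Computer Science, Fudan University, Shanghai 200433, China
  3. Institute of Big Data, Fudan University, Shanghai 200433, China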
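  • Received: 2024-05-13 Online: 2024-08-10 Published: 2024-08-22
  • Corresponding author: CHEN Ping, pchen@fudan.edu.cn
  • About the authors: CHEN Haoran (born 1997), male, from Heilongjiang, master's student, main research interest: large language models | LIU Yu (born 1997), male, from Heilongjiang, doctoral student, main research interests: large language models and vulnerability discovery | CHEN Ping (born 1985), male, from Jiangsu, researcher, Ph.D., main research interests: software and system security.
  • Funding:
    National Key Research and Development Program of China (2022YFB3102800)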


Abstract:

To address the security challenges posed by unknown vulnerabilities and backdoors in software systems, this paper proposes a method for generating endogenous-security heterogeneous entities based on large language models. Guided by an endogenous security strategy, the method diversifies the security-weak code execution bodies of a program into heterogeneous entities, so that a program under attack can quickly switch to a healthy entity and keep the system running stably. Large language models are used to generate diverse heterogeneous entities, and a seed-distance-based method is used to optimize existing fuzz testing techniques, improving the quality of generated test cases and the code coverage in order to ensure that the entities are functionally equivalent. Experimental results show that the method can effectively repair code vulnerabilities and generate functionally equivalent heterogeneous entities. In addition, compared with the existing AFL algorithm, the optimized fuzz testing method takes less time to reach the same code coverage. The proposed method can therefore significantly improve the security and robustness of software systems, and it offers a new strategy for defending against unknown threats.

Key words: endogenous security, large language model, fuzz testing
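
The abstract does not give implementation details, so the following is only a minimal sketch of what checking "functional equivalence" between an original code body and a heterogeneous entity can look like. It uses plain random differential testing, not AFL and not the paper's seed-distance optimization. The two checksum functions and the helper `find_divergence` are hypothetical names introduced purely for illustration.

```python
import random


def checksum_original(data: bytes) -> int:
    """Hypothetical reference code body: sum of all bytes modulo 256."""
    total = 0
    for b in data:
        total = (total + b) % 256
    return total


def checksum_variant(data: bytes) -> int:
    """Hypothetical heterogeneous entity: same behaviour, different code."""
    return sum(data) & 0xFF


def find_divergence(f, g, trials=1000, max_len=64, seed=0):
    """Feed the same random inputs to f and g.

    Returns the first input on which the outputs differ, or None if no
    difference is found within the given number of trials. Finding no
    difference is evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        if f(data) != g(data):
            return data
    return None


if __name__ == "__main__":
    witness = find_divergence(checksum_original, checksum_variant)
    print("no divergence found" if witness is None else f"diverges on {witness!r}")
```

A coverage-guided fuzzer would replace the uniform random inputs here with mutated seeds chosen to reach new code paths, which is the part the paper reports optimizing.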

CLC Number: