Netinfo Security ›› 2024, Vol. 24 ›› Issue (6): 917-925. doi: 10.3969/j.issn.1671-1122.2024.06.009

• Special Topic on Cryptography •

Security Analysis of Cryptographic Application Code Generated by Large Language Model

GUO Xiangxin1, LIN Jingqiang1, JIA Shijie2, LI Guangzheng1

  1. School of Cyber Security, University of Science and Technology of China, Hefei 230027, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China
  • Received: 2024-04-11 Online: 2024-06-10 Published: 2024-07-05
  • Corresponding author: LIN Jingqiang, linjq@ustc.edu.cn
  • About the authors: GUO Xiangxin (born in 2000), male, from Henan, master's student; main research interest: cryptographic application security. LIN Jingqiang (born in 1978), male, from Fujian, professor, Ph.D., CCF member; main research interests: cryptographic misuse detection, electronic authentication, and key security. JIA Shijie (born in 1989), male, from Shandong, associate research fellow, Ph.D.; main research interest: cryptographic application security. LI Guangzheng (born in 2001), male, from Heilongjiang, master's student; main research interest: cryptographic application security.
  • Supported by:
    National Natural Science Foundation of China (62272457); National Key Research and Development Program of China (2020YFB1005803)

Abstract:

Large language models (LLMs) are now widely used in software development. While they improve development efficiency, they also introduce new security risks, particularly in cryptographic applications, where security requirements are high. This paper proposes LLMCryptoSE, an open-source prompt dataset for assessing the security of LLM-generated cryptographic application code; it contains 460 natural-language prompts describing cryptographic scenarios. The paper then analyzes the code snippets generated by LLMs, focusing on misuse of cryptographic APIs, and reviews them with the static analysis tool CryptoGuard combined with manual inspection. In an evaluation of three mainstream LLMs (ChatGPT 3.5, ERNIE 3.5, and Spark 3.5), cryptographic misuse detection was performed on 1380 generated code snippets: 52.90% of them contained at least one cryptographic misuse, and Spark 3.5 performed comparatively best, with a misuse rate of 48.48%. Beyond revealing the challenges that current LLMs face in the security of cryptographic application code, the paper offers a series of recommendations for LLM users and developers to enhance security, aiming to provide practical guidance for applying LLMs in the cryptographic field.

Key words: large language model, cryptographic application security prompts, cryptographic misuse detection
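
To make "cryptographic API misuse" concrete, the following minimal Python sketch contrasts a few widely recognized misuse patterns (constant salt, weak hash, low iteration count, non-cryptographic random numbers) with safer counterparts. All function names are hypothetical, the iteration count is an assumed value, and none of the code is taken from LLMCryptoSE or from the paper's results; the abstract does not state the target language, and Python's standard library is used here only for brevity.

```python
import hashlib
import random
import secrets
from typing import Optional, Tuple

# Illustrative only: hypothetical functions, not drawn from the paper's dataset.


def derive_key_misuse(password: str) -> bytes:
    """Misuse: constant salt, weak hash (MD5), and a very low iteration count."""
    salt = b"00000000"  # constant salt: the same password always yields the same key
    return hashlib.pbkdf2_hmac("md5", password.encode(), salt, 10)


def token_misuse() -> bytes:
    """Misuse: `random` is a non-cryptographic PRNG and should not produce secrets."""
    return bytes(random.getrandbits(8) for _ in range(16))


def derive_key_safer(
    password: str, salt: Optional[bytes] = None, iterations: int = 600_000
) -> Tuple[bytes, bytes]:
    """Safer: fresh random salt per password, SHA-256, high iteration count (assumed value)."""
    if salt is None:
        salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return key, salt  # the salt is stored alongside the derived key for later verification


def token_safer() -> bytes:
    """Safer: `secrets` draws from the operating system's CSPRNG."""
    return secrets.token_bytes(16)
```

A snippet counts toward a misuse rate such as the one reported above if it contains at least one pattern of this kind; `derive_key_misuse` alone contains three.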

CLC Number: