针对大语言模型生成的密码应用代码安全性分析

doi:10.3969/j.issn.1671-1122.2024.06.009

摘要/Abstract

摘要：

随着大语言模型在软件开发领域的广泛应用，在提升开发效率的同时也引入了新的安全风险，特别是在对安全性要求较高的密码学应用领域。文章针对大语言模型提出了一个密码应用安全评估的开源提示词库LLMCryptoSE，该词库包含460个密码场景自然语言描述提示词。同时，通过对大语言模型生成的代码片段进行深入分析，着重评估了密码API使用不当的情况，采用静态分析工具CryptoGuard结合人工的方法进行审查。在评估ChatGPT3.5、文心3.5和星火3.5等主流大语言模型时，文章对生成的1380个代码片段进行了密码误用检测，发现52.90%的代码片段至少存在一处密码误用，其中星火3.5大模型表现较佳，误用率为48.48%。文章不仅揭示了当前大语言模型在密码应用代码安全性方面所面临的挑战，还为模型的使用者和开发者提出了一系列增强安全性的建议，旨在为大语言模型在密码领域的推广应用提供实践指导。

关键词: 大语言模型, 密码应用安全提示词, 密码误用检测

Abstract:

With the extensive application of large language model(LLM) in software development, the role in enhancing development efficiency has also introduced new security risks, particularly in the field of cryptography applications that demand high security. This paper proposed an open-source prompt dataset named LLMCryptoSE, containing 460 natural language description prompts of cryptographic scenarios. It aimed to assess the security of code generated by LLM for cryptographic applications. At the same time, through an in-depth analysis of code snippets generated by LLM, this paper primarily evaluated the misuse of cryptographic API, employing the methodology that combined the static analysis tool CryptoGuard with manual review to conduct a detailed evlatuation of 1380 code snippets. The assessment of three mainstream LLM, including ChatGPT 3.5, ERNIE 3.5, and Spark 3.5, revealed that 52.90% of the code snippets contained at least one instance of cryptographic misuse, with Spark 3.5 showing a relatively better performance with a misuse rate of 48.48%. Based on these findings, the study not only reveals the current challenges in cryptographic application security faced by LLM, but also offers a series of recommendations for LLM users and developers to enhance security. These are aims at providing practical guidance for improving the application of LLM in cryptographic fields.

Key words: large language model, cryptographic application security prompts, cryptographic misuse detection

中图分类号:

TP309

郭祥鑫, 林璟锵, 贾世杰, 李光正. 针对大语言模型生成的密码应用代码安全性分析[J]. 信息网络安全, 2024, 24(6): 917-925.

GUO Xiangxin, LIN Jingqiang, JIA Shijie, LI Guangzheng. Security Analysis of Cryptographic Application Code Generated by Large Language Model[J]. Netinfo Security, 2024, 24(6): 917-925.

图/表 8

图1

表1

表2

表3

表4

表5

表6

表7

参考文献 20

[1]	ZHAO W, ZHOU Kun, LI Junyi, et al. A Survey of Large Language Models[EB/OL]. (2023-11-24)[2024-04-05]. https://arxiv.org/abs/2303.18223.
[2]	HAZHIRPASAND M, GHAFARI M, NIERSTRASZ O. Java Cryptography Uses in the Wild[C]// ACM. The 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). New York: ACM, 2020: 1-6.
[3]	ACAR Y, BACKES M, FAHL S, et al. You Get Where You’re Looking for: The Impact of Information Sources on Code Security[C]// IEEE. 2016 IEEE Symposium on Security and Privacy (SP). New York: IEEE, 2016: 289-305.
[4]	KHOURY R, AVILA A R, BRUNELLE J, et al. How Secure is Code Generated by ChatGPT?[C]// IEEE. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). New York: IEEE, 2023: 2445-2451.
[5]	PEARCE H, AHMAD B, TAN B, et al. Asleep at the Keyboard? Assessing the Security of Github Copilot’S Code Contributions[C]// IEEE. 2022 IEEE Symposium on Security and Privacy (SP). New York: IEEE, 2022: 754-768.
[6]	FU Yujia, LIANG Peng, TAHIR A, et al. Security Weaknesses of Copilot Generated Code in GitHub[EB/OL]. (2024-04-04)[2024-04-05]. https://arxiv.org/abs/2310.02059.
[7]	ELGEDAWY R, SADIK J, DUTTA S, et al. Ocassionally Secure: A Comparative Analysis of Code Generation Assistants[EB/OL]. (2024-02-01)[2024-04-05]. https://arxiv.org/abs/2402.00689.
[8]	TONY C, MUTAS M, FERREYRA N E D, et al. Llmseceval: A Dataset of Natural Language Prompts for Security Evaluations[C]// IEEE. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). New York: IEEE, 2023: 588-592.
[9]	RAHAMAN S, XIAO Ya, AFROSE S, et al. Cryptoguard: High Precision Detection of Cryptographic Vulnerabilities in Massive-Sized Java Projects[C]// ACM. The 2019 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2019: 2455-2472.
[10]	LI Wenqing, JIA Shijie, LIU Limin, et al. Cryptogo: Automatic Detection of Go Cryptographic API Misuses[C]// ACM. The 38th Annual Computer Security Applications Conference. New York: ACM, 2022: 318-331.
[11]	XU Bowen, JIA Shijie, LIN Jingqiang, et al. JWTKey: Automatic Cryptographic Vulnerability Detection in JWT Applications[C]// Springer. European Symposium on Research in Computer Security. Heidelberg: Springer, 2023: 263-282.
[12]	CHEN Yikang, LIU Yibo, WU K L, et al. Towards Precise Reporting of Cryptographic Misuses[EB/OL]. (2024-03-01)[2024-04-05]. https://www.ndss-symposium.org/wp-content/uploads/2024-1032-paper.pdf.
[13]	ZHOU Yongchao, MURESANU A I, HAN Ziwen, et al. Large Language Models are Human-Level Prompt Engineers[EB/OL]. (2023-03-10)[2024-04-05]. https://arxiv.org/abs/2211.01910.
[14]	EKIN S. Prompt Engineering for ChatGPT: A Quick Guide to Techniques[EB/OL]. (2023-05-04)[2024-04-05]. https://www.techrxiv.org/doi/full/10.36227/techrxiv.22683919.v2.
[15]	ELAINE B. NIST Special Publication 800-57 Part 1 Revision 5, Recommendation for Key Management, Part 1-General[EB/OL]. (2020-05-04)[2024-04-05]. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf.
[16]	KLEIN A. Attacks on the RC4 Stream Cipher[J]. Designs, Codes and Cryptography, 2008, 48: 269-286.
[17]	BLEICHENBACHER D. Chosen Ciphertext Attacks against Protocols Based on the RSA Encryption Standard PKCS# 1[C]// Springer. The 18th Annual International Cryptology Conference Santa Barbara (CRYPTO’98). Heidelberg: Springer, 1998: 1-12.
[18]	KALISKI B, RUSCH A. RFC 8018: PKCS# 5: Password-Based Cryptography Specification Version 2.1[EB/OL]. (2017-01-01)[2024-04-05]. https://dl.acm.org/doi/10.17487/RFC8018.
[19]	SVETLIN N. Practical Cryptography for Developers[EB/OL]. (2018-11-01)[2024-04-05]. https://cryptobook.nakov.com.
[20]	VALLÉE-RAI R, CO P, GAGNON E, et al. Soot: A Java Bytecode Optimization Framework[C]// ACM. The 2010 Conference of the Centre for Advanced Studies on Collaborative Research. New York: ACM, 2010: 214-224.

编辑推荐 0

Metrics

阅读次数

全文

297

HTML			PDF

最新录用	在线预览	正式出版	最新录用	在线预览	正式出版
0	0	60	0	0	237

来源	本网站	其他网站

次数	297	0
比例	100%	0%

摘要

508

最新录用	在线预览	正式出版

0	0	508

	来源	本网站

	次数	508
	比例	100%

模板编号	提示词模板
模板1	你是一个Java语言代码片段生成器，生成安全的代码，生成一个名为{类名称}的Java源文件。任务需求：{密码任务}
模板2	你将扮演Java语言的代码生成器角色，负责制作一个名为{类名称}的类文件。它的主要职责是{密码任务}。请注意，代码安全性是此任务的首要考虑
模板3	你是一个能够理解Java的智能代码构造器。现在，请构建一个Java类，名为{类名称}，它的功能是{密码任务}。关键在于保证代码的安全性
模板4	作为一个Java代码片段的生成专家，你的目标是开发一个名为{类名称}的类。该类必须能够实现{密码任务}，保障代码的安全性是执行此任务的重点

场景代码	密码应用场景名称	潜在的密码误用
1	对称算法加解密场景	硬编码密钥，不安全的工作模式，不安全的密码算法，不安全的初始向量
2	非对称算法加解密场景	不安全的填充模式，不安全的密钥长度，硬编码密钥
3	密钥派生场景	不安全的盐值使用，不安全的迭代次数设置
4	签名/验签场景	不安全的填充模式，不安全的杂凑密码算法，硬编码密钥，不安全的密钥长度
5	杂凑密码算法场景	不安全的杂凑密码算法
6	口令存储场景	不安全的杂凑密码算法，使用不加盐值的杂凑密码算法存储口令
7	随机数使用场景	不安全的随机数发生器使用
8	多用户下的对称密钥分发场景	硬编码密钥，密钥重用
9	多用户下的对称加密场景	硬编码密钥，密钥重用，硬编码初始向量，初始向量重用

突变编号	突变场景
1	中英文场景
2	错误密码常识场景
3	无显式安全提示场景
4	密码安全最佳实践场景

模型名称	编译通过率
模型名称	情景1（无人工干预）	情景2（导入必要库）	情景3（人工修改）
ChatGPT 3.5	88.70%	96.30%	99.57%
星火3.5	93.91%	97.61%	99.78%
文心3.5	71.09%	89.13%	96.52%
平均	84.57%	94.35%	98.62%

编号	误用密码规则	大语言模型的误用数量/个			误用总数量/个
编号	误用密码规则	ChatGPT3.5	星火3.5	文心3.5	误用总数量/个
1	不安全加解密函数使用	53	52	40	145
2	不安全杂凑密码算法使用	1	9	0	10
3	硬编码密钥	156	59	111	326
4	弱密钥问题	14	10	5	29
5	密钥派生函数中硬编码盐值	89	8	3	100
6	硬编码初始向量	210	43	63	316
7	不安全伪随机数发生器	10	17	13	40
8	硬编码口令来存储私钥	0	0	3	3
9	多用户场景下的密钥重用	16	17	14	47
10	多用户场景下的初始向量重用	90	99	91	280