Netinfo Security, 2026, Vol. 26, Issue (2): 274-290. doi: 10.3969/j.issn.1671-1122.2026.02.008
GU Zhaojun, LI Li, SUI He
Received:2025-09-11
Online:2026-02-10
Published:2026-02-23
GU Zhaojun, LI Li, SUI He. A Payload Generation Method for SQL Injection Vulnerability Detection Based on Large Language Models[J]. Netinfo Security, 2026, 26(2): 274-290.
URL: http://netinfo-security.org/EN/10.3969/j.issn.1671-1122.2026.02.008
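
| Variable | Meaning |
|---|---|
| D = {d_1, d_2, …, d_N} | Set of SQL injection vulnerability samples, where d_i denotes the i-th sample and N is the total number of vulnerability samples |
| K_i = {k_{i,1}, k_{i,2}, …, k_{i,M}} | Keyword set of sample d_i; the number of keywords in the sample is M = \|K_i\| |
| F = {f_1, f_2, …, f_S}; F_i = {f_{i,1}, f_{i,2}, …, f_{i,S}} | F is the overall set of vulnerability features and F_i is the feature set of sample d_i; each feature is represented as a key-value pair (f_{i,j}, v_{i,j}) with f_{i,j} ∈ F_i, and the total number of features is S |
| | Co-occurrence count of feature f_{i,j} in sample d_i |
| | Co-occurrence frequency of feature f_{i,j} in sample d_i |
| | Total co-occurrence frequency of feature f_j over all vulnerability samples |
| | Initial weight of feature f_j |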
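
The quantities in the last four rows of the table above (co-occurrence count, co-occurrence frequency, total co-occurrence frequency, initial weight) could be computed roughly as follows. The formulas are not given here, so every normalisation in this sketch is an assumption, as are the function name `initial_weights` and the toy samples; it is meant only to illustrate the flow from counts to weights.

```python
from collections import Counter


def initial_weights(samples):
    """Map each feature name to an initial weight.

    samples: list of lists of feature names, one inner list per sample d_i.

    Assumed definitions (not taken from the source):
      count(i, j) = occurrences of feature f_j in sample d_i
      freq(i, j)  = count(i, j) / total feature occurrences in d_i
      total(j)    = sum of freq(i, j) over all samples
      weight(j)   = total(j) / sum of total(.) over all features
    """
    total = Counter()
    for features in samples:
        counts = Counter(features)      # count(i, j)
        n = sum(counts.values())
        if n == 0:
            continue                    # skip samples without features
        for name, c in counts.items():
            total[name] += c / n        # accumulate freq(i, j)
    norm = sum(total.values())
    if norm == 0:
        return {}
    return {name: t / norm for name, t in total.items()}


# Hypothetical toy samples, for illustration only.
toy = [["union", "select", "select"], ["sleep", "select"]]
weights = initial_weights(toy)
```

With these assumptions the weights sum to 1, so features that recur within and across samples receive proportionally larger starting weights.
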
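
| Dataset | Source | Description |
|---|---|---|
| Training data | Exploit-Database | Collects a large number of real attack payloads and exploit code, improving the model's understanding of real attack behavior |
| Training data | PacketStorm | Provides extensive security documentation and exploit examples, increasing the diversity and coverage of the dataset |
| Training data | CVE | Provides standardized vulnerability identifiers and descriptions, keeping the data representative and current |
| Test data | SQL-Libs | A resource library dedicated to SQL injection vulnerabilities, offering a variety of attack examples and remediation methods |
| Test data | DVWA | Provides a legal environment for common web attack techniques, including basic, intermediate, and advanced SQL injection scenarios |
| Test data | Pikachu | A security testing platform for practicing web vulnerabilities, covering a wide range of vulnerability types including, but not limited to, SQL injection and XSS |
| Test data | bWAPP | A practice platform supporting more than 100 different vulnerability scenarios, including SQL injection, and applicable to multiple server-side programming languages |
| Test data | Newsqliset | Collects ORM, NoSQL, and GraphQL injection cases from CVE, CNVD, and Freebuf reports and from the authors' own reproductions |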

| Type | Exploit-Database (count) | PacketStorm (count) | CVE (count) | Total (count) |
|---|---|---|---|---|
| Union-based | 1032 | 33 | 54 | 1119 |
| Time Blind | 126 | 147 | 88 | 361 |
| Error-based | 48 | 41 | 42 | 131 |
| Boolean Blind | 240 | 94 | 28 | 362 |
| Stacked Query | 49 | 6 | 7 | 62 |
| Wide Byte | 0 | 0 | 1 | 1 |
| Column Probing | 10 | 2 | 1 | 13 |
| File Write | 11 | 3 | 3 | 17 |
| Bypass | 173 | 51 | 13 | 237 |
| Multi Parameter | 5 | 0 | 0 | 5 |
| ORM | 0 | 2 | 1 | 3 |
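
As a quick consistency check on the training-set table above, the sketch below re-adds each row and computes the share of every injection type. The dictionary name `train` and the variable names are illustrative; the numbers are copied from the table.

```python
# (Exploit-Database, PacketStorm, CVE, reported total), copied from the table.
train = {
    "Union-based": (1032, 33, 54, 1119),
    "Time Blind": (126, 147, 88, 361),
    "Error-based": (48, 41, 42, 131),
    "Boolean Blind": (240, 94, 28, 362),
    "Stacked Query": (49, 6, 7, 62),
    "Wide Byte": (0, 0, 1, 1),
    "Column Probing": (10, 2, 1, 13),
    "File Write": (11, 3, 3, 17),
    "Bypass": (173, 51, 13, 237),
    "Multi Parameter": (5, 0, 0, 5),
    "ORM": (0, 2, 1, 3),
}

# Every reported total should equal the sum of its three sources.
for name, (edb, packetstorm, cve, total) in train.items():
    assert edb + packetstorm + cve == total, name

grand_total = sum(row[3] for row in train.values())
share = {name: row[3] / grand_total for name, row in train.items()}

for name, value in sorted(share.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {value:6.1%}")
```

The row totals are internally consistent. The distribution is strongly skewed: Union-based samples make up close to half of the training data, while Wide Byte, ORM, and Multi Parameter each contribute five samples or fewer.
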

| Type | bWAPP (count) | DVWA (count) | Pikachu (count) | SQL-Libs (count) | Newsqliset (count) | Total (count) |
|---|---|---|---|---|---|---|
| Union-based | 14 | 7 | 11 | 10 | 0 | 42 |
| Time Blind | 1 | 6 | 0 | 7 | 0 | 14 |
| Error-based | 5 | 7 | 6 | 8 | 0 | 26 |
| Boolean Blind | 1 | 9 | 1 | 4 | 0 | 15 |
| Stacked Query | 0 | 0 | 0 | 5 | 0 | 5 |
| Wide Byte | 0 | 0 | 4 | 3 | 0 | 7 |
| Column Probing | 1 | 2 | 0 | 1 | 0 | 4 |
| File Write | 0 | 0 | 0 | 0 | 0 | 0 |
| Bypass | 2 | 0 | 0 | 2 | 0 | 4 |
| Multi Parameter | 0 | 0 | 0 | 3 | 0 | 3 |
| ORM | 0 | 0 | 0 | 0 | 7 | 7 |
| NoSQL | 0 | 0 | 0 | 0 | 14 | 14 |
| GraphQL | 0 | 0 | 0 | 0 | 9 | 9 |
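
The column sums of the test-set table above give the number of test cases per platform. A minimal sketch, with the rows copied from the table and the names `platforms`, `test`, and `sizes` chosen for illustration:

```python
platforms = ["bWAPP", "DVWA", "Pikachu", "SQL-Libs", "Newsqliset"]

# Per-type counts in the platform order above, copied from the table.
test = {
    "Union-based": (14, 7, 11, 10, 0),
    "Time Blind": (1, 6, 0, 7, 0),
    "Error-based": (5, 7, 6, 8, 0),
    "Boolean Blind": (1, 9, 1, 4, 0),
    "Stacked Query": (0, 0, 0, 5, 0),
    "Wide Byte": (0, 0, 4, 3, 0),
    "Column Probing": (1, 2, 0, 1, 0),
    "File Write": (0, 0, 0, 0, 0),
    "Bypass": (2, 0, 0, 2, 0),
    "Multi Parameter": (0, 0, 0, 3, 0),
    "ORM": (0, 0, 0, 0, 7),
    "NoSQL": (0, 0, 0, 0, 14),
    "GraphQL": (0, 0, 0, 0, 9),
}

# Sum each column to get the number of test cases per platform.
sizes = {p: sum(row[i] for row in test.values()) for i, p in enumerate(platforms)}
classic_total = sum(sizes[p] for p in platforms[:4])  # without Newsqliset
overall_total = sum(sizes.values())
print(sizes, classic_total, overall_total)
```

This gives 120 cases on the four classic platforms and 150 once Newsqliset is included.
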

| Test datasets | Model | Nsuccess | AC | FPR | FNR |
|---|---|---|---|---|---|
| SQL-Libs, DVWA, Pikachu, bWAPP | Qwen | 91 | 0.7583 | 0.0990 | 0.1727 |
| SQL-Libs, DVWA, Pikachu, bWAPP | SqliGPT | 61 | 0.5083 | 0.1644 | 0.4352 |
| SQL-Libs, DVWA, Pikachu, bWAPP | GPT-2-web | 57 | 0.4750 | 0.1972 | 0.4623 |
| SQL-Libs, DVWA, Pikachu, bWAPP | SQLMap | 79 | 0.6583 | 0.0920 | 0.2946 |
| SQL-Libs, DVWA, Pikachu, bWAPP, Newsqliset | Qwen | 104 | 0.6933 | 0.1034 | 0.2464 |
| SQL-Libs, DVWA, Pikachu, bWAPP, Newsqliset | SqliGPT | 72 | 0.4800 | 0.1627 | 0.4706 |
| SQL-Libs, DVWA, Pikachu, bWAPP, Newsqliset | GPT-2-web | 67 | 0.4467 | 0.2024 | 0.4962 |
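
The metric definitions are not stated alongside the table above. The reported values are, however, numerically consistent with treating Nsuccess as the true-positive count TP and using AC = TP / (TP + FP + FN), FPR = FP / (TP + FP), and FNR = FN / (TP + FN). The sketch below inverts these assumed formulas for the four-platform rows; the helper name `recover_counts` is hypothetical, and the recovered FP and FN counts are an inference rather than figures reported by the authors.

```python
def recover_counts(n_success, fpr, fnr):
    """Recover (TP, FP, FN) under the assumed definitions
    FPR = FP / (TP + FP) and FNR = FN / (TP + FN)."""
    tp = n_success
    fp = round(tp * fpr / (1 - fpr))
    fn = round(tp * fnr / (1 - fnr))
    return tp, fp, fn


# (Nsuccess, FPR, FNR) for SQL-Libs + DVWA + Pikachu + bWAPP, from the table.
rows = {
    "Qwen": (91, 0.0990, 0.1727),
    "SqliGPT": (61, 0.1644, 0.4352),
    "GPT-2-web": (57, 0.1972, 0.4623),
    "SQLMap": (79, 0.0920, 0.2946),
}

counts = {model: recover_counts(*row) for model, row in rows.items()}
for model, (tp, fp, fn) in counts.items():
    total = tp + fp + fn
    print(f"{model:10s} TP={tp} FP={fp} FN={fn} total={total} AC={tp / total:.4f}")
```

Under these assumptions every model's counts add up to the same 120 test cases, and the recomputed AC matches the reported column, which supports the reading but does not confirm it.
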

| Test dataset | Model | Nsuccess | AC | FPR | FNR |
|---|---|---|---|---|---|
| bWAPP | Qwen | 18 | 0.7500 | 0.1000 | 0.1818 |
| bWAPP | SqliGPT | 13 | 0.5417 | 0.1333 | 0.4091 |
| bWAPP | GPT-2-web | 12 | 0.5000 | 0.2000 | 0.4286 |
| bWAPP | SQLMap | 14 | 0.5833 | 0.1250 | 0.3636 |
| DVWA | Qwen | 23 | 0.7419 | 0.1481 | 0.1481 |
| DVWA | SqliGPT | 18 | 0.5806 | 0.1818 | 0.3333 |
| DVWA | GPT-2-web | 16 | 0.5161 | 0.1579 | 0.4286 |
| DVWA | SQLMap | 24 | 0.7742 | 0.1111 | 0.1429 |
| Newsqliset | Qwen | 13 | 0.4333 | 0.1333 | 0.5357 |
| Newsqliset | SqliGPT | 11 | 0.3667 | 0.1538 | 0.6071 |
| Newsqliset | GPT-2-web | 10 | 0.3333 | 0.2308 | 0.6296 |
| Newsqliset | SQLMap | 0 | 0 | 0 | 0 |
| Pikachu | Qwen | 19 | 0.8636 | 0 | 0.1364 |
| Pikachu | SqliGPT | 5 | 0.2273 | 0 | 0.7727 |
| Pikachu | GPT-2-web | 8 | 0.3636 | 0.1111 | 0.6190 |
| Pikachu | SQLMap | 14 | 0.6364 | 0 | 0.3636 |
| SQL-Libs | Qwen | 31 | 0.7209 | 0.1143 | 0.2051 |
| SQL-Libs | SqliGPT | 25 | 0.5814 | 0.1935 | 0.3243 |
| SQL-Libs | GPT-2-web | 21 | 0.4884 | 0.2500 | 0.4167 |
| SQL-Libs | SQLMap | 27 | 0.6279 | 0.1000 | 0.3250 |

| Vulnerability type | Model | Nsuccess | AC | FPR | FNR |
|---|---|---|---|---|---|
| ORM | Qwen | 3 | 0.4286 | 0.2500 | 0.5000 |
| ORM | SqliGPT | 2 | 0.2857 | 0.5000 | 0.6000 |
| ORM | GPT-2-web | 4 | 0.5714 | 0.2000 | 0.3333 |
| NoSQL | Qwen | 5 | 0.3571 | 0.1667 | 0.6154 |
| NoSQL | SqliGPT | 3 | 0.2143 | 0 | 0.7857 |
| NoSQL | GPT-2-web | 2 | 0.1429 | 0.3333 | 0.8462 |
| GraphQL | Qwen | 5 | 0.5556 | 0 | 0.4444 |
| GraphQL | SqliGPT | 6 | 0.6667 | 0 | 0.3333 |
| GraphQL | GPT-2-web | 4 | 0.4444 | 0.2000 | 0.5000 |

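
The per-type figures above can be related to the dataset-level Newsqliset result by pooling. The sketch below does this for Qwen, assuming AC is successes divided by test cases and taking the per-type case counts (7 ORM, 14 NoSQL, 9 GraphQL) from the Newsqliset column of the test-set distribution table; the variable names are illustrative.

```python
# (Nsuccess, number of test cases) for Qwen on Newsqliset, per injection type.
per_type = {"ORM": (3, 7), "NoSQL": (5, 14), "GraphQL": (5, 9)}

# Pooled (micro) accuracy: total successes over total cases.
pooled_ac = sum(s for s, _ in per_type.values()) / sum(n for _, n in per_type.values())

# Unweighted (macro) mean of the per-type accuracies, for comparison.
macro_ac = sum(s / n for s, n in per_type.values()) / len(per_type)

print(f"pooled AC = {pooled_ac:.4f}, macro AC = {macro_ac:.4f}")
```

The pooled value reproduces the 0.4333 reported for Qwen on Newsqliset, which suggests the dataset-level AC is a pooled figure and not an average of per-type accuracies.
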
| [1] | JAIN S. 160 Cybersecurity Statistics[EB/OL]. (2025-01-09)[2025-05-24]. https://www.getastra.com/blog/security-audit/cyber-security-statistics/. |
| [2] | FreeBuf. 2023 Global Top 10 Security Vulnerabilities \| FreeBuf Annual Review[EB/OL]. (2024-01-04)[2025-05-24]. https://www.freebuf.com/news/388742.html. |
| [3] | HUANG Kaijie, WANG Jian, CHEN Jiongyi. A Large Language Model Based SQL Injection Attack Detection Method[J]. Netinfo Security, 2023, 23(11): 84-93. |
| [4] | LU Dongzhe, FEI Jinlong, LIU Long. A Semantic Learning-Based SQL Injection Attack Detection Technology[EB/OL]. (2023-02-09)[2025-05-10]. https://doi.org/10.3390/electronics1206134. |
| [5] | BOLOTNIKOV I V, BORODIN A E. Interprocedural Static Analysis for Finding Bugs in Go Programs[J]. Programming and Computer Software, 2021, 47(5): 344-352. doi: 10.1134/S0361768821050030 |
| [6] | LIVSHITS V B, LAM M S. Finding Security Vulnerabilities in Java Applications with Static Analysis[C]// USENIX. The 14th Conference on USENIX Security Symposium. New York: USENIX, 2005: 18-29. |
| [7] | LI Qi, LI Weishi, WANG Junfeng, et al. A SQL Injection Detection Method Based on Adaptive Deep Forest[J]. IEEE Access, 2019, 7: 145385-145394. doi: 10.1109/ACCESS.2019.2944951 |
| [8] | RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving Language Understanding by Generative Pre-Training[EB/OL]. [2025-05-17]. https://api.semanticscholar.org/CorpusID:49313245. |
| [9] | TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: Open and Efficient Foundation Language Models[EB/OL]. (2023-02-27)[2025-05-17]. https://arxiv.org/abs/2302.13971. |
| [10] | GUI Zhiwen, WANG Enze, DENG Binbin, et al. SqliGPT: Evaluating and Utilizing Large Language Models for Automated SQL Injection Black-Box Detection[EB/OL]. (2024-08-07)[2025-05-17]. https://doi.org/10.3390/app1416692. |
| [11] | ĆIRKOVIĆ S, MLADENOVIĆ V, TOMIĆ S, et al. Utilizing Fine-Tuning of Large Language Models for Generating Synthetic Payloads: Enhancing Web Application Cybersecurity through Innovative Penetration Testing Techniques[J]. Computers, Materials & Continua, 2025, 82(3): 4409-4430. |
| [12] | WU Peize, LI Guanghui, WU Jinyu. Research on Automated Vulnerability Verification Code Generation Based on Large Language Models[EB/OL]. (2024-06-20)[2025-05-10]. https://kns.cnki.net/kcms2/article/abstract?v=MXvIvFkaDQz0Ed1hcQN9CL-gXr5KEIhM5964CkAGitVLQj534FnW1QowKkJ4WAgttjFFL0fZhSaGn07arFP_v3d_Buwl9snK_NfzS-YnI0oSzgnHjsO-O0TrWBMHKVS99os3LXwpBVAl_JCWrFc-_pT5Ybux81d8cT6Gw2I5naP9T-kI9v978mcS2fJKkXwY&uniplatform=NZKPT&language=CHS. |
| [13] | YANG Guang, ZHOU Yu, CHEN Xiang, et al. ExploitGen: Template-Augmented Exploit Code Generation Based on CodeBERT[EB/OL]. (2023-03-01)[2025-05-17]. https://doi.org/10.1016/j.jss.2022.11157. |
| [14] | PENG Qi, CAI Yi, LIU Jiankun, et al. Integration of Multi-Source Medical Data for Medical Diagnosis Question Answering[J]. IEEE Transactions on Medical Imaging, 2025, 44(3): 1373-1385. doi: 10.1109/TMI.2024.3496862; pmid: 40030182 |
| [15] | LIAO Xingming, CHEN Chong, WANG Zhuowei, et al. Large Language Model Assisted Fine-Grained Knowledge Graph Construction for Robotic Fault Diagnosis[EB/OL]. (2025-05-01)[2025-06-17]. https://doi.org/10.1016/j.aei.2025.10313. |
| [16] | WEI J, WANG Xuezhi, SCHUURMANS D, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models[C]// ACM. The 36th International Conference on Neural Information Processing Systems. New York: ACM, 2022: 24824-24837. |
| [17] | OU Jianjiu, ZHOU Jianlong, DONG Yifei, et al. Chain of Thought Prompting in Vision-Language Model for Vision Reasoning Tasks[C]// Springer. The 37th Australasian Joint Conference on Artificial Intelligence. Heidelberg: Springer, 2024: 298-311. |
| [18] | YAO Chengyuan, FUJITA S. Adaptive Control of Retrieval-Augmented Generation for Large Language Models through Reflective Tags[EB/OL]. (2024-11-25)[2025-07-15]. https://doi.org/10.3390/electronics1323464. |
| [19] | OffSEC. Exploit Database[EB/OL]. (2010-11-01)[2025-09-07]. https://www.exploit-db.com/. |
| [20] | Private Internet Access. PacketStorm Security Archive[EB/OL]. [2025-07-15]. https://packetstormsecurity.com/. |
| [21] | OpenAI. Completion-OpenAI API[EB/OL]. [2025-12-05]. https://beta.openai.com/docs/guides/completion/prompt-design. |
| [22] | DeepSeek. DeepSeek-V3[EB/OL]. [2025-03-07]. https://huggingface.co/deepseek-ai/DeepSeek-V3. |
| [23] | HU E J, SHEN Yelong, WALLIS P, et al. LoRA: Low-Rank Adaptation of Large Language Models[EB/OL]. (2021-06-17)[2025-04-17]. https://doi.org/10.48550/arXiv.2106.0968. |
| [24] | STAMPAR M, DAMELE A G B. SQLMap: Automatic SQL Injection and Database Takeover Tool[EB/OL]. [2025-03-26]. https://github.com/sqlmapproject/sqlmap. |
| [25] | MITRE. Common Vulnerabilities and Exposures(CVE)[EB/OL]. (2000-01-01)[2025-07-15]. https://cve.mitre.org/. |
| [26] | Audi-1. SQLI-Labs[EB/OL]. (2014-04-01)[2025-04-17]. https://github.com/Audi-1/sqli-labs. |
| [27] | DEWHURST R. Damn Vulnerable Web Application(DVWA)[EB/OL]. (2023-05-21)[2025-04-17]. https://github.com/digininja/DVWA. |
| [28] | HUN Lu. Pikachu[EB/OL]. [2025-04-18]. https://github.com/zhuifengshaonianhanlu/pikachu. |
| [29] | MALIK B. BWAPP[EB/OL]. (2013-01-08)[2025-04-18]. https://sourceforge.net/projects/bwapp/. |
| [30] | SHITOU CLOUD. AutoDL: High-Performance Cloud Computing Platform for Deep Learning[EB/OL]. [2025-05-07]. https://www.autodl.com. |
| [31] | ZHENG Yaowei, ZHANG Richong, ZHANG Junhao, et al. LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models[C]// Association for Computational Linguistics. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Pennsylvania: Association for Computational Linguistics, 2024: 400-410. |
| [32] | PAPINENI K, ROUKOS S, WARD T, et al. Bleu: A Method for Automatic Evaluation of Machine Translation[C]// ACL. The 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia: ACL, 2002: 311-318. |
| [33] | BERGER B, WATERMAN M S, YU Y W. Levenshtein Distance, Sequence Comparison and Biological Database Search[J]. IEEE Transactions on Information Theory, 2021, 67(6): 3287-3294. doi: 10.1109/tit.2020.2996543; pmid: 34257466 |