Netinfo Security ›› 2026, Vol. 26 ›› Issue (2): 274-290. doi: 10.3969/j.issn.1671-1122.2026.02.008

• Academic Research •

A Payload Generation Method for SQL Injection Vulnerability Detection Based on Large Language Models

GU Zhaojun1, LI Li2, SUI He3

  1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
  2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  3. College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2025-09-11 Online: 2026-02-10 Published: 2026-02-23
  • Corresponding author: SUI He hsui@cauc.edu.cn
  • About the authors: GU Zhaojun (born 1966), male, from Shandong, professor, Ph.D.; research interests: network and information security, civil aviation information systems | LI Li (born 1998), female, from Henan, master's student; research interest: cyberspace security | SUI He (born 1987), male, from Jilin, lecturer, Ph.D.; research interest: network and information security of industrial control systems
  • Funding:
    National Natural Science Foundation of China (U2333201)

Abstract:

Existing SQL injection vulnerability detection methods suffer from insufficient robustness and a lack of targeted test cases. To address these limitations, this paper proposes a large language model (LLM)-based method that generates targeted detection payloads to identify SQL injection vulnerabilities. The method combines prompt engineering with the DeepSeek-V3 model to automatically extract heterogeneous vulnerability features and build them into a unified semantic representation. A contribution-based feature selection step then analyzes these features and retains the most influential ones as the core input to the model. The key features are organized into a chain-of-thought format to fuse multi-dimensional vulnerability representations, and the Qwen model undergoes domain-adaptive supervised fine-tuning with low-rank adaptation (LoRA). Experiments on multiple public vulnerability testbeds compare the detection performance and payload generation quality of the fine-tuned Qwen model against SqliGPT, GPT-2-web, and SQLMap, and further analyze DeepSeek-V3's ability to extract features from complex SQL injection vulnerability data. The results show that the Qwen model achieves an average detection accuracy of over 75%, an improvement of 49.18%, 59.64%, and 15.19% over SqliGPT, GPT-2-web, and SQLMap, respectively. The quality of its generated payloads is also significantly better than that of the existing models, which demonstrates the effectiveness and superiority of using large language models to generate detection payloads for SQL injection vulnerability detection.
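The abstract names low-rank adaptation as the fine-tuning technique but gives no configuration. As a rough illustration only, the sketch below shows the standard LoRA weight update, W = W0 + (alpha / r) · B · A, on tiny matrices in plain Python. The function names, the rank, the scaling factor `alpha`, and the initialization are assumptions for this example and are not taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w0, a, b, alpha):
    """Return W0 + (alpha / r) * B @ A, where r is the adapter rank.

    Shapes: W0 is (d_out, d_in), A is (r, d_in), B is (d_out, r).
    W0 stays frozen; only A and B would be trained.
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def init_adapter(d_out, d_in, r, seed=0):
    """Common LoRA initialization: A small random, B all zeros (hypothetical values)."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d_out)]
    return a, b


def trainable_params(d_out, d_in, r):
    """Number of trainable adapter parameters for one weight matrix."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    w0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    a, b = init_adapter(d_out=2, d_in=3, r=1)
    # With B initialized to zero, the adapted weight equals the frozen weight.
    print(lora_effective_weight(w0, a, b, alpha=16) == w0)
    print(trainable_params(4096, 4096, 8), "adapter params vs", 4096 * 4096, "full")
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and only r · (d_in + d_out) parameters per adapted matrix are updated during fine-tuning.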

Key words: large language model, SQL injection vulnerability, code generation, detection payload

CLC number: