Netinfo Security ›› 2026, Vol. 26 ›› Issue (2): 274-290. doi: 10.3969/j.issn.1671-1122.2026.02.008


A Payload Generation Method for SQL Injection Vulnerability Detection Based on Large Language Models

GU Zhaojun1, LI Li2, SUI He3

  1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
  2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  3. College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2025-09-11 Online: 2026-02-10 Published: 2026-02-23

Abstract:

Existing SQL injection vulnerability detection methods suffer from insufficient robustness and a lack of targeted test cases. To address these limitations, this paper proposes a large language model (LLM)-based approach that generates targeted detection payloads to identify SQL injection vulnerabilities. Specifically, by combining prompt engineering with the DeepSeek-V3 model, the method automatically extracts heterogeneous vulnerability features and organizes them into a unified semantic representation. A contribution-based feature selection mechanism then identifies the most influential features, which serve as the core input to the model. The key features are further structured into a chain-of-thought format to fuse multi-dimensional vulnerability representations. Domain-adaptive supervised fine-tuning is performed on the Qwen model using low-rank adaptation (LoRA). Extensive experiments were conducted on multiple public vulnerability benchmarks to evaluate the detection performance and payload generation quality of the proposed method against SqliGPT, GPT-2-web, and SQLMap. In addition, the capability of DeepSeek-V3 to extract meaningful features from complex SQL injection vulnerability data is analyzed in depth. Experimental results show that the fine-tuned Qwen model achieves an average detection accuracy of over 75%, representing improvements of 49.18%, 59.64%, and 15.19% over SqliGPT, GPT-2-web, and SQLMap, respectively. Moreover, the quality of its generated payloads is significantly higher than that of existing models, demonstrating the effectiveness of using large language models to generate detection payloads for SQL injection vulnerability identification.
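
As a rough illustration of the two middle steps in the abstract (contribution-based feature selection, followed by arranging the selected features in a chain-of-thought format), the sketch below may help. It is not the authors' implementation: the function names, the feature names, the contribution scores, and the prompt wording are all hypothetical, and the abstract does not say how contribution scores are actually computed.

```python
from typing import Dict, List, Tuple


def select_top_features(
    contributions: Dict[str, float], k: int
) -> List[Tuple[str, float]]:
    """Keep the k features with the largest absolute contribution score.

    Assumption: each feature already has a numeric contribution score;
    how the paper derives these scores is not stated in the abstract.
    """
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return ranked[:k]


def build_cot_prompt(
    features: Dict[str, str], selected: List[Tuple[str, float]]
) -> str:
    """Lay out the selected features as numbered reasoning steps.

    The prompt wording is illustrative only.
    """
    lines = ["Reason step by step before proposing a detection payload."]
    for step, (name, _score) in enumerate(selected, start=1):
        lines.append(f"Step {step}: consider {name} = {features[name]}")
    lines.append(
        f"Step {len(selected) + 1}: propose one detection payload "
        "consistent with the steps above."
    )
    return "\n".join(lines)


# Hypothetical extracted features and contribution scores for one target.
example_features = {
    "dbms": "MySQL",
    "injection_point": "GET parameter id",
    "query_context": "numeric WHERE clause",
    "filtering": "none observed",
}
example_scores = {
    "dbms": 0.12,
    "injection_point": 0.41,
    "query_context": 0.35,
    "filtering": 0.05,
}

if __name__ == "__main__":
    top = select_top_features(example_scores, k=2)
    print(build_cot_prompt(example_features, top))
```

In a pipeline like the one described, a prompt of this kind would presumably serve as the input side of a fine-tuning example for the Qwen model, with a known detection payload as the target; that pairing is an assumption here, not something the abstract specifies.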

Key words: large language model, SQL injection vulnerability, code generation, detection payload

CLC Number: