Netinfo Security ›› 2025, Vol. 25 ›› Issue (12): 1990-1998. doi: 10.3969/j.issn.1671-1122.2025.12.013

Sanitize Processing and Recognition Method Driven by Large Language Model

MENG Hui1, MAO Linlin2, PENG Juzhi2

    1. Criminal Investigation Police University of China, Shenyang 110854, China
    2. China Southern Airlines Digital Technology (Guangdong) Co., Ltd., Guangzhou 510080, China
  • Received: 2025-11-20 Online: 2025-12-10 Published: 2026-01-06
  • Contact: MENG Hui, E-mail: 1441209123@qq.com

Abstract:

Static taint analysis plays a crucial role in automatically discovering data-flow-related security vulnerabilities. In real-world engineering settings, however, traditional rule-based or symbol-based approaches often suffer from high false positive and false negative rates because of custom sanitizer functions, context-dependent validation and escaping logic, and dynamic code features. To address this problem, this paper proposes a sanitize processing and recognition method driven by a large language model. First, a semantic transformation operator maps code and its calling context into descriptions the model can understand. Second, structured prompts guide the large language model to output a determination together with an evidence-based explanation. Third, confidence thresholds, caching, and selective symbolic-execution fallback are combined to improve reliability and engineering practicality. Evaluation on three public Java Web benchmark datasets shows that the proposed method significantly outperforms both a rule-based matching method and an AST-based taint analysis method in sanitize processing and recognition, achieving at least 89.4% identification accuracy across different vulnerability scenarios.
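The decision flow named in the abstract (cache lookup, model query, confidence threshold, fallback) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `Verdict`, `classify_sanitizer`, `ask_model`, and `fallback`, and the default threshold of 0.8, are hypothetical and do not come from the paper. The model call and the symbolic-execution fallback are passed in as plain callables so that the sketch stays self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Verdict:
    """Hypothetical result record: determination plus supporting evidence."""
    is_sanitizer: bool
    confidence: float  # assumed to lie in [0, 1]
    evidence: str      # explanation text returned with the determination
    source: str        # "llm", "cache", or "fallback"


# Assumed signature for both the model query and the fallback analysis.
Analyzer = Callable[[str, str], Verdict]


def classify_sanitizer(
    code: str,
    context: str,
    ask_model: Analyzer,
    fallback: Analyzer,
    cache: Dict[str, Verdict],
    threshold: float = 0.8,  # illustrative value, not taken from the paper
) -> Verdict:
    """Decide whether `code` acts as a sanitizer in the given calling context."""
    # Cache key covers both the code and its calling context.
    key = hashlib.sha256((code + "\x00" + context).encode("utf-8")).hexdigest()

    # 1. Caching: reuse an earlier decision for the same code and context.
    if key in cache:
        hit = cache[key]
        return Verdict(hit.is_sanitizer, hit.confidence, hit.evidence, "cache")

    # 2. Ask the model for a determination with evidence.
    verdict = ask_model(code, context)

    # 3. Selective fallback: only low-confidence answers go to the
    #    (more expensive) fallback analysis.
    if verdict.confidence < threshold:
        verdict = fallback(code, context)

    cache[key] = verdict
    return verdict
```

In this sketch a high-confidence model answer is accepted directly, a repeated query is served from the cache without a second model call, and only answers below the threshold reach the fallback analysis, which keeps the expensive path selective.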

Key words: static taint analysis, sanitize processing and recognition, large language model