Netinfo Security ›› 2025, Vol. 25 ›› Issue (12): 1990-1998. doi: 10.3969/j.issn.1671-1122.2025.12.013

• Technical Research •

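Sanitize Processing and Recognition Method Driven by Large Language Model

MENG Hui1, MAO Linlin2, PENG Juzhi2

  1. Criminal Investigation Police University of China, Shenyang 110854, China
  2. China Southern Airlines Digital Technology (Guangdong) Co., Ltd., Guangzhou 510080, China
  • Received: 2025-11-20 Online: 2025-12-10 Published: 2026-01-06
  • Corresponding author: MENG Hui, E-mail: 1441209123@qq.com
  • About the authors: MENG Hui (born 1981), male, from Shandong, lecturer, master's degree; research interests: video investigation, network and system security | MAO Linlin (born 1986), female, from Shandong, senior engineer, master's degree; research interests: digital-intelligent supervision of state-owned assets, security management | PENG Juzhi (born 1982), male, from Henan, senior engineer, bachelor's degree; research interests: information security management, product operations management
  • Supported by:
    Natural Science Foundation of Liaoning Province (2025-MS-101)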

Sanitize Processing and Recognition Method Driven by Large Language Model

MENG Hui1, MAO Linlin2, PENG Juzhi2

  1. Criminal Investigation Police University of China, Shenyang 110854, China
  2. China Southern Airlines Digital Technology (Guangdong) Co., Ltd., Guangzhou 510080, China
  • Received: 2025-11-20 Online: 2025-12-10 Published: 2026-01-06
  • Contact: MENG Hui, E-mail: 1441209123@qq.com

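Abstract:

Static taint analysis plays an important role in automatically discovering data-flow-related security vulnerabilities. In engineering practice, however, traditional rule-based or symbolic methods often produce many false positives or false negatives because of custom processing functions, context-dependent validation and escaping logic, and dynamic code features. To address this problem, the article proposes a large language model-driven method for sanitization recognition. Code and its calling context are mapped by a semantic transformation operator into descriptions the model can understand; structured prompts guide the large language model to give a verdict together with an explanation that can be backed by evidence; and confidence thresholds, a caching strategy, and fallback verification by selective symbolic execution are combined to improve the reliability and engineering usability of the verdicts. Evaluation on 3 public Java Web benchmark datasets shows that the proposed method significantly outperforms rule matching and AST-based taint analysis in sanitization recognition, with recognition accuracy above 89.4% across different vulnerability scenarios.

Keywords: static taint analysis, sanitize processing and recognition, large language model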

Abstract:

Static taint analysis plays a crucial role in automatically discovering data-flow-related security vulnerabilities, but traditional rule-based or symbolic approaches often suffer from high false positive and false negative rates in real-world engineering settings because of custom sanitizer functions, context-dependent validation and escaping logic, and dynamic code features. To address this problem, this paper proposes a large language model-driven method for sanitization recognition: code and its calling context are mapped into model-understandable descriptions via a semantic transformation operator; structured prompts guide the large language model to output a verdict along with an evidence-based explanation; and confidence thresholds, caching, and a selective symbolic-execution fallback are combined to improve reliability and engineering practicality. Evaluation on three public Java Web benchmark datasets shows that the proposed method significantly outperforms the rule-based matching method and the AST-based taint analysis method in sanitization recognition, achieving at least 89.4% recognition accuracy across different vulnerability scenarios.

Key words: static taint analysis, sanitize processing and recognition, large language model
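
The control flow summarized in the abstract (cached verdicts, a confidence threshold on the model's answer, and a symbolic-execution fallback for low-confidence cases) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names, the 0.8 threshold, the hash-based cache key, and both stub callables are hypothetical, since the abstract does not specify prompts, thresholds, or the symbolic engine.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Verdict:
    is_sanitizer: bool   # does the code neutralize the taint for this vulnerability type?
    confidence: float    # confidence reported for the model's answer (0.0 to 1.0)
    evidence: str        # short explanation returned with the verdict
    source: str          # "llm", "cache", or "symbolic"


class SanitizerRecognizer:
    """Hypothetical wrapper: cache lookup, then LLM verdict, then fallback."""

    def __init__(self,
                 ask_llm: Callable[[str, str], Tuple[bool, float, str]],
                 symbolic_check: Callable[[str, str], bool],
                 threshold: float = 0.8):  # assumed value, not from the paper
        self.ask_llm = ask_llm
        self.symbolic_check = symbolic_check
        self.threshold = threshold
        self.cache: Dict[str, Verdict] = {}

    def classify(self, code: str, vuln_type: str) -> Verdict:
        # The same code can sanitize one vulnerability type but not another,
        # so the cache key covers both the code and the vulnerability type.
        key = hashlib.sha256(f"{vuln_type}\n{code}".encode("utf-8")).hexdigest()
        if key in self.cache:
            return replace(self.cache[key], source="cache")

        is_sanitizer, confidence, evidence = self.ask_llm(code, vuln_type)
        if confidence >= self.threshold:
            verdict = Verdict(is_sanitizer, confidence, evidence, "llm")
        else:
            # Low confidence: defer to the (more expensive) fallback check.
            checked = self.symbolic_check(code, vuln_type)
            verdict = Verdict(checked, confidence, evidence, "symbolic")
        self.cache[key] = verdict
        return verdict


# Stubs standing in for a real model call and a real symbolic executor.
llm_calls: List[str] = []

def stub_llm(code: str, vuln_type: str) -> Tuple[bool, float, str]:
    llm_calls.append(code)
    if "escapeHtml" in code:
        return True, 0.95, "output passes through an HTML-escaping helper"
    return False, 0.40, "no recognizable validation or escaping"

def stub_symbolic_check(code: str, vuln_type: str) -> bool:
    return "escape" in code


recognizer = SanitizerRecognizer(stub_llm, stub_symbolic_check, threshold=0.8)
first = recognizer.classify("return StringEscapeUtils.escapeHtml4(s);", "xss")
second = recognizer.classify("return StringEscapeUtils.escapeHtml4(s);", "xss")
low = recognizer.classify("return s.trim();", "xss")
print(first.source, second.source, low.source)
```

With these stubs, the first query is answered by the model, the repeated query is served from the cache without a second model call, and the low-confidence case is routed to the fallback check.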

CLC number: