信息网络安全 ›› 2026, Vol. 26 ›› Issue (3): 341-354.doi: 10.3969/j.issn.1671-1122.2026.03.001

• 入选论文 • 上一篇    下一篇

大语言模型提示词注入攻击与防御综述

袁明1,2(), 邹其霖3, 袁文骐4, 王群1   

  1. 1.江苏警官学院计算机信息与网络安全系,南京 210031
    2.南京邮电大学计算机学院,南京 210023
    3.射阳县公安局,盐城 224300
    4.盐城市公安局大丰分局,盐城 224199
  • 收稿日期:2025-08-11 出版日期:2026-03-10 发布日期:2026-03-30
  • 通讯作者: 袁明 E-mail:yuanming_cn@163.com
  • 作者简介:袁明(1989—),男,江苏,讲师,博士研究生,主要研究方向为自然语言处理|邹其霖(1993—),男,江苏,本科,主要研究方向为网络空间安全|袁文骐(1999—),男,江苏,本科,主要研究方向为安全防范工程|王群(1971—),男,甘肃,教授,博士,CCF杰出会员,主要研究方向为网络空间安全
  • 基金资助:
    江苏省教育科学“十四五”规划课题(C-c/2021/01/11)

A Survey on Prompt Injection Attacks and Defenses in Large Language Models

YUAN Ming1,2(), ZOU Qilin3, YUAN Wenqi4, WANG Qun1   

  1. 1. Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China
    2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    3. Sheyang County Public Security Bureau, Yancheng 224300, China
    4. Dafeng Branch of Yancheng Public Security Bureau, Yancheng 224199, China
  • Received:2025-08-11 Online:2026-03-10 Published:2026-03-30

摘要:

随着大语言模型及其驱动的AI Agent在多个领域被广泛应用,大语言模型安全问题日益突出。提示词注入攻击作为一种新兴的安全威胁,给大语言模型带来巨大安全隐患,它利用大语言模型无法区分用户指令与注入指令的缺陷,诱导模型偏离目标任务,执行攻击者任务,造成数据泄露、系统入侵等问题。文章系统梳理了提示词注入攻击的研究现状,包括早期注入攻击和基于角色注入攻击、载荷拆分注入攻击、基于混淆注入攻击以及基于优化注入攻击等。在防御方面,根据防御手段将现有方法归纳为基于检测的防御和基于预防的防御。

关键词: 大语言模型, 提示词注入攻击, AI智能体, AI安全

Abstract:

With the widespread application of Large Language Models and their powered AI Agents in various domains, the security of LLMs has become increasingly prominent. As an emerging security threat, prompt injection attacks pose huge security risks to large language models. They exploit the weakness that large language models cannot distinguish user instructions from injected instructions, thereby inducing the model to deviate from the intended task and execute the attacker’s commands, leading to issues such as data leakage and system intrusion. This paper systematically reviewed the current research status of prompt injection attacks, covering attack types such as early direct injection, role-based injection, payload splitting, obfuscation injection, and optimization-based injection. In terms of defenses, this paper classified existing methods into detection-based defenses and prevention-based defenses according to defense mechanisms.

Key words: large language models, prompt injection attacks, AI agent, AI security

中图分类号: