Netinfo Security ›› 2025, Vol. 25 ›› Issue (4): 619-629.doi: 10.3969/j.issn.1671-1122.2025.04.010

Previous Articles     Next Articles

Research on Hidden Backdoor Prompt Attack Methods Based on False Demonstrations

GU Huanhuan1,2(), LI Qianmu1, LIU Zhen3, WANG Fangyuan1, JIANG Yu4   

  1. 1. School of Cyberspace Security, Nanjing University of Science and Technology, Nanjing 210094, China
    2. Nanjing Sinovatio Technology Co., Ltd., Nanjing 211153, China
    3. Guodian Nanjing Automation Co., Ltd., Nanjing 211106, China
    4. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received:2025-01-21 Online:2025-04-10 Published:2025-04-25

Abstract:

: This paper proposeed an HDPAttack, a hidden backdoor prompt attack method based on fake demonstrations. This method used the overall semantics of natural language prompts as a trigger. By inserting carefully crafted fake demonstrations into the training data, these fake demonstrations generated fake examples with high semantic consistency by semantically re-expressing the prompts, guiding the model to learn specific trigger patterns in deep representations. Unlike traditional backdoor attack methods, HDPAttack did not rely on rare words, special characters, or abnormal tokens. Instead, it generated fake examples by altering the linguistic expression of prompts without significantly changing the semantics or labels of the input data, thereby evading detection techniques based on explicit abnormal features. This enabled the model to activate hidden backdoor behaviors in seemingly normal inputs, improving the stealth and success rate of the attack. This method has great potential in the field of stealthy attacks and provides a new research direction for enhancing backdoor defense technologies.

Key words: pre-trained language model, backdoor attack, prompt Learning

CLC Number: