Netinfo Security ›› 2025, Vol. 25 ›› Issue (12): 1878-1888. doi: 10.3969/j.issn.1671-1122.2025.12.004

• Theoretical Research •

Detecting Poisoned Samples for Untargeted Backdoor Attacks

PANG Shuchao1, LI Zhengxiao1, QU Junyi1, MA Ruhao1, CHEN Hechang2, DU Anan3

  1. School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
    2. School of Artificial Intelligence, Jilin University, Changchun 130012, China
    3. School of Computer and Software, Nanjing Vocational University of Industry Technology, Nanjing 210023, China
  • Received: 2025-10-05  Online: 2025-12-10  Published: 2026-01-06
  • Corresponding author: DU Anan  E-mail: anan.du@niit.edu.cn
  • About the authors: PANG Shuchao (1988—), male, from Shandong, professor, Ph.D., CCF member, research interests: AI application security, data security and privacy protection | LI Zhengxiao (2002—), male, from Jiangsu, master's student, research interests: cyberspace security, data-free distillation, model robustness | QU Junyi (2002—), male, from Shandong, undergraduate, research interests: AI applications and their security | MA Ruhao (2000—), male, from Shandong, master's student, research interests: data security and privacy protection, AI applications | CHEN Hechang (1988—), male, from Jilin, research professor, Ph.D., research interests: machine learning, data mining, intelligent gaming, knowledge engineering, complex systems | DU Anan (1989—), female, from Shandong, associate professor, Ph.D., research interests: weakly supervised learning, intelligent perception, AI model security
  • Funding:
    National Natural Science Foundation of China (62206128); National Key Research and Development Program of China (2023YFB2703900)



Abstract:

Backdoor attacks, an important form of data poisoning, pose a serious threat to dataset reliability and the security of model training. Most existing mainstream defenses are designed for specific targeted backdoor attacks, leaving untargeted backdoor attacks largely unstudied. Therefore, this paper proposes a poisoned sample detection method for untargeted backdoor attacks. The method is a black-box approach based on prediction-behavior anomalies for detecting potential untargeted backdoor samples, and it consists of two modules: a poisoned sample detection module based on prediction-behavior anomalies, which flags suspicious samples from the discrepancy between the model's predictions on the original and the reconstructed samples; and a diffusion-model data generation module for poisoning attacks, which generates a new dataset that is similar to the original dataset but contains no triggers. Experiments with different types of untargeted backdoor attacks and with different generative models demonstrate the feasibility of the method, as well as the great potential and application value of generative models, especially diffusion models, in backdoor attack detection and defense.
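
The detection rule in the first module can be illustrated in a few lines. The Python fragment below is a minimal sketch, not the authors' implementation: model stands for any black-box classifier (only its outputs are used), and reconstruct is a hypothetical stand-in for the diffusion-model data generation module, assumed to return trigger-free reconstructions of its inputs.

    # A minimal sketch of the prediction-behavior-anomaly check, not the
    # authors' implementation. `model` is a black-box classifier; `reconstruct`
    # is a hypothetical stand-in for the diffusion-model data generation
    # module and is assumed to return trigger-free reconstructions.
    import torch

    @torch.no_grad()
    def flag_suspicious(model, images, reconstruct):
        """Return a boolean mask that is True where a prediction flips.

        A clean sample keeps (roughly) the same label after reconstruction;
        a poisoned sample loses its trigger, so its prediction tends to change.
        """
        model.eval()
        pred_original = model(images).argmax(dim=1)              # labels on raw inputs
        pred_rebuilt = model(reconstruct(images)).argmax(dim=1)  # labels on reconstructions
        return pred_original != pred_rebuilt                     # True => suspicious sample

Under this reading, reconstruct could, for example, partially noise each image and then denoise it with a pretrained diffusion model, so that small trigger patterns are washed out while semantic content is preserved; any generative model with that property could be substituted, which is consistent with the abstract's comparison of different generative models.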

Key words: data security, untargeted backdoor attacks, image recognition, generative models, deep learning

CLC number: