Netinfo Security ›› 2024, Vol. 24 ›› Issue (8): 1163-1172. DOI: 10.3969/j.issn.1671-1122.2024.08.003

• Theoretical Research •

Invisible Backdoor Attack Based on Feature Space Similarity

XIA Hui, QIAN Xiangyun

  1. College of Computer Science and Technology, Faculty of Information Science and Engineering, Ocean University of China, Qingdao 266100, China
  • Received: 2024-04-17  Online: 2024-08-10  Published: 2024-08-22
  • Corresponding author: QIAN Xiangyun, qianxiangyun@stu.ouc.edu.cn
  • About the authors: XIA Hui (1986-), male, from Shandong, professor, Ph.D.; research interests include wireless ad hoc networks, IoT security, AI security, privacy protection, edge computing, and federated learning. | QIAN Xiangyun (1999-), male, from Shandong, M.S. student; research interests include backdoor attacks and AI security.
  • Supported by: National Natural Science Foundation of China (62172377)

Abstract:

A backdoor attack implants a specific trigger into a deep neural network during training so that the model misclassifies inputs containing the trigger. Current backdoor attack schemes generally suffer from poor trigger concealment, low attack success rates, low poisoning efficiency, and easy detection of the poisoned model. To address these problems, this paper proposes a model-inversion invisible backdoor attack scheme based on feature space similarity theory under the supervised learning setting. The scheme first obtains the original trigger through a training-based model inversion method and a set of randomly chosen samples of the target label class. Benign samples are then segmented into feature regions by an Attention U-Net network, the original trigger is added to the focal regions, and the generated poisoned samples are optimized, improving both the stealthiness of the trigger and the poisoning efficiency. After the poisoned dataset is expanded by an image augmentation algorithm, the original model is retrained to produce the poisoned model. Experimental results show that the scheme achieves a 97% attack success rate with a 1% poisoning ratio on the GTSRB and CelebA datasets while preserving trigger stealthiness. The scheme also ensures the similarity between target samples and poisoned samples in the feature space, so the resulting poisoned model successfully evades detection by defense algorithms, improving its indistinguishability. An in-depth analysis of this scheme can also provide ideas for defending against such backdoor attacks.
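
As a rough illustration of the pipeline the abstract describes, the sketch below (PyTorch; the function names, tensor shapes, and hyperparameters are assumptions for illustration, not the authors' implementation) shows the two core steps: inverting a trigger from a frozen pretrained classifier for a chosen target label, and blending that trigger into the salient region of a benign sample using a soft segmentation mask such as an Attention U-Net output. The poisoned samples would then be augmented and mixed into the training set at a small ratio (the paper reports 1%) before retraining.

    import torch
    import torch.nn.functional as F

    def invert_trigger(model, target_label, shape=(1, 3, 32, 32), steps=200, lr=0.1):
        # Training-based model inversion (assumed form): optimize a pattern
        # that the frozen classifier confidently assigns to the target label.
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)
        raw = torch.zeros(shape, requires_grad=True)
        opt = torch.optim.Adam([raw], lr=lr)
        label = torch.tensor([target_label])
        for _ in range(steps):
            opt.zero_grad()
            logits = model(torch.tanh(raw))   # tanh keeps pixel values bounded
            F.cross_entropy(logits, label).backward()
            opt.step()
        return torch.tanh(raw).detach()

    def poison_sample(image, trigger, mask, alpha=0.1):
        # Blend the inverted trigger only into the salient region: `mask` is a
        # soft attention map in [0, 1] (e.g. from an Attention U-Net), and a
        # small alpha keeps the perturbation visually subtle.
        return (1.0 - alpha * mask) * image + alpha * mask * trigger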

Key words: data poisoning, backdoor attack, feature space similarity, supervised learning

CLC number: