Netinfo Security ›› 2025, Vol. 25 ›› Issue (5): 767-777. doi: 10.3969/j.issn.1671-1122.2025.05.009

• Theoretical Research •

Universal Perturbation Generation Method Based on High-Level Features and Important Channels

ZHANG Xinglan, TAO Kejin

  1. School of Computer Science, Beijing University of Technology, Beijing 100124, China
  • Received: 2025-02-19  Online: 2025-05-10  Published: 2025-06-10
  • Corresponding author: TAO Kejin, taokejin@emails.bjut.edu.cn
  • About the authors: ZHANG Xinglan (1970—), female, from Shanxi, professor, Ph.D.; research interests: artificial intelligence security, quantum computing, and cryptography. TAO Kejin (2000—), male, from Shandong, master's degree candidate; research interests: adversarial examples and artificial intelligence security.
  • Funding: National Natural Science Foundation of China (62202017)



Abstract:

Deep neural network models, typified by deep convolutional neural networks (DCNNs), often lack robustness against carefully crafted adversarial examples. Among existing attacks, gradient-based adversarial example generation methods tend to overfit the white-box model and therefore transfer poorly to other models. To address this problem, this paper proposes a universal perturbation generation method based on high-level features and important channels that improves the transferability of adversarial examples. Three loss modules are designed through deep mining of high-level features. First, the class gradient matrix of the clean sample for a specified class is multiplied with the high-level feature map of the adversarial example to obtain the high-level feature important-channel loss, which guides how the adversarial example changes in the key regions of the high-level features. Second, the similarity between the global and local high-level feature matrices is computed as the high-level feature similarity loss, which controls the direction in which the perturbation steers the high-level features. Finally, the classification loss governs the overall optimization direction of the perturbation in targeted attacks. During gradient updates, the perturbation can be trained jointly with gradient update strategies such as DIM, TIM, and SIM. Experiments on the ImageNet and Fashion-MNIST datasets against DCNN models of various architectures, both normally and adversarially trained, show that the adversarial examples generated by this method achieve significantly better attack transferability than those of existing gradient-based generation methods.
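Since the abstract only names the three loss terms, the following PyTorch sketch may help fix ideas. It is an illustrative reconstruction under stated assumptions, not the paper's implementation: the surrogate model (ResNet-50), the hooked layer (layer4), the Grad-CAM-style channel weighting in channel_weights, the center-crop notion of a "local" feature, the DIM resize-and-pad parameters, the signs and weights combining the loss terms, and the names channel_weights, dim_transform, and attack are all assumptions.

    # Illustrative reconstruction only -- not the paper's code. Assumed choices:
    # surrogate model (ResNet-50), hooked layer (layer4), Grad-CAM-style channel
    # weighting, center-crop "local" feature, DIM parameters, and loss weights.
    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    for p in model.parameters():
        p.requires_grad_(False)

    feats = {}
    model.layer4.register_forward_hook(lambda m, i, o: feats.update(hl=o))

    def channel_weights(x_clean, target):
        # Class-gradient weights per high-level channel, computed on the clean
        # sample for the specified class (Grad-CAM-style averaging assumed).
        x = x_clean.clone().requires_grad_(True)
        score = model(x).gather(1, target.view(-1, 1)).sum()
        grad = torch.autograd.grad(score, feats["hl"])[0]
        return grad.mean(dim=(2, 3), keepdim=True).detach()

    def dim_transform(x, p=0.5, low=200, high=224):
        # DIM-style input diversity: random resize, then pad back to full size.
        if torch.rand(1).item() > p:
            return x
        s = int(torch.randint(low, high, (1,)))
        xr = F.interpolate(x, size=(s, s), mode="bilinear", align_corners=False)
        pad = high - s
        left = int(torch.randint(0, pad + 1, (1,)))
        top = int(torch.randint(0, pad + 1, (1,)))
        return F.pad(xr, (left, pad - left, top, pad - top))

    def attack(x_clean, target, eps=16/255, alpha=2/255, steps=10, mu=1.0):
        w = channel_weights(x_clean, target)      # fixed channel importances
        delta = torch.zeros_like(x_clean)
        g = torch.zeros_like(x_clean)             # MI-FGSM-style momentum
        for _ in range(steps):
            x_adv = (x_clean + delta).clamp(0, 1).requires_grad_(True)
            logits = model(dim_transform(x_adv))
            f = feats["hl"]                       # adversarial high-level feature map
            l_chan = (w * f).mean()               # 1) important-channel loss
            f_glob = F.adaptive_avg_pool2d(f, 1).flatten(1)
            f_loc = F.adaptive_avg_pool2d(f[..., 1:-1, 1:-1], 1).flatten(1)
            l_sim = F.cosine_similarity(f_glob, f_loc).mean()  # 2) similarity loss
            l_cls = F.cross_entropy(logits, target)            # 3) targeted class loss
            loss = l_chan + l_sim - l_cls         # signs/weights between terms assumed
            grad = torch.autograd.grad(loss, x_adv)[0]
            g = mu * g + grad / grad.abs().mean().clamp_min(1e-12)
            delta = (delta + alpha * g.sign()).clamp(-eps, eps)
        return (x_clean + delta).clamp(0, 1)

    # Usage (input normalization to the model's expected statistics omitted):
    # x: (N, 3, 224, 224) images in [0, 1]; target: (N,) intended class indices.
    # x_adv = attack(x, target)

Swapping dim_transform for translation-invariant gradient smoothing (TIM) or for gradient averaging over down-scaled copies of the input (SIM) would correspond to the other joint update strategies the abstract mentions.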

Key words: deep learning, adversarial examples, convolutional neural networks, high-level features, transferability
