Netinfo Security ›› 2024, Vol. 24 ›› Issue (4): 545-554. doi: 10.3969/j.issn.1671-1122.2024.04.005

• Theoretical Research •

Defense Scheme for Removing Deep Neural Network Backdoors Based on JSMA Adversarial Attacks

ZHANG Guanghua1,2, LIU Yichun2, WANG He1, HU Boning2

  1. School of Cyber Engineering, Xidian University, Xi'an 710071, China
  2. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China
  • Received: 2023-09-10 Online: 2024-04-10 Published: 2024-05-16
  • Corresponding author: HU Boning wwhbn@hebust.edu.cn
  • About the authors: ZHANG Guanghua (born 1979), male, from Hebei, professor, Ph.D., CCF member; main research interests: network and information security | LIU Yichun (born 1999), female, from Hebei, master's student; main research interests: network and information security | WANG He (born 1987), female, from Henan, lecturer, Ph.D.; main research interests: applied cryptography and quantum cryptographic protocols | HU Boning (born 1978), female, from Hebei, lecturer, M.S.; main research interests: communication network security
  • Funding:
    National Natural Science Foundation of China (U1836210)

Abstract:

Deep learning models lack transparency and interpretability. When a backdoor planted by a malicious attacker is triggered during inference, the model behaves abnormally and its performance degrades. To address this problem, this paper proposes a defense scheme that removes backdoors from deep neural networks (DNNs) based on the JSMA (Jacobian-based Saliency Map Attack) adversarial attack. First, the special perturbations produced by JSMA are simulated to recover the hidden backdoor trigger, and the trigger pattern is reconstructed on that basis. Next, a heatmap is used to locate the weights associated with the recovered trigger. Finally, a ridge regression function is used to set those weights to zero, removing the backdoor from the DNN. The scheme is evaluated on the MNIST and CIFAR10 datasets, where model performance is measured after backdoor removal. The experimental results show that the scheme effectively removes backdoors from DNN models while reducing test accuracy by less than 3%.

Key words: deep learning model, adversarial attack, JSMA, ridge regression function
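
The abstract names JSMA as the building block for recovering the trigger but does not give the procedure itself. As a rough illustration of that building block only, the sketch below computes the standard JSMA saliency map and greedily raises the most salient input feature until a target class is predicted. It is not the paper's method: the linear softmax classifier, the function names (`jacobian`, `saliency_map`, `jsma_perturb`), the step size `theta`, and the toy weights are all assumptions chosen so that the Jacobian has a closed form and the example stays self-contained.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def jacobian(W, b, x):
    """d p_j / d x_i for p = softmax(W x + b); shape (classes, features)."""
    p = softmax(W @ x + b)
    return p[:, None] * (W - p @ W)

def saliency_map(J, target):
    """JSMA saliency for increasing features toward class `target`.

    A feature scores zero unless raising it increases the target
    probability and does not increase the other classes combined.
    """
    d_target = J[target]
    d_others = J.sum(axis=0) - d_target
    s = d_target * np.abs(d_others)
    s[(d_target < 0) | (d_others > 0)] = 0.0
    return s

def jsma_perturb(W, b, x, target, theta=1.0, max_iters=20):
    """Greedy single-feature JSMA loop on inputs assumed to lie in [0, 1]."""
    x_adv = x.astype(float).copy()
    for _ in range(max_iters):
        if np.argmax(W @ x_adv + b) == target:
            break
        s = saliency_map(jacobian(W, b, x_adv), target)
        s[x_adv >= 1.0] = 0.0          # saturated features cannot be raised
        i = int(np.argmax(s))
        if s[i] <= 0.0:                # no admissible feature left
            break
        x_adv[i] = min(1.0, x_adv[i] + theta)
    return x_adv

# Toy model (hypothetical weights): 2 classes, 3 input features.
W = np.array([[2.0, 0.0, -0.5],
              [0.0, 3.0,  0.5]])
b = np.zeros(2)
x = np.array([1.0, 0.0, 0.0])          # initially classified as class 0
x_adv = jsma_perturb(W, b, x, target=1)
print(x_adv, np.argmax(W @ x_adv + b))
```

In this toy case a single feature is raised and the prediction flips to the target class. For a real DNN the Jacobian would come from automatic differentiation rather than a closed form, and the set of modified features is what a trigger-recovery step would inspect.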

CLC Number: