Netinfo Security ›› 2022, Vol. 22 ›› Issue (3): 70-77. doi: 10.3969/j.issn.1671-1122.2022.03.008

• Technical Research •

  • About the authors: ZHENG Yaohao (1995—), male, from Zhejiang, master's student; research interests: AI security. WANG Liming (1978—), male, from Beijing, professor-level senior engineer, Ph.D.; research interests: cloud computing security, intelligent security. YANG Jing (1984—), female, from Shanxi, senior engineer, Ph.D.; research interests: network security, data security analysis.
  • Funding: National Key R&D Program of China (2017YFB1010004)

A Defense Method against Adversarial Attacks Based on Neural Architecture Search

ZHENG Yaohao1,2, WANG Liming1(), YANG Jing1   

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
    2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2021-09-19 Online:2022-03-10 Published:2022-03-28
  • Contact: WANG Liming E-mail:wangliming@iie.ac.cn


Abstract:

Aiming at the problem that, in image classification tasks, adversarial examples cause neural networks to misclassify and thus make deep learning models unreliable, this paper proposed a defense method against adversarial attacks based on neural architecture search. The method used reinforcement learning, modeling the search for a defense network as the behavior of an agent. Through the definition of the search space, the design of the search strategy, and the evaluation of subnetwork performance, the controller automatically searches for the best-performing image reconstruction network, which restores adversarial examples to natural images and thereby defends against adversarial attacks. Experimental results show that the method effectively reconstructs adversarial examples, reducing their attack effectiveness and thus preserving the classification accuracy of the classifier.
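The search loop described above — define a search space, let an RL agent sample candidate reconstruction networks, score them, and update the policy — can be sketched in miniature. Everything below is a hypothetical simplification, not the paper's implementation: the operation names, the three-layer search space, and the surrogate reward all stand in for the real candidate operations and the real evaluation (classifier accuracy on reconstructed images). The update is a plain REINFORCE step on a per-layer softmax policy.

```python
import math
import random

# Hypothetical search space: candidate operations per layer of the
# image reconstruction (defense) network. Names are illustrative only.
SEARCH_SPACE = {
    "layer1": ["conv3x3", "conv5x5", "dilated3x3"],
    "layer2": ["conv3x3", "conv5x5", "skip"],
    "layer3": ["conv3x3", "deconv3x3", "skip"],
}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sample(policy, rng):
    # The agent samples one operation per layer from its softmax policy.
    arch = {}
    for layer, logits in policy.items():
        probs = softmax(logits)
        r, acc = rng.random(), 0.0
        for op, p in zip(SEARCH_SPACE[layer], probs):
            acc += p
            if r <= acc:
                arch[layer] = op
                break
        else:
            arch[layer] = SEARCH_SPACE[layer][-1]
    return arch

def surrogate_reward(arch):
    # Stand-in for evaluating the sampled subnetwork: in the paper this
    # would mean training/evaluating reconstruction quality; here we just
    # reward agreement with an arbitrary "good" architecture.
    best = {"layer1": "dilated3x3", "layer2": "conv5x5", "layer3": "deconv3x3"}
    return sum(arch[k] == v for k, v in best.items()) / len(best)

def search(steps=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    policy = {k: [0.0] * len(v) for k, v in SEARCH_SPACE.items()}
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        arch = sample(policy, rng)
        reward = surrogate_reward(arch)
        baseline = 0.9 * baseline + 0.1 * reward
        adv = reward - baseline
        # REINFORCE update: push up the logit of the sampled operation
        # in proportion to the advantage, push down the others.
        for layer, logits in policy.items():
            probs = softmax(logits)
            idx = SEARCH_SPACE[layer].index(arch[layer])
            for i in range(len(logits)):
                grad = (1.0 if i == idx else 0.0) - probs[i]
                logits[i] += lr * adv * grad
    # Return the greedy (argmax) architecture under the learned policy.
    return {k: SEARCH_SPACE[k][max(range(len(v)), key=v.__getitem__)]
            for k, v in policy.items()}

best_arch = search()
print(best_arch)
```

In a real system the surrogate reward would be replaced by training the sampled subnetwork on pairs of adversarial and clean images and measuring the downstream classifier's accuracy on the reconstructions, which is what makes the controller favor architectures that strip adversarial perturbations.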

Key words: neural architecture search, image classification, adversarial attack, deep learning
