Netinfo Security ›› 2020, Vol. 20 ›› Issue (9): 12-16. doi: 10.3969/j.issn.1671-1122.2020.09.003

• Selected Papers •

  • Author biographies: TONG Xin (1995- ), male, from Henan, master's student; research interests: adversarial examples and natural language processing. WANG Luona (1992- ), female, from Shandong, master; research interests: natural language processing and new-media communication. WANG Runzheng (1996- ), male, from Shandong, master's student; research interests: information security. WANG Jingya (1966- ), female, from Shaanxi, professor, master; research interests: adversarial examples and natural language processing.
  • Funding: Competitive Selection Project of the Technology Research Program of the Ministry of Public Security (2019JZX009); Ministry of Public Security Special Project for Strengthening the Police through Science and Technology (2018GABJC03); Key Scientific Research Project Program of Higher Education Institutions of Henan Province (20B520008)

A Generation Method of Word-level Adversarial Samples for Chinese Text Classification

TONG Xin1, WANG Luona2, WANG Runzheng1, WANG Jingya1

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
    2. Beijing Bytedance Technology Co., Ltd., Beijing 100000, China
  • Received: 2020-07-16 Online: 2020-09-10 Published: 2020-10-15
  • Contact: WANG Jingya, E-mail: wangjingya@ppsuc.edu.cn


Abstract:

To address the robustness weaknesses of deep-learning-based Chinese text classification models, this paper proposes CWordAttacker, a word-level black-box method for generating adversarial examples. The algorithm uses a targeted word-deletion scoring mechanism to locate the keywords that most strongly influence the classification result without any knowledge of the model's internals, and applies several perturbation strategies, such as substitution with traditional Chinese characters or Pinyin, to generate adversarial examples that preserve the semantics of the original sentence; both targeted and non-targeted attack modes are supported. Experiments against LSTM, TextCNN and attention-based CNN models on sentiment, spam-message and news classification datasets show that CWordAttacker sharply reduces the accuracy of the target model with only small perturbations.
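As a rough illustration of the deletion-based scoring idea described in the abstract (not the authors' released implementation), the sketch below scores each word by how much the target class's probability drops when that word is deleted, then substitutes the highest-scoring words (e.g. with their Pinyin) until the black-box classifier's prediction changes. The `predict_proba` interface and the substitution function are hypothetical stand-ins for whatever query access and perturbation dictionary an attacker actually has.

```python
def word_scores(words, predict_proba, label):
    """Score each word by the drop in the target label's probability
    when that word is deleted (black-box: only queries are needed)."""
    base = predict_proba("".join(words))[label]
    scores = []
    for i in range(len(words)):
        reduced = "".join(words[:i] + words[i + 1:])
        scores.append((base - predict_proba(reduced)[label], i))
    # Largest probability drop first = most influential word first.
    return sorted(scores, reverse=True)

def attack(words, predict_proba, label, substitute):
    """Non-targeted attack: perturb words in order of influence
    (e.g. Pinyin or traditional-character swaps) until the
    predicted class is no longer `label`. Returns the adversarial
    text, or None if every word was perturbed without success."""
    adv = list(words)
    for _, i in word_scores(words, predict_proba, label):
        adv[i] = substitute(adv[i])
        probs = predict_proba("".join(adv))
        if max(probs, key=probs.get) != label:
            return "".join(adv)
    return None
```

A targeted variant would instead stop once the probability of a chosen wrong class becomes the maximum; the scoring loop is unchanged.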

Key words: adversarial samples, natural language processing, Chinese text classification, black-box attack, AI security
