A Word-Level Adversarial Sample Generation Method Based on the Improved Slime Mold Algorithm

doi:10.3969/j.issn.1671-1122.2026.05.005

Abstract:

As artificial intelligence continues to advance and spread, deep learning has achieved major breakthroughs, yet deep models remain vulnerable to adversarial sample attacks. In natural language processing, the discreteness of text and its semantic constraints make adversarial attacks especially challenging. Word-level attacks are a typical combinatorial optimization task, and existing methods face two bottlenecks: the way the search space is constructed tends to cause semantic drift, and the optimization algorithm often gets trapped in local optima. Together these make it difficult to search efficiently for high-quality perturbation strategies. To address these problems, this paper proposes a word-level adversarial sample generation method based on an improved slime mold algorithm. First, a search space of substitution candidates is built from sememe knowledge, which constrains the semantic consistency of the substitute words and avoids the loss of text naturalness that semantic drift causes in traditional methods. Second, the improved slime mold algorithm redefines its position update rule through logical operations in discrete space, dynamically balancing global exploration and local exploitation so that the search avoids becoming trapped in local optima. Finally, a visually similar character substitution strategy, based on a dictionary of Chinese-Japanese look-alike characters, further strengthens the attack. Experimental results show that the method attacks text classification models effectively on Chinese sentiment classification tasks: classification accuracy drops by more than 30% on average, and the method significantly outperforms the comparison methods in semantic similarity.

Key words: adversarial samples, word-level adversarial attack, slime mold algorithm

CLC number: TP309

XU Ruzhi, WU Xiaoxin. A Word-Level Adversarial Sample Generation Method Based on the Improved Slime Mold Algorithm[J]. Netinfo Security, 2026, 26(5): 725-735.

Operation	Symbol	Condition	Result rule
AND	$\wedge$	Input words identical ($V_1 = V_2$)	Keep the original word
AND	$\wedge$	Input words differ ($V_1 \ne V_2$)	Pick at random from the set $V - V_1 - V_2$
OR	$\vee$	Input words identical ($V_1 = V_2$)	Keep the original word
OR	$\vee$	Input words differ ($V_1 \ne V_2$)	Pick at random from $V_1, V_2$
XOR	$\oplus$	Input words identical ($V_1 = V_2$)	Pick at random from the set $V - V_1$
XOR	$\oplus$	Input words differ ($V_1 \ne V_2$)	Pick at random from $V_1, V_2$
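
The operation rules in the table above can be sketched in a few lines of Python. This is only an illustrative reading of the table, not the authors' implementation: the function name `logic_op`, the `candidates` argument (standing in for $V$, assumed here to be the set of allowed substitute words at one position), and the fallback for an empty pool are all assumptions introduced for the example.

```python
import random


def logic_op(op, v1, v2, candidates, rng=random):
    """Apply one discrete logical operation to two words at the same position.

    `candidates` plays the role of V in the table; treating it as the set of
    allowed substitute words for this position is an assumption of this sketch.
    """
    if op == "and":
        if v1 == v2:
            return v1  # identical inputs: keep the original word
        pool = sorted(set(candidates) - {v1, v2})  # V - V1 - V2
    elif op == "or":
        if v1 == v2:
            return v1  # identical inputs: keep the original word
        pool = [v1, v2]
    elif op == "xor":
        # identical inputs: V - V1; different inputs: one of the two inputs
        pool = sorted(set(candidates) - {v1}) if v1 == v2 else [v1, v2]
    else:
        raise ValueError(f"unknown operation: {op}")
    # Assumption: the table does not say what happens when the pool is empty,
    # so this sketch falls back to the first input word.
    return rng.choice(pool) if pool else v1


if __name__ == "__main__":
    words = ["good", "great", "fine", "nice"]  # hypothetical candidate set
    rng = random.Random(0)
    print(logic_op("and", "good", "great", words, rng))  # "fine" or "nice"
    print(logic_op("xor", "good", "good", words, rng))   # any word except "good"
```

Passing a seeded `random.Random` instance as `rng` makes the random picks reproducible, which helps when comparing runs of a population-based search.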

Model	No attack: accuracy	CWordAttacker: accuracy	CWordAttacker: drop	WordHanding: accuracy	WordHanding: drop	TextFooler: accuracy	TextFooler: drop	ISMA (proposed): accuracy	ISMA (proposed): drop
TextCNN	89.32%	60.38%	28.94%	66.48%	22.84%	74.22%	15.10%	57.34%	31.98%
LSTM	91.46%	55.74%	35.72%	57.17%	34.29%	67.28%	24.18%	51.28%	40.18%
BERT	92.64%	57.25%	35.39%	60.95%	31.69%	67.82%	24.82%	54.12%	38.52%
Transformer	92.91%	60.52%	32.39%	65.93%	26.98%	70.43%	22.48%	59.66%	33.25%

Model	No attack: accuracy	CWordAttacker: accuracy	CWordAttacker: drop	WordHanding: accuracy	WordHanding: drop	TextFooler: accuracy	TextFooler: drop	ISMA (proposed): accuracy	ISMA (proposed): drop
TextCNN	87.32%	59.43%	27.89%	63.34%	23.98%	72.84%	14.48%	55.49%	31.83%
LSTM	88.44%	56.17%	32.27%	60.42%	28.02%	68.45%	19.99%	58.14%	30.30%
BERT	87.96%	60.74%	27.22%	63.21%	24.75%	69.73%	18.23%	58.78%	29.18%
Transformer	89.88%	57.55%	32.33%	62.16%	27.72%	70.65%	19.23%	59.57%	30.31%

Model	No attack: accuracy	CWordAttacker: accuracy	CWordAttacker: drop	WordHanding: accuracy	WordHanding: drop	TextFooler: accuracy	TextFooler: drop	ISMA (proposed): accuracy	ISMA (proposed): drop
TextCNN	88.05%	60.47%	27.58%	67.43%	20.62%	71.24%	16.81%	56.63%	31.42%
LSTM	88.74%	58.35%	30.39%	61.24%	27.50%	67.52%	21.22%	57.92%	30.82%
BERT	90.86%	62.64%	28.22%	66.85%	24.01%	71.44%	19.42%	54.16%	36.70%
Transformer	89.47%	54.48%	34.99%	60.58%	28.89%	72.26%	17.21%	48.85%	40.62%
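
In the three accuracy tables above, each drop value equals the accuracy without attack minus the accuracy under attack, so it is a difference in percentage points. The short check below recomputes the ISMA drops from the table values and averages them. The variable and function names are illustrative assumptions, and the page does not say which dataset each table corresponds to, so the tables are simply kept in order.

```python
# ISMA columns of the three accuracy tables, in table order:
# (accuracy without attack, accuracy under ISMA attack), both in percent.
ISMA_RESULTS = [
    [(89.32, 57.34), (91.46, 51.28), (92.64, 54.12), (92.91, 59.66)],
    [(87.32, 55.49), (88.44, 58.14), (87.96, 58.78), (89.88, 59.57)],
    [(88.05, 56.63), (88.74, 57.92), (90.86, 54.16), (89.47, 48.85)],
]


def accuracy_drop(clean, attacked):
    """Drop in percentage points: clean accuracy minus accuracy under attack."""
    return round(clean - attacked, 2)


def mean_drop(rows):
    """Average drop over a list of (clean, attacked) accuracy pairs."""
    drops = [accuracy_drop(clean, attacked) for clean, attacked in rows]
    return round(sum(drops) / len(drops), 2)


per_table = [mean_drop(rows) for rows in ISMA_RESULTS]
overall = mean_drop([pair for rows in ISMA_RESULTS for pair in rows])
print("mean ISMA drop per table:", per_table)
print("mean ISMA drop overall:", overall)
```

On these numbers the per-table averages come out at roughly 36, 30.4 and 34.9 points, and the average over all twelve model and table combinations at roughly 33.8 points, which is consistent with the abstract's statement that accuracy falls by more than 30% on average.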

WMD range	CWordAttacker	WordHanding	TextFooler	ISMA (proposed)
0-0.2	12.35%	13.15%	13.80%	14.70%
0.2-0.4	22.90%	23.80%	23.90%	24.60%
0.4-0.6	28.85%	28.05%	26.15%	25.45%
0.6-0.8	18.95%	17.30%	14.70%	18.90%
0.8 and above	16.95%	17.70%	21.45%	16.35%
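
One way to read the WMD table above is to add up the lowest ranges for each method. The sketch below assumes that WMD is the Word Mover's Distance between the original text and its adversarial version (so a smaller value means closer meaning) and that each cell is the share of adversarial samples falling in that range; neither assumption is stated on this page. The names `WMD_SHARES` and `share_below` are illustrative.

```python
# Shares (in percent) per WMD range, in the row order of the table:
# 0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8 and above.
WMD_SHARES = {
    "CWordAttacker": [12.35, 22.90, 28.85, 18.95, 16.95],
    "WordHanding": [13.15, 23.80, 28.05, 17.30, 17.70],
    "TextFooler": [13.80, 23.90, 26.15, 14.70, 21.45],
    "ISMA": [14.70, 24.60, 25.45, 18.90, 16.35],
}


def share_below(shares, n_ranges):
    """Sum the shares of the first n_ranges rows (the lowest WMD ranges)."""
    return round(sum(shares[:n_ranges]), 2)


for method, shares in WMD_SHARES.items():
    # Sanity check: each column of the table should add up to 100%.
    assert abs(sum(shares) - 100.0) < 1e-6
    print(method, "share with WMD below 0.4:", share_below(shares, 2))
```

Under that reading, ISMA places about 39.3% of its samples below a WMD of 0.4, against 35.25% to 37.7% for the three comparison methods.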