Netinfo Security, 2025, Vol. 25, Issue (6): 859-871. doi: 10.3969/j.issn.1671-1122.2025.06.002

• Special Topic Paper: Active Network Defense

Lightweight Malicious Traffic Detection Method Based on Knowledge Distillation

SUN Jianwen1, ZHANG Bin1, SI Nianwen2, FAN Ying3

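  1. Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China
    2. Information System Engineering Institute, Information Engineering University, Zhengzhou 450001, China
    3. College of Equipment Management and Support, Engineering University of PAP, Xi’an 710038, China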
  • Received: 2025-02-20 Online: 2025-06-10 Published: 2025-07-11
  • Corresponding author: SUN Jianwen, jianwensun_edu@163.com
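  • About the authors: SUN Jianwen (born 1988), female, from Beijing, engineer, Ph.D. candidate; main research interests: traffic anomaly detection and machine learning | ZHANG Bin (born 1969), male, from Henan, professor, Ph.D.; main research interest: information system security | SI Nianwen (born 1992), male, from Hubei, lecturer, Ph.D.; main research interests: large models and fine-tuning, and explainable deep learning | FAN Ying (born 1988), female, from Shaanxi, M.S.; main research interests: optimization algorithms and equipment management.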
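  • Funding:
    Natural Science Foundation of Henan Province (252300420990); Science and Technology Research Project of Henan Province (252102211040); Graduate Innovation Fund of Information Engineering University (2019f113)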


Abstract:

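To meet the need for lightweight models in multi-class malicious traffic detection under resource-constrained scenarios, this paper proposes a lightweight malicious traffic detection method based on knowledge distillation. Knowledge is transferred from a 12-layer transformer teacher model to a 1-layer transformer student model through a dual supervision mechanism that combines a Kullback-Leibler (KL) divergence distillation loss with a Focal supervised loss. As a result, the model is compressed from 286 MB to 26 MB (an 11-fold reduction) and inference is about 10 times faster, while classification precision drops by less than 1.4 percentage points. Experimental results on three public datasets, USTC-TFC2016, ISCX-VPN2016-Service, and CSE-CIC-IDS2018, show that the compressed model achieves recognition accuracy above 99.38% on long-tailed class traffic and stealthy attack patterns, significantly outperforming lightweight methods built on conventional CNN or RNN architectures and striking a balance between resource efficiency and detection performance.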
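The abstract names the two supervision signals but does not say how they are combined. The following is a minimal per-sample sketch assuming a common knowledge-distillation formulation: a temperature-scaled KL divergence between the teacher's and the student's soft predictions, multiplied by T², plus a Focal loss on the student's ordinary (T = 1) predictions against the true label, mixed with a weight λ. The function names and the values of the temperature `temperature`, the mixing weight `lam`, and the focusing parameter `gamma` are hypothetical illustrations, not the authors' settings.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def focal_loss(probs: Sequence[float], label: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss -alpha * (1 - p_t)^gamma * log(p_t) for the true class `label`."""
    p_t = max(probs[label], 1e-12)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      temperature: float = 4.0,  # hypothetical value
                      lam: float = 0.5,          # hypothetical mixing weight
                      gamma: float = 2.0) -> float:
    """Hypothetical weighted sum of a KL distillation term and a Focal supervised term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft-target term's gradient magnitude roughly
    # independent of the temperature, a common convention in distillation.
    kd = temperature ** 2 * kl_divergence(p_teacher, p_student_soft)
    # The supervised term uses the student's ordinary (T = 1) probabilities.
    fl = focal_loss(softmax(student_logits), label, gamma)
    return lam * kd + (1.0 - lam) * fl
```

For example, `distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0)` returns a non-negative scalar. With `gamma=0` the Focal term reduces to ordinary cross-entropy, while larger values shrink the loss on samples the student already classifies confidently, which is the usual reason Focal loss is paired with long-tailed class distributions.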

Key words: knowledge distillation, model depth compression, transformer layers, malicious traffic detection, multi-class classification

