Netinfo Security (信息网络安全), 2025, Vol. 25, Issue (1): 148-158. doi: 10.3969/j.issn.1671-1122.2025.01.013

• Technical Research •

Encrypted Traffic Classification Method Based on Optimal Transport and I-ELM

TAI Yingying, WEI Yuanyuan, ZHOU Hanxun, WANG Yan

  1. College of Cyber and Information Security, Liaoning University, Shenyang 110036, China
  • Received: 2024-11-15 Online: 2025-01-10 Published: 2025-02-14
  • Contact: WANG Yan E-mail: 35902642@qq.com
  • About the authors:
    TAI Yingying (born 1978), female, from Liaoning, associate professor, Ph.D.; main research interests: network and information security, image processing
    WEI Yuanyuan (born 1998), female, from Shandong, master's student; main research interest: information security
    ZHOU Hanxun (born 1981), male, from Liaoning, associate professor, Ph.D.; main research interest: network security
    WANG Yan (born 1978), female, from Liaoning, professor, Ph.D.; main research interests: network security, databases
  • Supported by:
    National Key Research and Development Program of China (2023YFC3304904)

Abstract:

To address data imbalance in encrypted traffic classification and the high resource and time cost of model fine-tuning, this paper proposes a fine-tuning model named CEFT (Comprehensive Enhanced Fine-Tuning). CEFT uses ET-BERT as its pre-trained model and adds an optimal transport (OT) module and an improved extreme learning machine (I-ELM) module on top of it, improving both classification performance and training efficiency. Encrypted traffic is first fed into ET-BERT for feature extraction. The OT module then measures the transport cost between the model's predicted distribution and the true distribution, and CEFT adjusts its weights to minimize this cost, which makes predictions more accurate across classes and mitigates data imbalance. The I-ELM module enables rapid weight updates that reduce lengthy gradient computation, which accelerates training and lowers resource and time consumption. Experimental results show that CEFT reaches accuracies of 98.97% and 99.70% on the ISCX-VPN-Service and ISCX-VPN-App datasets, respectively, and significantly outperforms existing baseline models in precision, recall, and F1-score. CEFT also reduces training time by approximately 33.33% on ISCX-VPN-Service and by approximately 35.37% on ISCX-VPN-App.

Key words: CEFT, encrypted traffic classification, data imbalance, I-ELM, optimal transport
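
The two mechanisms named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's CEFT implementation: the abstract does not say which OT solver or which ELM improvements are used, so the sketch assumes an entropy-regularized Sinkhorn solver with a 0/1 class cost matrix and a basic (unimproved) ELM with a tanh hidden layer and a ridge term. The function names `sinkhorn_cost`, `solve`, `hidden`, `elm_fit`, and `elm_predict`, all parameter values, and the toy data are hypothetical.

```python
import math
import random


def sinkhorn_cost(p, q, cost, eps=0.05, iters=200):
    """Entropy-regularized OT cost between two class distributions."""
    n, m = len(p), len(q)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match marginals p and q.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [row[n:] for row in M]


def hidden(x, W, b):
    """Hidden-layer activations for one sample (assumed tanh activation)."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bi)
            for w, bi in zip(W, b)]


def elm_fit(X, Y, n_hidden=20, ridge=1e-3, seed=0):
    """Basic ELM: random fixed hidden layer, closed-form ridge readout."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    k = len(Y[0])
    # Normal equations (H^T H + ridge * I) beta = H^T Y, solved in one step
    # instead of iterating gradient updates.
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    HtY = [[sum(h[i] * y[c] for h, y in zip(H, Y)) for c in range(k)]
           for i in range(n_hidden)]
    return W, b, solve(HtH, HtY)


def elm_predict(x, W, b, beta):
    """Return the index of the class with the highest readout score."""
    h = hidden(x, W, b)
    scores = [sum(h[i] * beta[i][c] for i in range(len(h)))
              for c in range(len(beta[0]))]
    return scores.index(max(scores))


# OT cost between a predicted class distribution and a one-hot true label.
cost = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
ot = sinkhorn_cost([0.7, 0.2, 0.1], [0.0, 1.0, 0.0], cost)

# Toy two-class data standing in for extracted traffic features.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [[1.0, 0.0] if t == 0 else [0.0, 1.0] for t in labels]
W, b, beta = elm_fit(X, Y)
preds = [elm_predict(x, W, b, beta) for x in X]
```

In this toy setting `ot` is about 0.8, the probability mass the prediction places on the two wrong classes, so a loss built on it penalizes misplaced mass directly. The ELM readout `beta` is obtained from a single linear solve, which is the general reason an ELM-style head can avoid repeated gradient steps.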

CLC Number: