Netinfo Security ›› 2023, Vol. 23 ›› Issue (5): 1-10. doi: 10.3969/j.issn.1671-1122.2023.05.001

• Classified Protection •

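Research on Anonymous Traffic Classification Method Based on Machine Learning

ZHAO Xiaolin, WANG Qiyao, ZHAO Bin, XUE Jingfeng

  1. School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2022-10-27 Online: 2023-05-10 Published: 2023-05-15
  • Contact: XUE Jingfeng E-mail: xuejf@bit.edu.cn
  • About the authors: ZHAO Xiaolin (b. 1971), male, from Shanxi, associate professor, Ph.D.; main research interests: network security, software security theory, Gauss database, and software engineering applications | WANG Qiyao (b. 1998), female, from Hunan, master's student; main research interest: network security | ZHAO Bin (b. 1997), male, from Inner Mongolia, master's student; main research interest: network security | XUE Jingfeng (b. 1975), male, from Shaanxi, professor, Ph.D.; main research interests: network security, data security, software security, and software testing
  • Supported by:
    National Key Research and Development Program of China (2020YFB1712104); Key Research and Development Program of Shandong Province (Major Scientific and Technological Innovation Project) (2020CXGC010116)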

Abstract:

Anonymous communication tools protect users' privacy but also provide cover for criminal activity, making the network environment harder to clean up and supervise. Classifying the anonymous traffic generated by information exchange over anonymous networks can narrow the scope of network supervision. Existing anonymous traffic classification methods suffer from coarse classification granularity and low accuracy on application-layer anonymous traffic. To address these problems, this paper proposes a machine-learning-based method for classifying anonymous traffic. The method comprises two models: a feature extraction model based on an auto-encoder and random forest, and a multi-class anonymous traffic classification model based on convolutional neural networks and XGBoost. It improves classification performance through feature reconstruction and model combination. The method is validated on the public Anon17 anonymous traffic dataset, and the results demonstrate the usability, effectiveness, and accuracy of the designed model.

Key words: machine learning, anonymous traffic, auto-encoder, feature extraction, convolutional neural networks
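
The abstract names the building blocks but not how they are wired together, so the following is only a minimal sketch of one plausible reading of the feature extraction stage, not the authors' implementation. Everything concrete in it is an assumption: synthetic data from `make_classification` stands in for Anon17, scikit-learn's `MLPRegressor` trained to reproduce its input stands in for the auto-encoder, "feature reconstruction" is interpreted as concatenating the bottleneck code with the features that a random forest ranks highest, the convolutional network is omitted, and `GradientBoostingClassifier` stands in for XGBoost. The helper names `encode` and `extract_features` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def encode(autoencoder, X):
    """Hidden-layer (bottleneck) activations of a one-hidden-layer MLP."""
    hidden = X @ autoencoder.coefs_[0] + autoencoder.intercepts_[0]
    return np.maximum(hidden, 0.0)  # ReLU, matching activation="relu" below


def extract_features(X_train, y_train, X_test, n_latent=8, n_top=10, seed=0):
    """Assumed 'feature reconstruction': latent code + top-ranked raw features."""
    scaler = StandardScaler().fit(X_train)
    X_tr, X_te = scaler.transform(X_train), scaler.transform(X_test)

    # Auto-encoder stand-in: an MLP regressor trained to reproduce its input.
    autoencoder = MLPRegressor(hidden_layer_sizes=(n_latent,), activation="relu",
                               max_iter=500, random_state=seed)
    autoencoder.fit(X_tr, X_tr)

    # A random forest ranks the original features by impurity-based importance.
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_tr, y_train)
    top = np.argsort(forest.feature_importances_)[::-1][:n_top]

    def build(X):
        # Concatenate the compressed code with the selected raw features.
        return np.hstack([encode(autoencoder, X), X[:, top]])

    return build(X_tr), build(X_te), top


# Synthetic multi-class data (assumption: a placeholder for Anon17 flow features).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

F_train, F_test, top = extract_features(X_train, y_train, X_test)

# Gradient-boosted trees as a stand-in for XGBoost; the CNN stage is omitted.
clf = GradientBoostingClassifier(random_state=0).fit(F_train, y_train)
accuracy = clf.score(F_test, y_test)
print(f"reconstructed feature width: {F_train.shape[1]}, accuracy: {accuracy:.3f}")
```

The scaler, auto-encoder, and forest are all fitted on the training split only and then applied to the test split, so the reported accuracy is not inflated by leakage. Replacing the stand-ins with a real auto-encoder, a convolutional network, and `xgboost.XGBClassifier` would change the models but not this overall data flow.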

CLC Number: