Anomaly Traffic Detection Based on Deep Metric Learning

Netinfo Security, 2024, Vol. 24, Issue 3: 462-472. doi: 10.3969/j.issn.1671-1122.2024.03.011

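ZHANG Qiang¹, HE Junjiang¹, LI Wenshan¹,², LI Tao¹

1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
2. School of Cybersecurity, Chengdu University of Information Technology, Chengdu 610225, China

Received: 2023-07-12  Online: 2024-03-10  Published: 2024-04-03
Corresponding author: HE Junjiang, E-mail: hejunjiang@scu.edu.cn
About the authors:
ZHANG Qiang (born 1999), male, from Henan, master's student; research interests: network traffic classification, deep learning, data mining.
HE Junjiang (born 1993), male, from Sichuan, assistant researcher, Ph.D.; research interests: network traffic analysis and identification, information security, data mining.
LI Wenshan (born 1995), female, from Sichuan, lecturer, Ph.D. candidate; research interests: data science, machine learning, bioinformatics.
LI Tao (born 1965), male, from Sichuan, professor, Ph.D.; research interests: artificial immune systems, network security, information security, data security.
Funding:
National Natural Science Foundation of China (62032002); National Natural Science Foundation of China (62101358); National Key Research and Development Program of China (2020YFB1805400); China Postdoctoral Science Foundation (2020M683345); Fundamental Research Funds for the Central Universities (2023SCU12127); Sichuan Provincial Youth Fund (2023NSFSC1395); Joint Innovation Fund of Sichuan University and Nuclear Power Institute of China (HG2022143)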


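Abstract

Identifying anomalous network traffic is one of the key tasks in cybersecurity today. Traditional traffic classification models, however, are trained directly on traffic data, and because most traffic data are unevenly distributed, the resulting classification boundaries are blurred, which severely limits classification performance. To address this problem, this paper proposes an anomaly traffic detection method based on deep metric learning. First, unlike conventional deep metric learning algorithms that assign a single proxy to each class, this paper designs a dual-proxy mechanism in which a target proxy guides the optimization direction of an updatable proxy. This improves training efficiency and strengthens both the aggregation of same-class traffic and the separation of different-class traffic, minimizing the intra-class distance and maximizing the inter-class distance so that class boundaries become clearer and the performance bottleneck of traditional traffic classification models is overcome. Second, this paper builds a neural network based on 1D-CNN and Bi-LSTM that efficiently extracts traffic features from spatial and temporal perspectives, respectively. Experimental results show that after processing by the model, the intra-class distance of NSL-KDD traffic data is 73.5% smaller and the inter-class distance 52.7% larger than in the original data. Moreover, for deep metric learning, the proposed network trains faster and performs better than the widely used deep residual network. When the proposed model is applied to traffic classification on the NSL-KDD and CICIDS2017 datasets, it achieves better classification performance than traditional traffic classification algorithms.

Key words: deep metric learning, abnormal traffic detection, traffic data distribution, neural network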
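The abstract describes the dual-proxy mechanism only at a high level. The sketch below is one possible reading under stated assumptions, not the authors' formulation: each class is assumed to keep a fixed target proxy and a learnable (updatable) proxy; an embedding is scored with a softmax over negative squared distances to the updatable proxies, in the spirit of Proxy-NCA; and a guidance term, weighted by a hypothetical coefficient `lam`, pulls the true class's updatable proxy toward its target proxy. The function name `dual_proxy_loss` and all numbers are illustrative.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dual_proxy_loss(embedding, label, update_proxies, target_proxies, lam=0.1):
    """Hypothetical dual-proxy loss for one sample (illustrative only).

    update_proxies: assumed learnable proxy vector for each class.
    target_proxies: assumed fixed target proxy vector for each class.
    lam: assumed weight of the guidance term.
    """
    # Proxy-NCA-style term: negative log-softmax of the true class, with
    # logits equal to negative squared distances to the updatable proxies.
    logits = [-sq_dist(embedding, p) for p in update_proxies]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    nca_term = log_sum_exp - logits[label]

    # Guidance term: pull the true class's updatable proxy toward its target.
    guide_term = sq_dist(update_proxies[label], target_proxies[label])
    return nca_term + lam * guide_term


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical fixed targets, 2 classes
    updates = [[0.8, 0.1], [0.2, 0.9]]  # hypothetical current learnable proxies
    x = [0.9, 0.05]                     # an embedding lying near class 0
    print(dual_proxy_loss(x, 0, updates, targets))  # small: correct class
    print(dual_proxy_loss(x, 1, updates, targets))  # larger: wrong class
```

Minimizing the first term moves each embedding toward its own class proxy and away from the others, which is what shrinks intra-class distance and enlarges inter-class distance; the second term is where the idea of a target proxy steering the updatable proxy would enter. In an actual model, the proxies and the 1D-CNN + Bi-LSTM encoder would be trained jointly with automatic differentiation.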


CLC number: TP309

Cite this article: ZHANG Qiang, HE Junjiang, LI Wenshan, LI Tao. Anomaly Traffic Detection Based on Deep Metric Learning[J]. Netinfo Security, 2024, 24(3): 462-472.

Figures and tables: 13 (Figures 1-6, Tables 1-7)

References

[1]	YANG Zheng, LIU Xiaodong, LI Tong, et al. A Systematic Literature Review of Methods and Datasets for Anomaly-Based Network Intrusion Detection[J]. Computers & Security, 2022, 116: 102675-102684. doi: 10.1016/j.cose.2022.102675
[2]	ROSHAN K, ZAFAR A. Deep Learning Approaches for Anomaly and Intrusion Detection in Computer Network: A Review[J]. Cyber Security and Digital Forensics, 2022, 73: 551-563.
[3]	AZAB A, KHASAWNEH M, ALRABAEE S, et al. Network Traffic Classification: Techniques, Datasets, and Challenges[EB/OL]. (2022-09-18) [2023-06-20]. https://doi.org/10.1016/j.dcan.2022.09.009.
[4]	SEN S, SPATSCHECK O, WANG Dongmei. Accurate, Scalable in-Network Identification of P2P Traffic Using Application Signatures[C]// ACM. Proceedings of the 13th International Conference on World Wide Web. New York: ACM, 2004: 512-521.
[5]	MOORE A W, PAPAGIANNAKI K. Toward the Accurate Identification of Network Applications[C]// Springer. 6th International Workshop on Passive and Active Network Measurement. Heidelberg: Springer, 2005: 41-54.
[6]	PARVAT T J, CHANDRA P. A Novel Approach to Deep Packet Inspection for Intrusion Detection[J]. Procedia Computer Science, 2015, 45: 506-513. doi: 10.1016/j.procs.2015.03.091
[7]	ZHANG Chunying, JIA Donghao, WANG Liya, et al. Comparative Research on Network Intrusion Detection Methods Based on Machine Learning[J]. Computers & Security, 2022, 121: 102861-102873. doi: 10.1016/j.cose.2022.102861
[8]	HEARST M A, DUMAIS S T, OSUNA E, et al. Support Vector Machines[J]. IEEE Intelligent Systems and Their Applications, 1998, 13(4): 18-28.
[9]	SONG Yanyan, YING Lu. Decision Tree Methods: Applications for Classification and Prediction[J]. Shanghai Archives of Psychiatry, 2015, 27(2): 130-135. doi: 10.11919/j.issn.1002-0829.215044 pmid: 26120265
[10]	LAAKSONEN J, OJA E. Classification with Learning K-Nearest Neighbors[C]// IEEE. Proceedings of International Conference on Neural Networks (ICNN’96). New York: IEEE, 1996, 3: 1480-1483.
[11]	BREIMAN L. Random Forests[J]. Machine Learning, 2001, 45: 5-32. doi: 10.1023/A:1010933404324
[12]	CHEN T, GUESTRIN C. XGBoost: A Scalable Tree Boosting System[C]// ACM. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 785-794.
[13]	WANG Wei, ZHU Ming, WANG Jinlin, et al. End-to-End Encrypted Traffic Classification with One-Dimensional Convolution Neural Networks[C]// IEEE. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). New York: IEEE, 2017: 43-48.
[14]	YAO Haipeng, LIU Chong, ZHANG Peiying, et al. Identification of Encrypted Traffic through Attention Mechanism Based Long Short Term Memory[J]. IEEE Transactions on Big Data, 2019, 8(1): 241-252. doi: 10.1109/TBDATA.2019.2940675
[15]	ZHANG Wenming. Abnormal Network Traffic Detection Based on Deep Metric Learning[D]. Xi’an: Xidian University, 2021.
	张文铭. 基于深度度量学习的异常网络流量检测[D]. 西安: 西安电子科技大学, 2021.
[16]	XUE Jingliang. Research on Network Traffic Identification Technology Based on Deep Metric Learning[D]. Zhengzhou: PLA Strategic Support Force Information Engineering University, 2021.
	薛靖靓. 基于深度度量学习的网络流量识别技术研究[D]. 郑州: 战略支援部队信息工程大学, 2021.
[17]	CHOPRA S, HADSELL R, LECUN Y. Learning a Similarity Metric Discriminatively, with Application to Face Verification[C]// IEEE. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). New York: IEEE, 2005: 539-546.
[18]	SCHROFF F, KALENICHENKO D, PHILBIN J. Facenet: A Unified Embedding for Face Recognition and Clustering[C]// IEEE. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2015: 815-823.
[19]	SONG O H, XIANG Yu, JEGELKA S, et al. Deep Metric Learning via Lifted Structured Feature Embedding[C]// IEEE. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 4004-4012.
[20]	CHEN Weihua, CHEN Xiaotang, ZHANG Jianguo, et al. Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-Identification[C]// IEEE. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 403-412.
[21]	SOHN K. Improved Deep Metric Learning with Multi-Class N-Pair Loss Objective[J]. Advances in Neural Information Processing Systems, 2016, 8: 29-37.
[22]	MOVSHOVITZ-ATTIAS Y, TOSHEV A, LEUNG T K, et al. No Fuss Distance Metric Learning Using Proxies[C]// IEEE. Proceedings of the IEEE International Conference on Computer Vision. New York: IEEE, 2017: 360-368.
[23]	KIM S, KIM D, CHO M, et al. Proxy Anchor Loss for Deep Metric Learning[C]// IEEE. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2020: 3238-3247.
[24]	WANG Yifan, LIU Pingping, LANG Yijun, et al. Learnable Dynamic Margin in Deep Metric Learning[J]. Pattern Recognition, 2022, 132: 128-140.
[25]	LOKOČ J, KOHOUT J, ČECH P, et al. K-NN Classification of Malware in HTTPS Traffic Using the Metric Space Approach[C]// Springer. Intelligence and Security Informatics:11th Pacific Asia Workshop. Heidelberg: Springer, 2016: 131-145.
[26]	DI M M, DI S C. Improving SIEM Capabilities through an Enhanced Probe for Encrypted Skype Traffic Detection[J]. Journal of Information Security and Applications, 2018, 38: 85-95. doi: 10.1016/j.jisa.2017.12.001
[27]	LOTFOLLAHI M, JAFARI S M, SHIRALI H Z R, et al. Deep Packet: A Novel Approach for Encrypted Traffic Classification Using Deep Learning[J]. Soft Computing, 2020, 24(3): 1999-2012. doi: 10.1007/s00500-019-04030-2
[28]	ZENG Yi, GU Huaxi, WEI Wenting, et al. Deep-Full-Range: A Deep Learning Based Network Encrypted Traffic Classification and Intrusion Detection Framework[J]. IEEE Access, 2019, 7: 45182-45190. doi: 10.1109/Access.6287639
[29]	LIU Chang, HE Longtao, XIONG Gang, et al. FS-Net: A Flow Sequence Network for Encrypted Traffic Classification[C]// IEEE. IEEE INFOCOM 2019-IEEE Conference on Computer Communications. New York: IEEE, 2019: 1171-1179.
[30]	WANG Xin, CHEN Shuhui, SU Jinshu. App-Net: A Hybrid Neural Network for Encrypted Mobile Traffic Classification[C]// IEEE. IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). New York: IEEE, 2020: 424-429.
[31]	TANG Chaofei, LUKTARHAN N, ZHAO Yuxin. SAAE-DNN: Deep Learning Method on Intrusion Detection[J]. Symmetry, 2020, 12(10): 1695-1699. doi: 10.3390/sym12101695
[32]	LIN Kunda, XU Xiaolong, XIAO Fu. MFFusion: A Multi-Level Features Fusion Model for Malicious Traffic Detection Based on Deep Learning[J]. Computer Networks, 2022, 22: 108658-108665.
[33]	LAN Jinghong, LIU Xudong, LI Bo, et al. MEMBER: A Multi-Task Learning Model with Hybrid Deep Features for Network Intrusion Detection[J]. Computers & Security, 2022, 123: 102919-102925. doi: 10.1016/j.cose.2022.102919
[34]	TAVALLAEE M, BAGHERI E, LU Wei, et al. A Detailed Analysis of the KDD CUP 99 Data Set[C]// IEEE. 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. New York: IEEE, 2009: 1-6.
[35]	SHARAFALDIN I, LASHKARI A H, GHORBANI A A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization[J]. International Conference on Information Systems Security and Privacy, 2018, 1: 108-116.
[36]	KESKES N, FAKHFAKH S, KANOUN O, et al. High Performance Oversampling Technique Considering Intra-Class and Inter-Class Distances[EB/OL]. (2022-11-02)[2023-06-20]. https://doi.org/10.1002/cpe.6753.
[37]	LOPEZ-MARTIN M, CARRO B, SANCHEZ-ESGUEVILLAS A, et al. Network Traffic Classifier with Convolutional and Recurrent Neural Networks for Internet of Things[J]. IEEE Access, 2017, 5: 18042-18050. doi: 10.1109/Access.6287639
[38]	SINHA J, MANOLLAS M. Efficient Deep CNN-BiLSTM Model for Network Intrusion Detection[C]// ACM. Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition. New York: ACM, 2020: 223-231.

Table 1 Class distribution of the NSL-KDD dataset
Class	Normal	DoS	Probe	U2R	R2L
Training set (samples)	67343	45927	11656	52	995
Test set (samples)	9711	7460	2421	67	2885
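The uneven distribution cited in the abstract can be quantified directly from the NSL-KDD class-distribution table above. This short sketch (plain Python; the helper names `shares` and `imbalance_ratio` are illustrative) computes each class's share and the ratio between the largest and smallest class for both splits.

```python
# Sample counts copied from the NSL-KDD class-distribution table.
train = {"Normal": 67343, "DoS": 45927, "Probe": 11656, "U2R": 52, "R2L": 995}
test = {"Normal": 9711, "DoS": 7460, "Probe": 2421, "U2R": 67, "R2L": 2885}


def shares(counts):
    """Fraction of all samples that belong to each class."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def imbalance_ratio(counts):
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts.values()) / min(counts.values())


for split_name, counts in (("train", train), ("test", test)):
    pct = {k: round(100 * v, 2) for k, v in shares(counts).items()}
    print(split_name, sum(counts.values()), round(imbalance_ratio(counts), 1), pct)
```

The training split has 125,973 samples with a Normal-to-U2R ratio of roughly 1295:1, and R2L grows from about 0.79% of the training set to about 12.8% of the test set, so the test distribution also differs noticeably from the training distribution.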

Table 2 Class distribution of the CICIDS2017 dataset
Class	Training set (samples)	Test set (samples)
BENIGN	15898	4542
DDoS	8958	2560
PortScan	11113	3175
DoS Hulk	16105	4602
DoS GoldenEye	7205	2059
DoS slowloris	4057	1159
DoS Slowhttptest	3847	1099
FTP Patator	5555	1587
SSH Patator	4127	1180
Bot	1370	391
Web Attack Brute Force	1055	301
Web Attack XSS	456	130
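The CICIDS2017 class table above can be checked the same way. The sketch below computes, for each class, the fraction of its samples placed in the training set. Every class lands close to 7/9 (about 0.778), which is consistent with, though it does not prove, a stratified 7:2 train/test split; the training-set class sizes still differ by a factor of about 35.

```python
# (training samples, test samples) per class, copied from the CICIDS2017 table.
cicids = {
    "BENIGN": (15898, 4542),
    "DDoS": (8958, 2560),
    "PortScan": (11113, 3175),
    "DoS Hulk": (16105, 4602),
    "DoS GoldenEye": (7205, 2059),
    "DoS slowloris": (4057, 1159),
    "DoS Slowhttptest": (3847, 1099),
    "FTP Patator": (5555, 1587),
    "SSH Patator": (4127, 1180),
    "Bot": (1370, 391),
    "Web Attack Brute Force": (1055, 301),
    "Web Attack XSS": (456, 130),
}


def train_fraction(train_n, test_n):
    """Share of a class's samples that were assigned to the training set."""
    return train_n / (train_n + test_n)


fractions = {name: train_fraction(*counts) for name, counts in cicids.items()}
largest = max(tr for tr, _ in cicids.values())
smallest = min(tr for tr, _ in cicids.values())

print(round(min(fractions.values()), 4), round(max(fractions.values()), 4))
print(round(largest / smallest, 1))  # training-set imbalance ratio
```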

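Table 3 Intra-class and inter-class distances of NSL-KDD data under different methods
Method	Intra-class distance	Inter-class distance
Original data	1.496	1177.544
Proxy-NCA	0.708	1240.120
Proxy-Anchor	0.604	1691.011
AM	0.564	1647.869
Proposed method	0.397	1797.618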
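The percentages quoted in the abstract follow from the distance table above as relative changes with respect to the original data. This sketch reproduces them and applies the same calculation to the baseline methods; `relative_change` is an illustrative helper name.

```python
# Values copied from the distance table (original data and each method).
ORIGINAL_INTRA, ORIGINAL_INTER = 1.496, 1177.544
methods = {
    "Proxy-NCA": (0.708, 1240.120),
    "Proxy-Anchor": (0.604, 1691.011),
    "AM": (0.564, 1647.869),
    "Proposed method": (0.397, 1797.618),
}


def relative_change(new, old):
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


for name, (intra, inter) in methods.items():
    d_intra = relative_change(intra, ORIGINAL_INTRA)
    d_inter = relative_change(inter, ORIGINAL_INTER)
    print(f"{name}: intra {d_intra:+.1f}%, inter {d_inter:+.1f}%")
```

For the proposed method this gives -73.5% and +52.7%, matching the abstract; Proxy-Anchor, the strongest baseline on inter-class distance, gives about -59.6% and +43.6%.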

Table 4 Depth and size of the candidate networks
Network	Layers	Parameters	Model size (MB)
1D-CNN	7	2224888	8.90
BiLSTM	5	1220872	3.83
ResNet-1D	50	12439226	49.76
VGGNet-1D	19	11997114	47.99
1D-CNN+BiLSTM	9	4748296	18.99
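The parameter counts in the table above can be reasoned about layer by layer. The sketch below uses the parameter formulas of PyTorch's `nn.Conv1d`, `nn.LSTM` (which keeps two bias vectors per direction and layer) and `nn.Linear`, implemented in plain Python. The layer sizes in the example encoder are hypothetical and are not the authors' configuration. It also converts a parameter count to megabytes, assuming 32-bit floats and 10^6 bytes per MB.

```python
def conv1d_params(in_ch, out_ch, kernel, bias=True):
    """Weights out_ch * in_ch * kernel, plus one bias per output channel."""
    return out_ch * in_ch * kernel + (out_ch if bias else 0)


def lstm_params(input_size, hidden, num_layers=1, bidirectional=False):
    """Parameter count of a PyTorch-style LSTM (4 gates, W_ih, W_hh, 2 biases)."""
    dirs = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first consume the concatenated outputs of all directions.
        in_size = input_size if layer == 0 else hidden * dirs
        per_direction = 4 * hidden * (in_size + hidden) + 8 * hidden
        total += dirs * per_direction
    return total


def linear_params(in_f, out_f, bias=True):
    """Weight matrix in_f * out_f, plus one bias per output feature."""
    return in_f * out_f + (out_f if bias else 0)


def float32_size_mb(n_params):
    """Model size in MB (10**6 bytes) if every parameter is a 32-bit float."""
    return n_params * 4 / 1e6


# Hypothetical encoder: two Conv1d layers, one Bi-LSTM layer, an embedding head.
example = (
    conv1d_params(1, 64, 3)
    + conv1d_params(64, 128, 3)
    + lstm_params(128, 64, bidirectional=True)
    + linear_params(128, 128)
)
print(example, round(float32_size_mb(example), 2))
```

Applied to the parameter column, the float32 conversion reproduces the model sizes listed for 1D-CNN, ResNet-1D, VGGNet-1D and 1D-CNN+BiLSTM; for example, 2,224,888 parameters × 4 bytes ≈ 8.90 MB.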

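Table 5 Per-epoch training time and resulting intra-class and inter-class distances of the candidate networks
Network	Training time per epoch (s)	Intra-class distance	Inter-class distance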
1D-CNN	2.48	0.410	1756.602
BiLSTM	1.73	0.523	1614.695
ResNet-1D	12.18	0.604	1764.859
VGGNet-1D	9.36	0.453	1453.839
1D-CNN+BiLSTM	3.11	0.397	1797.618
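The intra-class and inter-class distances in the two distance tables above are not defined in this section. The sketch below assumes one common pair of definitions: intra-class distance is the mean Euclidean distance from each sample to its class centroid, averaged over classes, and inter-class distance is the mean Euclidean distance between pairs of class centroids. These definitions, the helper names, and the toy data are assumptions and may differ from the paper's exact formulation.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def class_distances(embeddings, labels):
    """Return (intra, inter) under the assumed centroid-based definitions."""
    groups = {}
    for vec, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(vec)
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    centers = {y: centroid(pts) for y, pts in groups.items()}

    # Mean sample-to-own-centroid distance per class, then averaged over classes.
    intra = sum(
        sum(euclidean(p, centers[y]) for p in pts) / len(pts)
        for y, pts in groups.items()
    ) / len(groups)

    # Mean distance over all unordered pairs of class centroids.
    pairs = list(combinations(centers.values(), 2))
    inter = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
    return intra, inter


X = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 2.0]]  # toy embeddings
y = ["normal", "normal", "attack", "attack"]            # toy labels
print(class_distances(X, y))
```

Under either these or the paper's definitions, a better embedding drives the first value down and the second value up, which is the direction of change the tables compare.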


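Table 6 Weighted precision, recall and F1 of classifiers on NSL-KDD before and after processing by the proposed model
Classifier	Weighted-Precision (before)	Weighted-Precision (after)	Weighted-Recall (before)	Weighted-Recall (after)	Weighted-F1 (before)	Weighted-F1 (after)
KNN	79.0%	82.6%	75.8%	78.1%	71.1%	74.9%
SVM	81.7%	82.7%	77.4%	78.1%	74.7%	75.1%
DT	73.2%	80.0%	69.6%	77.7%	65.4%	74.4%
RF	72.0%	82.4%	57.1%	78.0%	52.0%	74.7%
XGBoost	78.0%	82.7%	71.5%	78.1%	70.2%	75.0%
Ref. [37]	79.2%	81.7%	74.1%	78.2%	71.1%	75.1%
Ref. [38]	81.2%	82.7%	75.7%	78.1%	72.0%	75.2%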
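The classifier tables report support-weighted metrics. Assuming the usual definition (per-class precision, recall and F1 averaged with weights proportional to each class's number of true samples, as with scikit-learn's `average="weighted"` option), they can be computed as follows in plain Python; `weighted_prf` and the toy labels are illustrative.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    p_sum = r_sum = f_sum = 0.0
    for c, s in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        precision = tp / predicted_c if predicted_c else 0.0  # 0 if never predicted
        recall = tp / s
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = s / n  # share of true samples belonging to class c
        p_sum += weight * precision
        r_sum += weight * recall
        f_sum += weight * f1
    return p_sum, r_sum, f_sum


y_true = [0, 0, 0, 1, 1, 2]  # toy ground-truth labels
y_pred = [0, 0, 1, 1, 1, 2]  # toy predictions
print(weighted_prf(y_true, y_pred))
```

One consequence of this definition is that weighted recall always equals overall accuracy, because the sum over classes of (support_c / n) × (TP_c / support_c) is just the total number of true positives divided by n. The weighted-recall columns can therefore be read as accuracy.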


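Table 7 Weighted precision, recall and F1 of classifiers on CICIDS2017 before and after processing by the proposed model
Classifier	Weighted-Precision (before)	Weighted-Precision (after)	Weighted-Recall (before)	Weighted-Recall (after)	Weighted-F1 (before)	Weighted-F1 (after)
KNN	97.0%	98.0%	97.1%	98.1%	97.0%	98.1%
SVM	66.6%	96.4%	62.9%	96.2%	60.4%	96.0%
DT	95.4%	96.2%	94.7%	95.8%	94.1%	95.7%
RF	97.0%	97.3%	97.4%	97.6%	97.2%	97.4%
XGBoost	97.8%	97.9%	97.9%	98.0%	97.7%	97.8%
Ref. [37]	97.3%	99.1%	97.4%	99.0%	97.2%	98.8%
Ref. [38]	91.7%	96.9%	90.3%	96.7%	88.7%	96.4%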
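A compact way to read the classifier tables is the change in weighted F1 for each classifier. The sketch below does this for the values in the table directly above (percentage points, copied from the table); the same code applies to the other classifier table by swapping in its numbers.

```python
# (weighted F1 before, weighted F1 after) in percent, copied from the table.
f1_before_after = {
    "KNN": (97.0, 98.1),
    "SVM": (60.4, 96.0),
    "DT": (94.1, 95.7),
    "RF": (97.2, 97.4),
    "XGBoost": (97.7, 97.8),
    "Ref. [37]": (97.2, 98.8),
    "Ref. [38]": (88.7, 96.4),
}

# Gain in percentage points, rounded to the table's precision.
gains = {name: round(after - before, 1) for name, (before, after) in f1_before_after.items()}
before_vals = [b for b, _ in f1_before_after.values()]
after_vals = [a for _, a in f1_before_after.values()]
spread_before = max(before_vals) - min(before_vals)  # best minus worst classifier
spread_after = max(after_vals) - min(after_vals)
best = max(gains, key=gains.get)

print(gains)
print(best, round(spread_before, 1), round(spread_after, 1))
```

SVM gains the most (+35.6 points), and the gap between the best and worst classifier shrinks from 37.3 to 3.1 points, so after processing, the choice of downstream classifier matters much less on this dataset.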