A Method of Feature Extraction for Network Traffic Based on Time-Frequency Diagrams and Improved E-GraphSAGE

doi:10.3969/j.issn.1671-1122.2023.09.002

Abstract

Because network systems vary over time, network traffic in the time domain is unstable and hard to separate, and traditional spatiotemporal network models neither characterize the spatial structure of spatiotemporal sequence data adequately nor fully mine its spatiotemporal features. To address these problems, this paper proposes a network traffic feature extraction method based on time-frequency diagrams and an improved E-GraphSAGE. First, with the bior1.3 wavelet basis function as the basis of the potential transformation, the original traffic is mapped from the one-dimensional time domain to the time-frequency domain, and noise frequency bands are removed through visual analysis. Then, a 1D ConvLSTM model is fused inside the E-GraphSAGE model to build a three-dimensional feature extraction method that incorporates long-term spatiotemporal dependencies. Finally, edge embeddings of the three-dimensional time-space-frequency features, containing both local and global information, are obtained, which addresses the loss of global information in traditional spatiotemporal feature extraction models. Visual analysis and multi-class classification experiments show that the processed traffic features are more stable and more separable. Compared with other closely related methods, the proposed method also achieves better accuracy, precision, recall, and F1-score.

Key words: traffic classification, time-frequency analysis, flow spectrum theory, feature extraction, E-GraphSAGE

CLC number: TP309

ZHANG Yuchen, ZHANG Yawen, WU Yue, LI Cheng. A Method of Feature Extraction for Network Traffic Based on Time-Frequency Diagrams and Improved E-GraphSAGE[J]. Netinfo Security, 2023, 23(9): 12-24.



Metric	Formula
Accuracy	$\frac{TP+TN}{TP+FP+FN+TN}$
Precision	$\frac{TP}{TP+FP}$
Recall	$\frac{TP}{TP+FN}$
F1-score	$\frac{2TP}{2TP+FP+FN}$
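
The four formulas in the table can be checked with a few lines of code. The sketch below is illustrative only: the function name `classification_metrics` and the example counts are hypothetical and do not come from the paper, and the zero-denominator fallback to 0.0 is an assumption the table does not specify.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four table metrics from confusion counts.

    Assumption (not in the source): a zero denominator yields 0.0.
    """
    total = tp + fp + fn + tn
    f1_denominator = 2 * tp + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "f1": 2 * tp / f1_denominator if f1_denominator else 0.0,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F1-score form used in the table, 2TP / (2TP + FP + FN), is algebraically the harmonic mean of precision and recall, so the two can be cross-checked against each other.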

Experimental platform	Configuration
Operating system	Ubuntu 22.04.1 LTS
CPU	Intel Core i7-7700HQ CPU
GPU	NVIDIA GeForce GTX 1070 Mobile
IDE	PyCharm Professional Edition 2022.2.2 x64
Programming language	Python 3.9
Deep learning framework	PyTorch 1.12.1+cuda11.7

Layer	Type	Filters/Kernel size	Stride	Activation
1	1D-Conv+MaxPool1D	16/3	2	ReLU
2	1D-Conv+MaxPool1D	32/3	2	ReLU
3	1D-Conv+MaxPool1D	64/3	2	ReLU
4	1D-Conv+MaxPool1D	128/3	2	ReLU
5	Dense	100	-	ReLU
6	Dense	50	-	ReLU
7	Dense	10	-	ReLU
8	Softmax	-	-	-
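
To get a feel for the size of the classifier in the table, the sketch below traces tensor shapes and counts trainable parameters layer by layer. Several things here are assumptions rather than facts from the paper: the input length (64) and single input channel are hypothetical, each convolution is assumed to use stride 1 with "same" padding, and the stride column is assumed to describe a max-pool with window 2. The names `CONV_BLOCKS`, `DENSE_UNITS`, and `trace_shapes` are likewise illustrative.

```python
# Layer specification transcribed from the table: (filters, kernel size, stride).
CONV_BLOCKS = [(16, 3, 2), (32, 3, 2), (64, 3, 2), (128, 3, 2)]
DENSE_UNITS = [100, 50, 10]


def trace_shapes(input_length: int, in_channels: int = 1):
    """Return per-layer output shapes and the total parameter count.

    Assumptions (not stated in the source): stride-1 'same'-padded
    convolutions, and max-pooling with window 2 using the table's stride.
    """
    shapes = []
    params = 0
    length, channels = input_length, in_channels
    for filters, kernel, pool_stride in CONV_BLOCKS:
        params += channels * filters * kernel + filters  # weights + biases
        channels = filters
        length //= pool_stride  # window-2 max-pool halves the length
        shapes.append((channels, length))
    features = channels * length  # flatten before the dense layers
    for units in DENSE_UNITS:
        params += features * units + units
        features = units
        shapes.append((units,))
    return shapes, params


# Hypothetical input: one channel, 64 time steps.
shapes, total_params = trace_shapes(input_length=64)
for index, shape in enumerate(shapes, start=1):
    print(f"layer {index}: output shape {shape}")
print("trainable parameters:", total_params)
```

Under these assumptions the four pooling stages reduce a length-64 input to length 4, so the first dense layer sees 128 × 4 = 512 features; a different input length or padding scheme would change that figure.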

Dataset	Method	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)
UNSW-NB15	Raw traffic	60.23	62.89	60.85	63.11
	Time-frequency analysis model	65.34	69.21	64.87	66.45
	Flow spectrum theory model	76.47	80.11	77.05	78.55
	E-GraphSAGE M	94.09	96.06	96.69	95.19
	TPE-GraphSAGE	96.96	97.19	97.22	95.37
	TPS-EGraphSAGE	97.20	97.26	97.40	96.82
CICIDS2017	Raw traffic	70.25	68.83	70.87	69.84
	Time-frequency analysis model	72.78	69.23	72.95	71.37
	Flow spectrum theory model	74.57	79.44	80.36	79.56
	E-GraphSAGE M	94.09	96.06	96.69	95.19
	TPE-GraphSAGE	97.20	97.26	97.40	96.82
	TPS-EGraphSAGE	98.64	98.62	98.59	98.49

Dataset	Algorithm	Weighted F1-score per class (%)
		0	1	2	3	4	5	6	7	8	9
UNSW-NB15	E-GraphSAGE M	98.2	12.2	0	0	73.5	0	21.4	7.8	0	0
	TPE-GraphSAGE	99.0	30.5	22.7	9.4	53.0	0	14.4	26.1	0	6.8
	TSP-EGraphSAGE	99.1	34.8	26.0	12.8	54.2	23.3	30.9	35.0	4.2	8.6
CICIDS2017	E-GraphSAGE M	92.3	94.3	90.1	66.3	58.2	90.3	0	-	-	-
	TPE-GraphSAGE	95.4	97.3	92.9	97.2	62.4	91.8	0	-	-	-
	TSP-EGraphSAGE	96.6	98.6	95.1	98.1	71.8	94.3	7.6	-	-	-
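
Per-class scores like those in the table are usually derived from a multi-class confusion matrix. The sketch below shows one common way to do this and is not necessarily the exact procedure used in the paper: the confusion matrix is hypothetical, rows are assumed to be true classes and columns predicted classes, and the function names are illustrative.

```python
def per_class_f1(confusion):
    """F1-score for each class of a square confusion matrix.

    Assumption: confusion[i][j] counts samples of true class i
    predicted as class j; an empty denominator yields 0.0.
    """
    size = len(confusion)
    scores = []
    for k in range(size):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(size)) - tp
        fn = sum(confusion[k]) - tp
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return scores


def weighted_f1(confusion):
    """Average of per-class F1-scores, weighted by class support."""
    supports = [sum(row) for row in confusion]
    total = sum(supports)
    scores = per_class_f1(confusion)
    return sum(s * w for s, w in zip(scores, supports)) / total if total else 0.0


# Hypothetical 3-class matrix; class 2 is rare and never predicted.
confusion = [
    [90, 10, 0],
    [5, 15, 0],
    [4, 1, 0],
]
scores = per_class_f1(confusion)
overall = weighted_f1(confusion)
print([round(s, 4) for s in scores], round(overall, 4))
```

A class with no true positives scores exactly 0 regardless of its support, which is how a minority class that a model never predicts correctly ends up with a 0 entry while the support-weighted average stays high.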