Network Traffic Anomaly Detection Model Based on Generative Adversarial Network and Autoencoder

doi:10.3969/j.issn.1671-1122.2022.12.002

Abstract:

In recent years, machine learning, and deep learning in particular, has been widely applied to network traffic intrusion detection, and the class distribution of a dataset's samples is an important factor in the performance of machine learning algorithms. To address the diversity of network attack categories and the uneven class distribution of existing network traffic datasets, this paper proposes a network traffic anomaly detection model based on a generative adversarial network and an autoencoder. First, a conditional generative adversarial network based on the Wasserstein distance is used to resample the minority classes in the original network traffic data. Second, a stacked denoising autoencoder reconstructs the resampled data to extract their latent information. Finally, the encoder network combined with a Softmax network identifies anomalous network traffic. Experiments on the NSL-KDD intrusion detection dataset show that the proposed model effectively improves the recognition rate of attack types that make up only a small share of a class-imbalanced dataset.

Key words: deep learning, anomaly detection, generative adversarial networks, denoising autoencoder
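The abstract names a conditional GAN based on the Wasserstein distance but does not give its objective. Assuming the widely used gradient-penalty formulation (WGAN-GP) with the class label $y$ as the condition, the critic $D$ and generator $G$ would minimise the losses below, where $\lambda$ is the penalty weight (10 in the original WGAN-GP paper) and $\hat{x}$ is sampled on straight lines between real and generated records of the same class. This is a reference formulation, not necessarily the exact loss used in this paper.

```latex
\begin{aligned}
L_D &= \mathbb{E}_{z \sim p_z,\, y}\big[D(G(z, y), y)\big]
     - \mathbb{E}_{(x, y) \sim p_{\mathrm{data}}}\big[D(x, y)\big]
     + \lambda\, \mathbb{E}_{\hat{x}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}, y) \rVert_2 - 1)^2\big], \\
L_G &= -\,\mathbb{E}_{z \sim p_z,\, y}\big[D(G(z, y), y)\big], \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z, y), \qquad \epsilon \sim U[0, 1].
\end{aligned}
```

Once trained, $G(z, y)$ can be sampled with a fixed minority label $y$ to produce as many synthetic records of that class as needed.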

CLC number: TP309

GUO Sensen, WANG Tongli, MU Dejun. Anomaly Detection Model Based on Generative Adversarial Network and Autoencoder[J]. Netinfo Security, 2022, 22(12): 7-15.

Figures/Tables (16): Figures 1-10; Tables 1-6

Original class label	Encoded class label	Number of records
Normal	0	77054
DoS	1	53563
Probe	2	14230
R2L	3	3418
U2R	4	252

Class label	Number of samples	Share of samples
Normal	77054	51.88%
DoS	53563	36.07%
Probe	14230	9.58%
R2L	3418	2.30%
U2R	252	0.17%
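The shares in this table follow directly from the per-class counts. The short check below recomputes them, together with the ratio between the largest and the smallest class, an illustrative imbalance measure that is not itself reported in the paper.

```python
# Per-class record counts from the table above.
counts = {"Normal": 77054, "DoS": 53563, "Probe": 14230, "R2L": 3418, "U2R": 252}

total = sum(counts.values())
# Share of each class in percent, rounded to two decimals as in the table.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}
# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total)                      # 148517
print(shares)
print(round(imbalance_ratio, 1))  # 305.8
```

A ratio of roughly 300:1 between Normal and U2R is the kind of skew the resampling step is meant to reduce.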

Class	Records before resampling	Records after resampling
Normal	53952	53952
DoS	37433	37433
Probe	9933	9933
R2L	2468	10000
U2R	175	10000
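In this table only R2L and U2R are topped up, to 10000 records each, while Probe (9933 records) is left unchanged. The sketch below, assuming a fixed target per selected class, computes how many synthetic records a conditional generator would have to emit and builds one condition label per record from the encoded labels in the label-encoding table. The helper `synthetic_needed` and the generator interface `G(z, y)` are hypothetical.

```python
# Per-class training counts before resampling, copied from the table above.
before = {"Normal": 53952, "DoS": 37433, "Probe": 9933, "R2L": 2468, "U2R": 175}
label_id = {"Normal": 0, "DoS": 1, "Probe": 2, "R2L": 3, "U2R": 4}  # encoded labels
MINORITY = ("R2L", "U2R")  # classes that are topped up in the table
TARGET = 10000


def synthetic_needed(before, minority, target):
    """Hypothetical helper: number of synthetic records to generate per class."""
    return {c: max(target - n, 0) if c in minority else 0 for c, n in before.items()}


needed = synthetic_needed(before, MINORITY, TARGET)
after = {c: before[c] + needed[c] for c in before}

# One condition label per synthetic record, to be passed to a trained
# conditional generator G(z, y) (not implemented here).
conditions = [label_id[c] for c in MINORITY for _ in range(needed[c])]

print(needed)  # {'Normal': 0, 'DoS': 0, 'Probe': 0, 'R2L': 7532, 'U2R': 9825}
```

Conditioning on the label lets a single generator serve every minority class, instead of training one GAN per class.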

Dataset	Resampling algorithm	Macro precision	Macro recall	Macro F1 score
NSL-KDD	SMOTE	92.32%	92.54%	92.93%
	K-SMOTE	92.28%	92.76%	93.02%
	ADASYN	92.05%	92.90%	92.98%
	WBCGAN	94.46%	95.02%	94.88%
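These tables report macro-averaged precision, recall and F1, which give every class equal weight and therefore expose performance on small classes. The paper does not state how the averages were computed; the sketch below assumes the common definition (per-class scores from a confusion matrix, then an unweighted mean) and uses a hypothetical 3-class confusion matrix.

```python
def macro_scores(cm):
    """Macro-averaged precision, recall and F1 from a square confusion matrix.

    cm[i][j] is the number of records of true class i predicted as class j.
    Each class counts equally, however few records it has.
    """
    k = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[row][c] for row in range(k))  # column sum
        actual = sum(cm[c])                               # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Hypothetical confusion matrix: a large, a medium and a very small class.
cm = [
    [96, 2, 2],
    [3, 17, 0],
    [1, 1, 2],
]
accuracy = sum(cm[i][i] for i in range(3)) / sum(map(sum, cm))
macro_p, macro_r, macro_f1 = macro_scores(cm)
print(round(accuracy, 3), round(macro_p, 2), round(macro_r, 2), round(macro_f1, 2))
# 0.927 0.77 0.77 0.77
```

Accuracy stays above 92% even though the smallest class is recognised only half the time; the macro averages pull that weakness into the headline number.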

Dataset	Model	Macro precision	Macro recall	Macro F1 score
NSL-KDD	XGBoost	94.02%	94.78%	94.41%
	LightGBM	94.46%	95.02%	94.88%
	MLP	95.16%	94.89%	94.92%
	AE	91.23%	87.68%	89.51%
	DAE	96.48%	83.08%	89.28%
	SDAE	97.27%	96.59%	96.73%
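In the proposed model, the stacked denoising autoencoder (SDAE) reconstructs the resampled data and its encoder, topped by a Softmax layer, performs the final classification. The NumPy sketch below illustrates that structure under assumptions that are not the paper's configuration: sigmoid units, masking noise at rate 0.2, squared-error reconstruction, greedy layer-wise pretraining, hypothetical layer sizes 10-8-4, and toy data in place of NSL-KDD features. For brevity, the Softmax layer is trained on frozen codes rather than fine-tuned jointly with the encoder.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(logits):
    # Subtract the row maximum for numerical stability.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


class DenoisingAutoencoder:
    """One denoising layer: corrupt the input with masking noise, encode and
    decode it, and minimise the squared error against the clean input."""

    def __init__(self, n_in, n_hidden, rng, noise=0.2, lr=0.1):
        self.W_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b_enc = np.zeros(n_hidden)
        self.W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b_dec = np.zeros(n_in)
        self.rng, self.noise, self.lr = rng, noise, lr

    def encode(self, x):
        return sigmoid(x @ self.W_enc + self.b_enc)

    def train_step(self, x):
        # Masking noise: each input entry is zeroed with probability `noise`.
        x_tilde = x * (self.rng.random(x.shape) >= self.noise)
        h = self.encode(x_tilde)
        x_hat = sigmoid(h @ self.W_dec + self.b_dec)
        loss = np.mean(np.sum((x_hat - x) ** 2, axis=1))
        # Backpropagate the mean squared reconstruction error (plain gradient descent).
        d_out = 2.0 * (x_hat - x) * x_hat * (1.0 - x_hat) / len(x)
        d_h = (d_out @ self.W_dec.T) * h * (1.0 - h)
        self.W_dec -= self.lr * (h.T @ d_out)
        self.b_dec -= self.lr * d_out.sum(axis=0)
        self.W_enc -= self.lr * (x_tilde.T @ d_h)
        self.b_enc -= self.lr * d_h.sum(axis=0)
        return loss


def train_softmax(z, y, n_classes, epochs=300, lr=0.5):
    """Softmax (multinomial logistic regression) layer trained on fixed codes z."""
    W = np.zeros((z.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        grad = (softmax(z @ W + b) - onehot) / len(y)  # d(mean cross-entropy)/d(logits)
        W -= lr * (z.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


# Hypothetical toy data standing in for min-max scaled traffic features.
rng = np.random.default_rng(0)
centers = rng.random((3, 10))
y = rng.integers(0, 3, size=300)
X = np.clip(centers[y] + rng.normal(0.0, 0.05, size=(300, 10)), 0.0, 1.0)

# Greedy layer-wise pretraining: each layer denoises the codes of the one below.
encoders = [DenoisingAutoencoder(10, 8, rng), DenoisingAutoencoder(8, 4, rng)]
codes = X
layer_losses = []
for dae in encoders:
    layer_losses.append([dae.train_step(codes) for _ in range(500)])
    codes = dae.encode(codes)  # clean (uncorrupted) codes feed the next layer

# Encoder output -> Softmax classifier.
W_out, b_out = train_softmax(codes, y, n_classes=3)
probs = softmax(codes @ W_out + b_out)
pred = probs.argmax(axis=1)
```

A fuller implementation would typically fine-tune the encoder and the Softmax layer together by backpropagation after pretraining; training only the Softmax layer here keeps the sketch short.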