Network Intrusion Detection Based on Improved MajorClust Clustering

doi:10.3969/j.issn.1671-1122.2020.02.003

Abstract

Supervised intrusion detection algorithms cannot train an accurate detection model for network connections that lack class labels or whose identifying features are not distinctive. To address this, the paper proposes an unsupervised intrusion detection algorithm based on an improved MajorClust clustering algorithm, which adapts dynamically to the intrinsic relationships in network intrusion behavior data and detects intrusions automatically and efficiently. The improved MajorClust algorithm takes the point with the smallest sum of unclustered adjacent edges as the initial cluster center. Based on the distribution of distances between the cluster center and the other nodes, it fits a curve of the spatial distribution of the points by the least-squares method and uses the value at the curve's inflection point as the clustering radius. Each cluster is then abstracted as a node and clustering is iterated again, so that network behavior data is clustered and optimized automatically. The paper builds unsupervised intrusion detection models based on the improved MajorClust, k-means, and DBSCAN algorithms and, after optimization, compares their detection performance on the NSL-KDD dataset. The experimental results show that the improved MajorClust algorithm has a fairly significant advantage in intrusion detection performance and in the stability of its results.

Key words: intrusion detection, MajorClust, NSL-KDD, inflection radius
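
The center selection and inflection-radius steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pick_initial_center` and `inflection_radius`, the choice of a cubic polynomial, and the reading of "smallest sum of unclustered adjacent edges" as "smallest total distance to the other unclustered nodes" are all assumptions made for illustration. The least-squares fit uses `numpy.polyfit`.

```python
import numpy as np

def pick_initial_center(dist, unclustered):
    """Index of the unclustered node with the smallest total distance
    to the other unclustered nodes (assumed reading of the criterion)."""
    idx = np.array(sorted(unclustered))
    sums = dist[np.ix_(idx, idx)].sum(axis=1)
    return int(idx[np.argmin(sums)])

def inflection_radius(distances, degree=3):
    """Fit a polynomial to the sorted distance curve by least squares and
    return the fitted distance at its first inflection point, or None."""
    d = np.sort(np.asarray(distances, dtype=float))
    if len(d) <= degree:
        return None  # too few points for a meaningful fit
    x = np.arange(len(d), dtype=float)
    curve = np.poly1d(np.polyfit(x, d, degree))  # least-squares fit
    second = curve.deriv(2)
    roots = second.roots
    eps = 1e-3
    # A real root of the second derivative is an inflection point only if
    # the second derivative changes sign there and it lies in the data range.
    for r in sorted(roots.real[np.abs(roots.imag) < 1e-9]):
        if x[0] <= r <= x[-1] and second(r - eps) * second(r + eps) < 0:
            return float(curve(r))
    return None

# Usage on hypothetical one-dimensional points.
points = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
dist = np.abs(points[:, None] - points[None, :])
center = pick_initial_center(dist, {0, 1, 2, 3, 4})
```

Nodes whose distance to the chosen center does not exceed the returned radius would then form one cluster. How the paper handles a curve without an inflection point is not stated in the abstract, so the sketch simply returns `None` in that case.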

CLC number: TP309

LUO Wenhua, XU Caidian. Network Intrusion Detection Based on Improved MajorClust Clustering[J]. Netinfo Security, 2020, 20(2): 14-21.

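Feature No.	Feature name	Description	Sample value
6	Dst_bytes	Number of data bytes transferred from the destination host to the source host in a single connection	0
12	Logged_in	Login status: 1 if the login succeeded, 0 otherwise	1
23	Count	Number of connections to the same destination host as the current connection in the past two seconds	1
24	Srv_count	Number of connections to the same service (port number) as the current connection in the past two seconds	2
25	Serror_rate	Percentage of the connections counted in Count (feature 23) whose flag (feature 4) is S0, S1, S2, or S3	0
26	Srv_serror_rate	Percentage of the connections counted in Srv_count (feature 24) whose flag (feature 4) is S0, S1, S2, or S3	0
29	Same_srv_rate	Percentage of the connections counted in Count (feature 23) that were to the same service	1
30	Diff_srv_rate	Percentage of the connections counted in Count (feature 23) that were to different services	0
31	Srv_diff_host_rate	Percentage of the connections counted in Srv_count (feature 24) that were to different destination hosts	0
32	Dst_host_count	Number of connections with the same destination IP address	150
33	Dst_host_srv_count	Number of connections with the same destination port number	25
34	Dst_host_same_srv_rate	Percentage of the connections counted in Dst_host_count (feature 32) that were to the same service	0.17
35	Dst_host_diff_srv_rate	Percentage of the connections counted in Dst_host_count (feature 32) that were to different services	0.03
36	Dst_host_same_src_port_rate	Percentage of the connections counted in Dst_host_srv_count (feature 33) that came from the same source port	0.17
37	Dst_host_srv_diff_host_rate	Percentage of the connections counted in Dst_host_srv_count (feature 33) that were to different destination hosts	0
38	Dst_host_serror_rate	Percentage of the connections counted in Dst_host_count (feature 32) whose flag (feature 4) is S0, S1, S2, or S3	0.15
39	Dst_host_srv_serror_rate	Percentage of the connections counted in Dst_host_srv_count (feature 33) whose flag (feature 4) is S0, S1, S2, or S3	0.23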

True class \ Predicted class	Normal	Anomalous
Normal	True normal (TP)	False anomalous (FN)
Anomalous	False normal (FP)	True anomalous (TN)
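
As a rough illustration of how the four cells of this confusion matrix can be turned into evaluation metrics, the sketch below treats "normal" as the positive class, as the table does. The formulas are common textbook definitions and are assumptions here: this page does not give the paper's own formulas, and "average precision" is taken to mean the mean of the two per-class precisions. The function name `detection_metrics` and the sample counts are hypothetical.

```python
def detection_metrics(tp, fn, fp, tn):
    """Metrics from confusion-matrix counts, with 'normal' as the positive class.

    tp: normal predicted normal      fn: normal predicted anomalous
    fp: anomalous predicted normal   tn: anomalous predicted anomalous
    """
    total = tp + fn + fp + tn
    precision_normal = tp / (tp + fp)
    precision_anomalous = tn / (tn + fn)
    return {
        # Share of all connections assigned to the correct class.
        "accuracy": (tp + tn) / total,
        # Share of truly anomalous connections flagged as anomalous (assumed definition).
        "detection_rate": tn / (tn + fp),
        # Mean of the per-class precisions (assumed definition).
        "average_precision": (precision_normal + precision_anomalous) / 2,
        # Share of truly normal connections flagged as anomalous (assumed definition).
        "false_alarm_rate": fn / (tp + fn),
    }

# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=80, fn=20, fp=10, tn=90)
```

With "normal" as the positive class, a missed intrusion is a false positive (FP) and a false alarm is a false negative (FN), which is the reverse of the convention in much of the intrusion detection literature.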

Clustering algorithm \ Metric	Clustering accuracy	Detection rate	Average precision	False alarm rate
Improved MajorClust	79.54% / 79.47%	89.36% / 89.13%	80.23% / 80.10%	28.89% / 28.93%
Optimized k-means	64.15% / 64.37%	93.11% / 91.44%	66.65% / 66.01%	58.68% / 59.43%
Optimized DBSCAN	69.21% / 68.71%	95.11% / 93.75%	71.03% / 70.31%	53.04% / 53.13%

Test	Intrusion status	DoS	Probe	R2L	U2R	All categories
Test 1	Known intrusions	97.84%	95.94%	73.21%	25.00%	87.78%
Test 1	Unknown intrusions	98.75%	90.16%	86.51%	40.00%	92.62%
Test 2	Known intrusions	93.35%	79.44%	79.72%	58.33%	88.16%
Test 2	Unknown intrusions	94.54%	81.75%	94.12%	26.67%	89.03%
Test 3	Known intrusions	90.18%	76.85%	82.09%	38.46%	86.41%
Test 3	Unknown intrusions	99.69%	82.52%	94.58%	21.05%	91.37%
Test 4	Known intrusions	94.06%	79.43%	79.53%	33.33%	88.48%
Test 4	Unknown intrusions	93.14%	81.93%	92.73%	33.33%	88.30%