The Intrusion Detection Method of SMOTE Algorithm with Maximum Dissimilarity Coefficient Density

doi:10.3969/j.issn.1671-1122.2019.03.008

Abstract

Machine-learning-based intrusion detection methods applied to imbalanced intrusion datasets mostly focus on raising the overall detection rate and reducing the overall failure rate. As a result, detection rates for minority classes remain low, even though good classification performance on minority classes also matters in practice. This paper therefore proposes an intrusion detection method that combines a SMOTE algorithm based on maximum dissimilarity coefficient density with a DBN (Deep Belief Network) and a GBDT (Gradient Boosting Decision Tree). The core idea is as follows: in the data preprocessing stage, the SMOTE algorithm based on maximum dissimilarity coefficient density oversamples the data, and the Deep Belief Network extracts features. This increases the number of minority samples while reducing the dimensionality of the samples. A GBDT classifier is then trained on the balanced dataset, and the method is verified experimentally on the NSL-KDD dataset. Experimental results show that the proposed method maintains a high overall detection rate while significantly improving detection of the minority classes, which strengthens the ability of intrusion detection to catch minority attacks.
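The oversampling step can be illustrated with a minimal sketch of classic SMOTE interpolation. This is the baseline technique only: the abstract does not specify how the maximum dissimilarity coefficient density selects samples, so that part is not reproduced here. The function name `smote_oversample` and its parameters are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by classic SMOTE interpolation.

    Baseline SMOTE only; the density-based selection proposed in the
    paper is NOT implemented here (assumption: plain k-nearest neighbours).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # new point lies on the segment between base and its neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(points, n_new=10, k=3)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class grows without simply duplicating rows. In the pipeline described above, the balanced data would then pass through feature extraction and into the classifier.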

Key words: intrusion detection, maximum dissimilarity coefficient, density, SMOTE algorithm, DBN, GBDT

CLC Number: TP309

Hong CHEN, Yue XIAO, Chenglong XIAO, Jianhu CHEN. The Intrusion Detection Method of SMOTE Algorithm with Maximum Dissimilarity Coefficient Density[J]. Netinfo Security, 2019, 19(3): 61-71.

Figures/Tables 13


Category	Attack type
Normal	normal
DoS	back, land, neptune, pod, smurf, teardrop, apache2, udpstorm, processtable, worm
Probe	satan, ipsweep, nmap, portsweep, mscan, saint
R2L	guess_passwd, ftp_write, imap, phf, multihop, warezmaster, warezclient, spy, xlock, xsnoop, snmpguess, snmpgetattack, httptunnel, sendmail, named
U2R	buffer_overflow, loadmodule, rootkit, perl, sqlattack, xterm, ps
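When preparing the data, the grouping in the table above can be held in a small lookup structure. The following is a sketch only: the names `ATTACK_GROUPS`, `LABEL_TO_CATEGORY`, and `to_category` are hypothetical, and the label spellings `udpstorm`, `xterm`, and `guess_passwd` follow the sample-count table later in this document.

```python
# Category -> attack labels, following the grouping in the table above.
ATTACK_GROUPS = {
    "Normal": ["normal"],
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop",
            "apache2", "udpstorm", "processtable", "worm"],
    "Probe": ["satan", "ipsweep", "nmap", "portsweep", "mscan", "saint"],
    "R2L": ["guess_passwd", "ftp_write", "imap", "phf", "multihop",
            "warezmaster", "warezclient", "spy", "xlock", "xsnoop",
            "snmpguess", "snmpgetattack", "httptunnel", "sendmail", "named"],
    "U2R": ["buffer_overflow", "loadmodule", "rootkit", "perl",
            "sqlattack", "xterm", "ps"],
}

# Inverted index: attack label -> category
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ATTACK_GROUPS.items()
    for label in labels
}


def to_category(label):
    """Return the five-class category for a label, or None if it is not listed."""
    return LABEL_TO_CATEGORY.get(label.strip().lower())
```

Labels that the table does not list (for example `mailbomb`) return `None`, so the caller decides how to handle them rather than having them silently assigned to a class.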

Category	Attack subtype	KDDTrain+_20Percent	KDDTest-21
Normal	normal	13449	2152
Probe	ipsweep	710	141
	mscan	0	996
	nmap	301	73
	portsweep	587	156
	saint	0	309
	satan	691	727
DoS	apache2	0	737
	back	196	359
	land	1	7
	mailbomb	0	0
	neptune	8282	1579
	pod	38	41
	processtable	0	685
	smurf	529	627
	teardrop	188	12
	udpstorm	0	2
U2R	buffer_overflow	6	20
	httptunnel	0	133
	loadmodule	1	2
	perl	0	2
	ps	0	15
	rootkit	4	13
	sqlattack	0	2
	xterm	0	13
R2L	ftp_write	1	3
	guess_passwd	10	1231
	imap	5	1
	multihop	2	18
	named	0	17
	phf	2	2
	sendmail	0	14
	snmpgetattack	0	178
	snmpguess	0	331
	spy	1	0
	warezclient	181	0
	warezmaster	7	944
	worm	0	2
	xlock	0	9
	xsnoop	0	4
Total samples		25192	11850

	Predicted positive	Predicted negative
Actual positive	TP	FN
Actual negative	FP	TN
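The standard metrics derived from these four counts can be sketched as follows. Precision, recall, and F₁ use their textbook definitions; the miss rate and false positive rate are also standard, but whether they correspond exactly to the MA and PR columns in the result tables is an assumption, since those abbreviations are not defined here. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute common metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against empty classes).
    """
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miss_rate": fn / (tp + fn),            # equals 1 - recall
        "false_positive_rate": fp / (fp + tn),
    }


m = binary_metrics(tp=90, fn=10, fp=10, tn=90)
```

Because F₁ is the harmonic mean of precision and recall, a method with high precision but low recall on a minority class still scores a low F₁, which is why it is a useful measure on imbalanced data.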

Neighborhood radius	Precision	Recall	F₁	CE	MA	PR
0.3ε	86.87%	49.06%	62.71%	47.75%	50.94%	33.41%
0.5ε	87.35%	53.45%	66.32%	44.43%	46.55%	34.89%
0.7ε	88.05%	52.80%	66.02%	44.49%	47.19%	32.29%
ε	95.00%	53.65%	68.57%	40.24%	46.35%	12.73%
1.3ε	90.16%	49.16%	63.63%	47.53%	50.83%	32.62%
1.5ε	86.93%	48.58%	62.32%	48.06%	51.42%	32.90%
1.7ε	85.97%	41.78%	56.23%	53.22%	58.21%	30.71%

Classification method	Precision	Recall	F₁	CE	MA	PR
Improved SMOTE+DBN+GBDT	98.99%	99.23%	99.11%	0.9914%	0.7728%	0.9672%
DBN+GBDT	98.88%	98.03%	98.45%	1.430%	1.962%	1.2648%