Netinfo Security, 2019, Vol. 19, Issue 3: 61-71. doi: 10.3969/j.issn.1671-1122.2019.03.008
Hong CHEN, Yue XIAO, Chenglong XIAO, Jianhu CHEN
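When machine-learning-based intrusion detection methods are applied to imbalanced intrusion datasets, most concentrate on raising the overall detection rate and lowering the overall missed-alarm rate, while the detection rate for minority classes remains low. In practice, good classification performance on minority classes is equally important. This paper therefore proposes an intrusion detection method that combines a SMOTE (Synthetic Minority Oversampling Technique) algorithm based on maximum dissimilarity coefficient density with a deep belief network (DBN) and a gradient boosting decision tree (GBDT). The core idea is as follows. In the data preprocessing stage, the SMOTE algorithm based on maximum dissimilarity coefficient density oversamples the data and the DBN extracts features, which increases the number of minority-class samples while reducing the sample dimensionality. A GBDT classifier is then trained on the resulting balanced dataset, and the method is validated experimentally on the NSL-KDD dataset. The experimental results show that the proposed method keeps a high overall detection rate while clearly improving detection of minority classes, which strengthens the ability of the intrusion detection method to detect minority-class attacks.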
Hong CHEN, Yue XIAO, Chenglong XIAO, Jianhu CHEN. The Intrusion Detection Method of SMOTE Algorithm with Maximum Dissimilarity Coefficient Density[J]. Netinfo Security, 2019, 19(3): 61-71.
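The oversampling step named in the abstract builds on SMOTE. The sketch below shows only the classic SMOTE interpolation step in plain Python, as an illustration of how synthetic minority samples are produced. It is not the maximum-dissimilarity-coefficient-density variant proposed in the paper, whose selection rule is not described on this page. The function name `smote`, its parameters, and the toy data are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by classic SMOTE interpolation.

    minority: list of equal-length numeric feature vectors (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and its neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four corners of the unit square (hypothetical data)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside their bounding box. A density-aware variant would change which base samples and neighbours are eligible, not the interpolation itself.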
Type | Features |
nominal | protocol_type(2), service(3), flag(4) |
numeric | duration(1), src_bytes(5), dst_bytes(6), land(7), wrong_fragment(8), urgent(9), hot(10), num_failed_logins(11), logged_in(12), num_compromised(13), root_shell(14), su_attempted(15), num_root(16), num_file_creations(17), num_shells(18), num_access_files(19), num_outbound_cmds(20), is_host_login(21), is_guest_login(22), count(23), srv_count(24), serror_rate(25), srv_serror_rate(26), rerror_rate(27), srv_rerror_rate(28), same_srv_rate(29), diff_srv_rate(30), srv_diff_host_rate(31), dst_host_count(32), dst_host_srv_count(33), dst_host_same_srv_rate(34), dst_host_diff_srv_rate(35), dst_host_same_src_port_rate(36), dst_host_srv_diff_host_rate(37), dst_host_serror_rate(38), dst_host_srv_serror_rate(39), dst_host_rerror_rate(40), dst_host_srv_rerror_rate(41) |
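The split into nominal and numeric features matters because nominal columns have to be turned into numbers before they can feed a network such as a DBN. This page does not say which encoding the authors used. The sketch below assumes one-hot encoding for nominal columns and min-max scaling for numeric ones, two common choices. The helper names and the sample values are hypothetical.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def min_max(values):
    """Scale numeric values linearly into [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical values for one nominal and one numeric column
protocol_type = ["tcp", "udp", "icmp", "tcp"]
src_bytes = [0, 491, 146, 232]

categories, encoded = one_hot(protocol_type)
scaled = min_max(src_bytes)
```

One-hot encoding widens the 41 original features considerably, which is one reason a dimensionality-reducing feature extractor is attractive afterwards.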
Category | Attack types |
Normal | normal |
DoS | back, land, neptune, pod, smurf, teardrop, apache2, udpstorm, processtable, worm |
Probe | satan, ipsweep, nmap, portsweep, mscan, saint |
R2L | guess_passwd, ftp_write, imap, phf, multihop, warezmaster, warezclient, spy, xlock, xsnoop, snmpguess, snmpgetattack, httptunnel, sendmail, named |
U2R | buffer_overflow, loadmodule, rootkit, perl, sqlattack, xterm, ps |
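The attack-type table directly above amounts to a lookup from a record's raw label to one of five classes. A minimal sketch of that lookup follows, with labels spelled as in the NSL-KDD files (udpstorm, xterm, guess_passwd). The names `ATTACK_CATEGORIES`, `LABEL_TO_CATEGORY`, and `to_category`, and the "Unknown" fallback, are assumptions made for the example.

```python
# Category -> attack labels, transcribed from the attack-type table
ATTACK_CATEGORIES = {
    "Normal": ["normal"],
    "DoS": ["back", "land", "neptune", "pod", "smurf", "teardrop",
            "apache2", "udpstorm", "processtable", "worm"],
    "Probe": ["satan", "ipsweep", "nmap", "portsweep", "mscan", "saint"],
    "R2L": ["guess_passwd", "ftp_write", "imap", "phf", "multihop",
            "warezmaster", "warezclient", "spy", "xlock", "xsnoop",
            "snmpguess", "snmpgetattack", "httptunnel", "sendmail", "named"],
    "U2R": ["buffer_overflow", "loadmodule", "rootkit", "perl",
            "sqlattack", "xterm", "ps"],
}

# Inverted index: attack label -> category
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ATTACK_CATEGORIES.items()
    for label in labels
}


def to_category(label):
    """Map a raw label to its five-class category; unseen labels map to 'Unknown'."""
    return LABEL_TO_CATEGORY.get(label, "Unknown")


print(to_category("neptune"))   # DoS
print(to_category("rootkit"))   # U2R
```

Collapsing the labels this way turns the task into a five-class problem in which R2L and U2R are the minority classes.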
Category | Attack subtype | KDDTrain+_20Percent | KDDTest-21 |
Normal | normal | 13449 | 2152 |
Probe | ipsweep | 710 | 141 |
Probe | mscan | 0 | 996 |
Probe | nmap | 301 | 73 |
Probe | portsweep | 587 | 156 |
Probe | saint | 0 | 309 |
Probe | satan | 691 | 727 |
DoS | apache2 | 0 | 737 |
DoS | back | 196 | 359 |
DoS | land | 1 | 7 |
DoS | mailbomb | 0 | 0 |
DoS | neptune | 8282 | 1579 |
DoS | pod | 38 | 41 |
DoS | processtable | 0 | 685 |
DoS | smurf | 529 | 627 |
DoS | teardrop | 188 | 12 |
DoS | udpstorm | 0 | 2 |
U2R | buffer_overflow | 6 | 20 |
U2R | httptunnel | 0 | 133 |
U2R | loadmodule | 1 | 2 |
U2R | perl | 0 | 2 |
U2R | ps | 0 | 15 |
U2R | rootkit | 4 | 13 |
U2R | sqlattack | 0 | 2 |
U2R | xterm | 0 | 13 |
R2L | ftp_write | 1 | 3 |
R2L | guess_passwd | 10 | 1231 |
R2L | imap | 5 | 1 |
R2L | multihop | 2 | 18 |
R2L | named | 0 | 17 |
R2L | phf | 2 | 2 |
R2L | sendmail | 0 | 14 |
R2L | snmpgetattack | 0 | 178 |
R2L | snmpguess | 0 | 331 |
R2L | spy | 1 | 0 |
R2L | warezclient | 181 | 0 |
R2L | warezmaster | 7 | 944 |
R2L | worm | 0 | 2 |
R2L | xlock | 0 | 9 |
R2L | xsnoop | 0 | 4 |
Total samples in dataset | | 25192 | 11850 |
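Summing the KDDTrain+_20Percent column by category makes the class imbalance concrete. The sketch below uses only the training counts from the table above, grouped as the table groups them. The variable names are illustrative.

```python
# Per-subtype training counts, in table order, grouped by category
TRAIN_COUNTS = {
    "Normal": [13449],
    "Probe": [710, 0, 301, 587, 0, 691],
    "DoS": [0, 196, 1, 0, 8282, 38, 0, 529, 188, 0],
    "U2R": [6, 0, 1, 0, 0, 4, 0, 0],
    "R2L": [1, 10, 5, 2, 0, 2, 0, 0, 0, 1, 181, 7, 0, 0, 0],
}

totals = {category: sum(counts) for category, counts in TRAIN_COUNTS.items()}
grand_total = sum(totals.values())
shares = {category: n / grand_total for category, n in totals.items()}

for category, n in totals.items():
    print(f"{category:<6} {n:>6} {shares[category]:.4%}")
# Normal  13449 53.3860%
# Probe    2289 9.0862%
# DoS      9234 36.6545%
# U2R        11 0.0437%
# R2L       209 0.8296%
```

The category totals add up to the 25192 training samples listed in the table. U2R and R2L together make up less than 1% of the training data, and these are the classes that oversampling is meant to help.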
Neighborhood radius | Precision | Recall | F1 | CE | MA | PR |
0.3ε | 86.87% | 49.06% | 62.71% | 47.75% | 50.94% | 33.41% |
0.5ε | 87.35% | 53.45% | 66.32% | 44.43% | 46.55% | 34.89% |
0.7ε | 88.05% | 52.80% | 66.02% | 44.49% | 47.19% | 32.29% |
ε | 95.00% | 53.65% | 68.57% | 40.24% | 46.35% | 12.73% |
1.3ε | 90.16% | 49.16% | 63.63% | 47.53% | 50.83% | 32.62% |
1.5ε | 86.93% | 48.58% | 62.32% | 48.06% | 51.42% | 32.90% |
1.7ε | 85.97% | 41.78% | 56.23% | 53.22% | 58.21% | 30.71% |
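The F1 column in the table above can be cross-checked against the Precision and Recall columns with the standard harmonic-mean formula F1 = 2PR / (P + R). The sketch below does this for every radius. It also checks that MA equals 100% minus Recall in each row. That relation is only an observation from the tabulated numbers, since this page does not define MA. The variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit in, same unit out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# radius -> (Precision, Recall, reported F1, reported MA), all in percent
ROWS = {
    "0.3ε": (86.87, 49.06, 62.71, 50.94),
    "0.5ε": (87.35, 53.45, 66.32, 46.55),
    "0.7ε": (88.05, 52.80, 66.02, 47.19),
    "ε":    (95.00, 53.65, 68.57, 46.35),
    "1.3ε": (90.16, 49.16, 63.63, 50.83),
    "1.5ε": (86.93, 48.58, 62.32, 51.42),
    "1.7ε": (85.97, 41.78, 56.23, 58.21),
}

for radius, (p, r, f1_reported, ma_reported) in ROWS.items():
    f1 = f1_score(p, r)
    print(f"{radius:>5}: F1 computed {f1:.2f}, reported {f1_reported:.2f}; "
          f"100 - Recall = {100 - r:.2f}, MA reported {ma_reported:.2f}")

best_radius = max(ROWS, key=lambda radius: ROWS[radius][2])
```

Every recomputed F1 agrees with the table to within rounding, and the radius ε gives both the highest Precision and the highest F1 of the seven settings.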
Classification method | Normal | DoS | Probe | R2L | U2R |
Improved SMOTE+DBN+GBDT | 89.27% | 65.34% | 76.03% | 27.95% | 12.03% |
DS-SMOTE+DBN+GBDT | 87.63% | 66.99% | 58.15% | 9.33% | 8.00% |
SVM | 91.21% | 56.77% | 47.83% | 7.80% | 0.00% |
DBN+SVM | 89.03% | 54.14% | 60.28% | 6.31% | 0.00% |
GBDT | 85.22% | 71.46% | 56.91% | 7.61% | 0.50% |
DBN+BP | 87.44% | 57.11% | 67.77% | 10.67% | 2.01% |
DBN+GBDT | 89.10% | 67.27% | 64.82% | 12.38% | 2.14% |
Classification method | Precision | Recall | F1 | CE | MA | PR |
Improved SMOTE+DBN+GBDT | 95.00% | 53.65% | 68.57% | 40.24% | 36.35% | 10.23% |
DS-SMOTE+DBN+GBDT | 94.50% | 47.21% | 62.97% | 45.44% | 52.78% | 12.36% |
SVM | 94.13% | 39.48% | 55.63% | 51.12% | 60.51% | 18.78% |
DBN+SVM | 93.39% | 39.96% | 55.97% | 52.30% | 59.03% | 19.25% |
GBDT | 91.38% | 46.32% | 61.47% | 48.61% | 53.68% | 14.77% |
DBN+BP | 90.92% | 55.43% | 62.12% | 49.43% | 53.10% | 22.90% |
DBN+GBDT | 93.92% | 45.43% | 61.23% | 47.75% | 54.57% | 10.97% |