Research on Active Learning-based Intrusion Detection Approach for Industrial Internet

doi:10.3969/j.issn.1671-1122.2021.01.010

Netinfo Security, 2021, Vol. 21, Issue 1: 80-87.

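SHEN Yeming¹,², LI Beibei¹, LIU Xiaojie¹, OUYANG Yuankai¹

1. College of Cyber Security, Sichuan University, Chengdu 610207, China
2. Troops 96795, Yinchuan 750000, China

Received: 2020-11-04  Online: 2021-01-10  Published: 2021-02-23
Corresponding author: LI Beibei, E-mail: libeibei@scu.edu.cn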
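About the authors:
SHEN Yeming (born 1985), male, from Sichuan, master's student; research interest: cyber-physical system security.
LI Beibei (born 1992), male, from Shaanxi, associate professor, Ph.D.; research interest: cyber-physical system security.
LIU Xiaojie (born 1965), female, from Jiangsu, professor, master's degree; research interests: data protection and digital virtual asset protection.
OUYANG Yuankai (born 1998), male, from Sichuan, master's student; research interest: cyber-physical system security.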
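Funding:
National Natural Science Foundation of China (U1736212, U19A2068); Key Research and Development Program of Sichuan Province (2018GZ0183, 20ZDYF3145); China Postdoctoral Science Foundation (2019TQ0217); Fundamental Research Funds for the Central Universities (YJ201933)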


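Abstract

Intrusion detection in the Industrial Internet suffers from low accuracy for two reasons: industrial network structures are complex, and few attack samples are known. To address this problem, this paper proposes an Active Learning-based Intrusion Detection System (ALIDS). ALIDS brings expert labeling into the intrusion detection process and combines an active learning query strategy with LightGBM, which improves detection accuracy when training samples are scarce. First, features are extracted from raw Industrial Internet network flows and payloads, and missing values are imputed with a nearest-neighbor method. Second, uncertainty sampling selects the most informative unlabeled samples and passes them to human experts for labeling. Third, the newly labeled samples are added to the training set, and Bayesian optimization tunes the hyperparameters of the LightGBM model. Finally, binary and multi-class classification experiments on the dataset verify the effectiveness of ALIDS for intrusion detection.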
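The first two steps of ALIDS described above are nearest-neighbor imputation of missing feature values, followed by uncertainty sampling to choose the samples sent to experts. Both can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Missing values are marked as `None`. Distances use only the features present in both rows. "Uncertainty" is taken to be least confidence, i.e. 1 minus the highest predicted class probability. The helper names `knn_impute` and `select_for_labeling` are hypothetical. In ALIDS, the class probabilities would come from the current LightGBM model.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # one feature vector; None marks a missing value


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over the features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(rows: Sequence[Row], k: int = 3) -> List[List[float]]:
    """Replace each missing value with the mean of that feature over the k
    nearest other rows (by shared-feature distance) that observe it."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"feature {j} is missing in every row")
            new_row.append(sum(v for _, v in donors) / len(donors))
        filled.append(new_row)
    return filled


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score (assumed strategy): 1 - max class probability."""
    return 1.0 - max(probs)


def select_for_labeling(pool_probs: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Indices of the batch_size most uncertain unlabeled samples,
    i.e. the ones an active learner would hand to a human expert."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: least_confidence(pool_probs[i]),
                    reverse=True)
    return ranked[:batch_size]


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, None], [5.0, 9.0], [0.9, 2.2]]
    print(knn_impute(data, k=2))           # row 1 gets the mean of 2.0 and 2.2
    probs = [[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]]
    print(select_for_labeling(probs, 2))   # [1, 2]
```

In a full loop, the selected indices would be labeled by experts and moved from the unlabeled pool into the training set. The model would then be retrained and the pool re-scored, repeating until the labeling budget is spent.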

Key words: Industrial Internet, intrusion detection, active learning, uncertainty sampling, LightGBM
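For the hyperparameter step, ALIDS uses Bayesian optimization on LightGBM. The abstract does not specify the surrogate model or the acquisition function. The sketch below therefore assumes a zero-mean Gaussian-process surrogate with an RBF kernel and an upper-confidence-bound rule (mean + beta times standard deviation). It searches a one-dimensional grid, which you can think of as a single hyperparameter rescaled to [0, 1], and uses only the standard library. In ALIDS the objective would be a validation score of a LightGBM model trained with the candidate hyperparameters. The toy objective, kernel length scale, noise level, `beta`, and all function names here are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel with unit prior variance (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = m, for symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L^T x = b by back substitution (L^T[i][k] == low[k][i])."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_gp(xs: Sequence[float], ys: Sequence[float], noise: float = 1e-4):
    """Fit the GP once; return a function giving posterior (mean, std) at a point."""
    k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = backward(low, forward(low, ys))  # alpha = K^{-1} y

    def predict(x_new: float) -> Tuple[float, float]:
        k_star = [rbf(x_new, a) for a in xs]
        mean = sum(ks * al for ks, al in zip(k_star, alpha))
        v = forward(low, k_star)  # v = L^{-1} k_*
        var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
        return mean, math.sqrt(var)

    return predict


def bayes_opt(objective: Callable[[float], float], grid: Sequence[float],
              init: Sequence[float], n_iter: int, beta: float = 2.0) -> Tuple[float, float]:
    """Maximize objective over grid: refit the GP, evaluate the unvisited point
    with the largest mean + beta * std, and repeat n_iter times."""
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)
        candidates = [g for g in grid if g not in xs]
        if not candidates:
            break
        nxt = max(candidates, key=lambda g: predict(g)[0] + beta * predict(g)[1])
        xs.append(nxt)
        ys.append(objective(nxt))
    best = max(range(len(xs)), key=ys.__getitem__)
    return xs[best], ys[best]


if __name__ == "__main__":
    # Hypothetical stand-in for "validation score at this hyperparameter value".
    grid = [i / 100 for i in range(101)]
    best_x, best_y = bayes_opt(lambda x: -(x - 0.3) ** 2, grid, init=[0.0, 0.5, 1.0], n_iter=12)
    print(f"best x = {best_x:.2f}, objective = {best_y:.4f}")
```

Each real evaluation would train a full LightGBM model, so that step dominates the cost. The surrogate exists so that each next trial is chosen using all previous results instead of by grid or random search. Already-evaluated points are excluded from the candidates, which keeps the kernel matrix well conditioned.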


CLC number: TP309


SHEN Yeming, LI Beibei, LIU Xiaojie, OUYANG Yuankai. Research on Active Learning-based Intrusion Detection Approach for Industrial Internet[J]. Netinfo Security, 2021, 21(1): 80-87.

Figures and tables (12): Figures 1-9, Tables 1-3.


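Label name	Label value	Description	Count
Normal	0	Normal traffic	19503
NMRI	1	Naive malicious response injection attack	1198
CMRI	2	Complex malicious response injection attack	1457
MSCI	3	Malicious state command injection attack	209
MPCI	4	Malicious parameter command injection attack	410
MFCI	5	Malicious function command injection attack	155
DoS	6	Denial-of-service attack	135
Reconnaissance	7	Reconnaissance attack	4132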
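The label counts above show how scarce attack samples are. The short computation below is plain Python, and its dictionary only copies the table's counts. It derives the total, the overall share of attack records, and each class's share. It also computes how many Normal records there are for every record of each class.

```python
# Counts copied from the label table above.
counts = {
    "Normal": 19503, "NMRI": 1198, "CMRI": 1457, "MSCI": 209,
    "MPCI": 410, "MFCI": 155, "DoS": 135, "Reconnaissance": 4132,
}
total = sum(counts.values())
attack_total = total - counts["Normal"]
print(f"total samples: {total}, attack samples: {attack_total} ({attack_total / total:.1%})")
# -> total samples: 27199, attack samples: 7696 (28.3%)

for label, n in counts.items():
    share = n / total                  # fraction of all records
    ratio = counts["Normal"] / n       # Normal records per record of this class
    print(f"{label:15s} {n:6d} {share:7.2%}  Normal:{label} = {ratio:6.1f}:1")
```

Three attack classes (MSCI, MFCI and DoS) each account for less than 1% of the records. This is the kind of scarcity of known attack samples that the abstract describes.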

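Metric	Binary, before optimization (%)	Binary, after optimization (%)	Multi-class, before optimization (%)	Multi-class, after optimization (%)
Accuracy	96.94	97.73	96.92	97.38
Precision	97.05	98.11	97.31	97.90
Recall	96.94	98.73	96.92	98.45
F1	96.97	98.41	97.07	98.17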
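The four metrics in the table follow their standard definitions, but the table does not say how per-class scores are averaged. The sketch below assumes support-weighted averaging of per-class precision, recall and F1, which is one common choice. It computes all four metrics from true and predicted labels in plain Python. `weighted_metrics` is a hypothetical helper, not code from the paper.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    result = {"accuracy": sum(correct.values()) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls, sup in support.items():
        tp = correct[cls]
        prec = tp / predicted[cls] if predicted[cls] else 0.0
        rec = tp / sup
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = sup / n           # weight each class by its share of true samples
        result["precision"] += weight * prec
        result["recall"] += weight * rec
        result["f1"] += weight * f1
    return result


if __name__ == "__main__":
    print(weighted_metrics([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2]))
```

Under this assumption, weighted recall is algebraically identical to accuracy, because Σ_c (n_c/n)(TP_c/n_c) = Σ_c TP_c / n. You can use this identity to check which averaging scheme a reported set of numbers used.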
