Netinfo Security ›› 2021, Vol. 21 ›› Issue (1): 80-87. doi: 10.3969/j.issn.1671-1122.2021.01.010

• Technical Research •

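Research on Active Learning-based Intrusion Detection Approach for Industrial Internet

SHEN Yeming1,2, LI Beibei1, LIU Xiaojie1, OUYANG Yuankai1

  1. College of Cyber Security, Sichuan University, Chengdu 610207, China
  2. Troops 96795, Yinchuan 750000, China
  • Received: 2020-11-04 Online: 2021-01-10 Published: 2021-02-23
  • Contact: LI Beibei E-mail: libeibei@scu.edu.cn
  • About the authors: SHEN Yeming (born 1985), male, from Sichuan, master's student, main research interest: cyber-physical system security | LI Beibei (born 1992), male, from Shaanxi, associate professor, Ph.D., main research interest: cyber-physical system security | LIU Xiaojie (born 1965), female, from Jiangsu, professor, master's degree, main research interests: data protection technology and digital virtual asset protection technology | OUYANG Yuankai (born 1998), male, from Sichuan, master's student, main research interest: cyber-physical system security
  • Funding:
    National Natural Science Foundation of China (U1736212); National Natural Science Foundation of China (U19A2068); Sichuan Provincial Key Research and Development Program (2018GZ0183); Sichuan Provincial Key Research and Development Program (20ZDYF3145); China Postdoctoral Science Foundation (2019TQ0217); Fundamental Research Funds for the Central Universities (YJ201933)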

Abstract:

Intrusion detection for the Industrial Internet suffers from low accuracy because the network structure is complex and few known attack samples are available. To address this problem, the paper proposes an Active Learning-based Intrusion Detection System (ALIDS). The system brings expert labeling into the intrusion detection process and combines an active learning query strategy with LightGBM, addressing the low accuracy of intrusion detection systems when training samples are scarce. First, it extracts features from the raw network flows and payloads of the Industrial Internet and fills in missing data with the nearest-neighbor method. Second, it uses uncertainty sampling to select the most valuable training samples and hands them to human experts for labeling. Third, it adds the labeled samples to the training set while using Bayesian optimization to tune the hyperparameters of the LightGBM model. Finally, binary and multi-class classification experiments on the dataset verify the effectiveness of ALIDS for intrusion detection.

Key words: Industrial Internet, intrusion detection, active learning, uncertainty sampling, LightGBM
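
The uncertainty sampling step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which uncertainty measure ALIDS uses, so the entropy criterion below is an assumption, and the function names and probability values are hypothetical. In practice the probabilities would come from the current classifier (for example, the predict_proba output of a LightGBM model), not from a hard-coded list.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, batch_size):
    """Return indices of the `batch_size` unlabeled samples whose predicted
    class distribution has the highest entropy (most uncertain first)."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical class probabilities for five unlabeled flows
# (columns: normal, attack). These values are illustrative only;
# a real system would obtain them from the trained classifier.
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
    [0.70, 0.30],
]

# Indices of the two samples to send to the human expert for labeling.
query_indices = select_most_uncertain(pool_probs, batch_size=2)
print(query_indices)  # prints [3, 1]
```

The samples at indices 3 and 1 are chosen because their predicted probabilities are closest to an even split, so a label from the expert is expected to be most informative there. Once labeled, they would be moved from the unlabeled pool into the training set before the model is retrained.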

CLC Number: