Netinfo Security, 2022, Vol. 22, Issue (8): 81-89. doi: 10.3969/j.issn.1671-1122.2022.08.010

• Theoretical Research •

Anomaly Detection of Imbalanced Data in Industrial Control System Based on GAN-Cross

GU Zhaojun1,2, LIU Tingting1,2, GAO Bing1,2, SUI He3

  1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
  2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  3. College of Aeronautical Engineering, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2021-05-26 Online: 2022-08-10 Published: 2022-09-15
  • Contact: SUI He E-mail: hsui@cauc.edu.cn
  • About the authors: GU Zhaojun (b. 1966), male, from Shandong, professor, Ph.D.; main research interests: network and information security, civil aviation information systems | LIU Tingting (b. 1996), female, from Ningxia, master's student; main research interests: network and information security of industrial control systems | GAO Bing (b. 1995), male, from Henan, master's student; main research interests: network and information security of industrial control systems | SUI He (b. 1987), male, from Jilin, lecturer, Ph.D.; main research interests: network and information security of industrial control systems.
  • Funding:
    National Natural Science Foundation of China (61601467); Civil Aviation Safety Capacity Building Fund (PESA2019073); Civil Aviation Safety Capacity Building Fund (PESA2019074); Civil Aviation Safety Capacity Building Fund (PESA2020100)

Abstract:

Anomaly detection in industrial control systems suffers from class imbalance, which makes it difficult for general-purpose classifiers to identify abnormal data accurately. Sampling methods are commonly used to balance the classes and thereby improve classifier performance. However, traditional sampling methods are sensitive to the characteristics of the data set, so their sampling effect is unstable and the resulting anomaly detection accuracy fluctuates widely. Based on the generative adversarial network (GAN), this paper proposes a GAN-Cross sampling model. The model learns the probability distribution of the target data and generates data with a similar distribution, thereby improving the balance of the data set. To achieve better feature extraction, a cross layer is added to both the generator and the discriminator. Finally, the model is combined with four classic classifiers (random forest, K-nearest neighbor, Gaussian naive Bayes, and support vector machine) and compared with four other conventional sampling methods on four public imbalanced data sets. The experimental results show that, compared with traditional sampling methods, the model significantly improves the classifiers' ability to detect anomalies in imbalanced data.

Key words: industrial control system, imbalanced data, generative adversarial network, sampling method, anomaly detection
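
The balancing step described in the abstract (learn the distribution of the under-represented class, then generate similar samples until the classes are even) can be illustrated with a minimal Python sketch. It is not the GAN-Cross model: as an assumption made only for illustration, a per-feature Gaussian fit stands in for the trained generator, and the names `fit_gaussian_generator` and `balance_with_generator` are hypothetical.

```python
import random
import statistics
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = List[float]
Generator = Callable[[int], List[Sample]]


def fit_gaussian_generator(minority: Sequence[Sample], seed: int = 0) -> Generator:
    """Stand-in for a trained generator (assumption, NOT GAN-Cross).

    Fits an independent Gaussian to each feature of the minority class and
    returns a function that draws n synthetic samples from that fit.
    """
    rng = random.Random(seed)
    columns = list(zip(*minority))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]

    def generate(n: int) -> List[Sample]:
        return [[rng.gauss(m, s) for m, s in zip(means, stdevs)] for _ in range(n)]

    return generate


def balance_with_generator(
    X: Sequence[Sample],
    y: Sequence[int],
    minority_label: int,
    make_generator: Callable[[Sequence[Sample]], Generator] = fit_gaussian_generator,
) -> Tuple[List[Sample], List[int]]:
    """Append synthetic minority samples until the minority class matches
    the size of the largest class. Original samples are kept unchanged."""
    counts = Counter(y)
    deficit = max(counts.values()) - counts[minority_label]
    if deficit <= 0:
        return list(X), list(y)  # already balanced, nothing to generate
    minority = [x for x, label in zip(X, y) if label == minority_label]
    synthetic = make_generator(minority)(deficit)
    return list(X) + synthetic, list(y) + [minority_label] * deficit


if __name__ == "__main__":
    # Toy data: label 0 = normal traffic, label 1 = anomaly (made-up values).
    X = [[0.1, 1.0], [0.2, 1.1], [0.0, 0.9], [0.3, 1.2], [0.2, 1.0],
         [0.1, 0.8], [5.0, 9.0], [5.5, 8.5]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = balance_with_generator(X, y, minority_label=1)
    print(Counter(y), Counter(y_bal))
```

The balanced training set would then be passed to any of the classifiers named in the abstract. Only the training split should be balanced this way; the test split keeps its original class ratio so that the detection metrics reflect the real imbalance.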

CLC Number: