Netinfo Security (信息网络安全), 2019, Vol. 19, Issue (10): 50-56. doi: 10.3969/j.issn.1671-1122.2019.10.007

• Technical Research •

Research on Classification Method of Network Security Data Based on Data Feature Learning

Yanhua LIU1,2, Xiaoling GAO1,2, Minchen ZHU1,2, Peihuang SU1,2

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350108, China
  2. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou Fujian 350108, China
  • Received: 2019-06-03 Online: 2019-10-10 Published: 2020-05-11
  • Contact: Xiaoling GAO E-mail: 214833246@qq.com
  • About the authors:

    Yanhua LIU (born 1972), male, from Shandong, associate professor, Ph.D.; main research interests: network information security and network content analysis. Xiaoling GAO (born 1995), female, from Fujian, master's student; main research interest: network information security. Minchen ZHU (born 1961), female, from Shanghai, professor; main research interests: pattern recognition and intelligent computing. Peihuang SU (born 1993), male, from Fujian, master's student; main research interest: network information security.

  • Supported by:
    National Natural Science Foundation of China [61772136]; Key Project of the Fujian Provincial Department of Science and Technology [2014H0024]; Fujian Provincial Science and Technology Innovation Platform Construction Project [2014H2005]; Fuzhou University Science and Technology Project [XRC-18007]

Abstract:

Data classification plays an important role in network security protection, monitoring, and early warning. As network systems grow in scale, network speeds rise, and security incidents multiply, the volume of security data increases sharply. This degrades classification accuracy and poses major challenges for security applications such as intrusion detection, security assessment, and attack intent recognition. This paper proposes a data classification model that combines the SMOTE-SVM algorithm with the XGBoost algorithm. First, to address data imbalance, a data feature balancing method based on SMOTE-SVM is designed that combines oversampling and undersampling, making the training data distribution more reasonable and improving training accuracy. Next, to handle the diversity of multi-source heterogeneous security data, one-hot encoding is used to standardize the data representation. Finally, XGBoost is used to perform feature extraction and classification on the dataset. Experimental results show that the proposed method has clear advantages in classification precision, recall, and overall effectiveness. It can effectively improve the capacity to analyze network security big data and is of practical significance for network security situational awareness.

Key words: cyberspace security, imbalanced data, SMOTE, XGBoost
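
The two preprocessing steps named in the abstract, one-hot encoding and minority-class oversampling, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain SMOTE interpolation rather than the paper's SMOTE-SVM variant, it omits the undersampling and XGBoost stages, and the toy records, field layout, and function names (`one_hot`, `smote`) are assumptions made up for illustration.

```python
import random

def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector (sorted category order)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = [[1.0 if index[v] == i else 0.0 for i in range(len(categories))]
               for v in values]
    return vectors, categories

def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE: create n_new points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base` by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy records: (protocol, duration, label). Not data from the paper.
records = [
    ("tcp", 0.2, "normal"), ("udp", 0.1, "normal"), ("tcp", 0.3, "normal"),
    ("icmp", 0.1, "normal"), ("tcp", 0.4, "normal"), ("udp", 0.2, "normal"),
    ("icmp", 5.0, "attack"), ("tcp", 6.0, "attack"),
]

# Step 1: one-hot encode the categorical protocol field, keep duration numeric.
protocol_vectors, categories = one_hot([r[0] for r in records])
features = [vec + [r[1]] for vec, r in zip(protocol_vectors, records)]
labels = [r[2] for r in records]

# Step 2: oversample the minority class until both classes have equal size.
minority = [f for f, y in zip(features, labels) if y == "attack"]
n_new = labels.count("normal") - labels.count("attack")
synthetic = smote(minority, n_new, k=1)
balanced_features = features + synthetic
balanced_labels = labels + ["attack"] * len(synthetic)
```

Interpolating one-hot columns produces fractional values, a known limitation of plain SMOTE on categorical features. In a fuller pipeline, `balanced_features` and `balanced_labels` would then be passed to a gradient-boosted tree classifier such as `xgboost.XGBClassifier` for the classification stage.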

CLC number: