Netinfo Security, 2019, Vol. 19, Issue (10): 50-56. doi: 10.3969/j.issn.1671-1122.2019.10.007

Research on Classification Method of Network Security Data Based on Data Feature Learning

Yanhua LIU1,2, Xiaoling GAO1,2, Minchen ZHU1,2, Peihuang SU1,2

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350108, China
    2. Fujian Provincial Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou Fujian 350108, China
  • Received: 2019-06-03 Online: 2019-10-10 Published: 2020-05-11
  • Contact: Xiaoling GAO E-mail: 214833246@qq.com

Abstract:

Data classification plays an important role in cyberspace security situational awareness. However, as networks grow in scale and speed and security incidents become more frequent, the volume of security data increases dramatically. This degrades classification accuracy and poses major challenges for security applications such as intrusion detection, security assessment, and attack intention recognition. This paper proposes a data classification model that integrates the SMOTE-SVM algorithm with the XGBoost algorithm. First, to address data imbalance, a data feature balancing method based on the SMOTE-SVM algorithm is designed; it combines up-sampling and down-sampling to make the training data distribution more reasonable and to improve training accuracy. Second, to handle the diversity of multi-source heterogeneous security data, one-hot encoding is used to standardize the data. Finally, feature extraction and classification are carried out on the data sets using the XGBoost algorithm. Experimental results show that the proposed method has clear advantages in classification accuracy, recall, and overall effectiveness. It can effectively improve the capacity to analyze network security big data and is of practical significance for network security situational awareness.
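
The two preprocessing ideas named in the abstract, one-hot encoding and SMOTE-style oversampling, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. It uses plain SMOTE (interpolation between nearest minority neighbours) rather than the SMOTE-SVM variant, performs only up-sampling, and omits the XGBoost classification step. The toy records, the field names, and the helper functions `one_hot` and `smote` are all hypothetical. The sketch also encodes before balancing, only because the interpolation needs numeric vectors; the abstract lists balancing first.

```python
import random

def one_hot(values):
    """Encode categorical values as 0/1 vectors over the sorted vocabulary."""
    vocab = sorted(set(values))
    rows = [[1.0 if v == w else 0.0 for w in vocab] for v in values]
    return rows, vocab

def smote(minority, n_new, k=2, seed=0):
    """Plain SMOTE: build n_new synthetic points, each interpolated between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the remaining minority samples by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Hypothetical toy records: (protocol, duration, label). Not data from the paper.
records = [
    ("tcp", 0.10, "normal"), ("tcp", 0.20, "normal"), ("udp", 0.15, "normal"),
    ("tcp", 0.30, "normal"), ("udp", 0.25, "normal"), ("icmp", 0.12, "normal"),
    ("icmp", 0.90, "attack"), ("tcp", 0.80, "attack"), ("icmp", 0.70, "attack"),
]

# Step 1: one-hot encode the categorical protocol field, keep duration numeric.
encoded, vocab = one_hot([r[0] for r in records])
features = [row + [r[1]] for row, r in zip(encoded, records)]
labels = [r[2] for r in records]

# Step 2: oversample the minority class until both classes have equal size.
normal = [f for f, y in zip(features, labels) if y == "normal"]
attack = [f for f, y in zip(features, labels) if y == "attack"]
attack_balanced = attack + smote(attack, n_new=len(normal) - len(attack))

print(vocab, len(normal), len(attack_balanced))
```

Interpolating one-hot columns can produce fractional values in the synthetic rows, which is a known limitation of plain SMOTE on categorical features. In practice, library implementations such as `SVMSMOTE` in imbalanced-learn and `XGBClassifier` in xgboost would replace these hand-written helpers.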

Key words: cyberspace security, imbalanced data, SMOTE, XGBoost

CLC Number: