Netinfo Security ›› 2021, Vol. 21 ›› Issue (6): 52-62. doi: 10.3969/j.issn.1671-1122.2021.06.007


Multiple Classification Detection Method for Malware Based on XGBoost and Stacking Fusion Model

XU Guotian*, SHEN Yaotong

  1. Cyber Crime Investigation Department, Criminal Investigation Police University of China, Shenyang 110854, China
  • Received: 2021-03-08  Online: 2021-06-10  Published: 2021-07-01
  • Contact: XU Guotian*  E-mail: 459536384@qq.com

Abstract:

In multi-class malware detection, traditional static and dynamic analysis methods are heavily affected by anti-forensics techniques. Newer detection methods based on network traffic also fall short: because the traffic characteristics of different kinds of malware are highly similar, manually extracted traffic features combined with traditional machine learning methods cannot achieve high accuracy. To address these problems, this paper proposes a multi-class malware detection method based on XGBoost and a Stacking fusion model. After capturing the external traffic of the target malware and automatically extracting the initial network features, the method preprocesses the initial data set and applies multi-level feature selection. An XGBoost-based feature creation algorithm then automatically generates a set of advanced features from the initial features, and the Stacking ensemble algorithm fuses multiple models to improve the accuracy of malware classification. To reduce the time spent searching for the optimal parameter combination, Bayesian optimization is used to determine the optimal parameters of each model, and several regularization strategies are adopted to address model overfitting. Experimental results show that, compared with other traditional methods, the proposed method achieves higher accuracy in multi-class malware classification.

Key words: multi-class malware classification, multi-level feature selection, extreme gradient boosting, Stacking ensemble, Bayesian optimization
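
The abstract gives no implementation details, so the following is only a minimal sketch of the Stacking step it describes: base models are scored on held-out folds, and a meta-learner is trained on those out-of-fold predictions. Everything here is an assumption made for illustration. Simple nearest-centroid and k-NN learners stand in for the paper's XGBoost-based models, a lookup table stands in for the meta-learner, the data are synthetic, and all names are hypothetical. Feature creation and Bayesian optimization are omitted.

```python
import random
from collections import Counter, defaultdict

def make_data(n_per_class=60, seed=0):
    """Synthetic 3-class feature vectors (hypothetical, not the paper's traffic data)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

class NearestCentroid:
    """Base learner 1 (stand-in): predict the class with the closest mean vector."""
    def fit(self, X, y):
        sums = defaultdict(lambda: [0.0] * len(X[0]))
        counts = Counter(y)
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                sums[label][j] += v
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in counts}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: dist2(row, self.centroids[c]))
                for row in X]

class KNN:
    """Base learner 2 (stand-in): majority vote among the k nearest training rows."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for row in X:
            order = sorted(range(len(self.X)), key=lambda i: dist2(row, self.X[i]))
            votes = Counter(self.y[i] for i in order[:self.k])
            out.append(votes.most_common(1)[0][0])
        return out

class VoteTable:
    """Meta-learner (stand-in): map each tuple of base predictions to its most frequent true label."""
    def fit(self, Z, y):
        table = defaultdict(Counter)
        for z, label in zip(Z, y):
            table[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in table.items()}
        self.default = Counter(y).most_common(1)[0][0]  # fallback for unseen tuples
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]

def out_of_fold(make_model, X, y, n_folds=5):
    """Predict every training row with a model that never saw that row."""
    preds = [None] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        held = [i for i in range(len(X)) if i % n_folds == f]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(held, model.predict([X[i] for i in held])):
            preds[i] = p
    return preds

def fit_stacking(factories, X, y):
    # Meta-features are out-of-fold base predictions, so the meta-learner
    # is not trained on labels the base models have already memorised.
    Z = list(zip(*(out_of_fold(f, X, y) for f in factories)))
    meta = VoteTable().fit(Z, y)
    bases = [f().fit(X, y) for f in factories]  # refit bases on all training data
    return bases, meta

def predict_stacking(bases, meta, X):
    Z = list(zip(*(b.predict(X) for b in bases)))
    return meta.predict(Z)

X, y = make_data(seed=0)
X_test, y_test = make_data(n_per_class=30, seed=1)
bases, meta = fit_stacking([NearestCentroid, lambda: KNN(k=5)], X, y)
pred = predict_stacking(bases, meta, X_test)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"stacked accuracy: {accuracy:.3f}")
```

In practice the base learners would be tuned gradient-boosted and other models, and the meta-learner a regularized classifier. The out-of-fold construction is the part that carries over unchanged.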
