Netinfo Security ›› 2020, Vol. 20 ›› Issue (12): 72-82. doi: 10.3969/j.issn.1671-1122.2020.12.010

Malware Familial Classification of Deep Auto-encoder Based on Mixed Features

TAN Yang, LIU Jiayong, ZHANG Lei

  1. College of Cybersecurity, Sichuan University, Chengdu 610065, China
  • Received: 2020-09-19 Online: 2020-12-10 Published: 2021-01-12
  • Contact: ZHANG Lei E-mail: zhanglei2018@scu.edu.cn

Abstract:

Malware authors typically release successive versions of their software, and these variants form malware families. Existing malware family classification methods leave room for improvement in the robustness of feature selection and in the effectiveness and accuracy of the classification algorithms. To address this, this paper proposes a malware family classification method that applies a deep auto-encoder to mixed features. First, dynamic API sequence features and static byte entropy features are extracted from each malicious sample and combined as mixed features that capture the sample's global structure. Then, a deep auto-encoder reduces the dimensionality of these high-dimensional features. Finally, the resulting low-dimensional features are fed into an XGBoost classifier to determine the malware family. Experimental results show that the method distinguishes different families correctly and effectively, reaching a micro-average AUC of 98.3% and a macro-average AUC of 97.9%.
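
As a rough illustration of the static half of the mixed features, the sketch below computes Shannon entropy over sliding windows of a file's bytes. The abstract does not state how the byte entropy features are windowed or encoded, so the function names, the window size of 256 bytes, and the step of 128 bytes are assumptions chosen for illustration, not the authors' settings.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (range 0 to 8)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_entropy_features(data: bytes, window: int = 256, step: int = 128) -> List[float]:
    """Entropy of each sliding window over the raw bytes.

    window and step are illustrative assumptions, not values from the paper.
    """
    if len(data) <= window:
        # Input shorter than one window: fall back to a single entropy value.
        return [shannon_entropy(data)]
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, len(data) - window + 1, step)
    ]
```

A vector like this would presumably be combined with the API sequence features before the auto-encoder stage; how the two are merged is not described in the abstract.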

Key words: deep auto-encoder, malware, XGBoost, API sequence, byte entropy
