Netinfo Security, 2019, Vol. 19, Issue (4): 20-28. doi: 10.3969/j.issn.1671-1122.2019.04.003


Malware Classification Method Based on Word Vector of Assembly Instruction and CNN

Yanchen QIAO1,2, Qingshan JIANG1, Liang GU2, Xiaoming WU3

  1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518000, China
  2. Sangfor Technologies Inc., Shenzhen, Guangdong 518000, China
  3. Unit 31436 of the PLA, Shenyang, Liaoning 110001, China
Received: 2018-12-10; Online: 2019-04-10; Published: 2020-05-11

Abstract:

Current malware classification methods have two problems: their features depend too heavily on expert experience, and the high dimensionality of those features leads to high complexity. To address this, this paper proposes a classification method based on word vectors of assembly instructions and a Convolutional Neural Network (CNN). The assembly code file of each executable malware sample is treated as a document in which each assembly instruction is a word, so that every sample becomes a document. Word2Vec is then applied to each document separately to compute word vectors for the instructions that appear in it. Next, each sample is converted into a matrix based on the top 100 assembly instructions counted over the training sample set. Finally, a CNN is trained on the training sample set to build the classification model. Experimental evaluation shows that the method achieves an average accuracy of 98.56%.
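The abstract does not spell out how a sample becomes a fixed-size CNN input, so the following is only a minimal sketch under stated assumptions. It assumes one instruction per line with the mnemonic as the first token (e.g. `mov eax, ebx`), that the "top 100" list is the 100 most frequent mnemonics over the training set in descending order of frequency, and that a top instruction missing from a sample gets a zero row. The helper names `extract_mnemonics`, `top_k_instructions` and `sample_matrix` are hypothetical. The per-sample Word2Vec training itself (for instance with gensim's `Word2Vec` on the sample's instruction sequence) is left out; a plain dict of vectors stands in for its output.

```python
from collections import Counter
from typing import Dict, List, Sequence


def extract_mnemonics(asm_lines: Sequence[str]) -> List[str]:
    """Turn one sample's assembly listing into a 'document' of instruction 'words'.

    Assumption: one instruction per line, mnemonic first (e.g. 'mov eax, ebx').
    """
    words = []
    for line in asm_lines:
        tokens = line.strip().split()
        if tokens:
            words.append(tokens[0].lower())
    return words


def top_k_instructions(train_docs: Sequence[List[str]], k: int = 100) -> List[str]:
    """Count mnemonics over the whole training set and keep the k most frequent."""
    counts: Counter = Counter()
    for doc in train_docs:
        counts.update(doc)
    # Ties keep first-seen order, so the row order is deterministic.
    return [word for word, _ in counts.most_common(k)]


def sample_matrix(word_vectors: Dict[str, List[float]],
                  vocab: List[str], dim: int) -> List[List[float]]:
    """Stack one sample's word vectors in the fixed top-k order (k x dim matrix).

    `word_vectors` stands in for the sample's own Word2Vec output; an instruction
    absent from the sample is assumed to contribute a zero row.
    """
    return [list(word_vectors.get(word, [0.0] * dim)) for word in vocab]


if __name__ == "__main__":
    train = [
        extract_mnemonics(["push ebp", "mov ebp, esp", "mov eax, 1", "pop ebp", "ret"]),
        extract_mnemonics(["xor eax, eax", "mov ecx, 4", "ret"]),
    ]
    vocab = top_k_instructions(train, k=3)
    print(vocab)  # ['mov', 'ret', 'push']
    # Hypothetical 2-dimensional vectors for one sample.
    print(sample_matrix({"mov": [0.1, 0.2], "ret": [0.3, 0.4]}, vocab, dim=2))
```

Fixing the row order from training-set statistics gives every sample a matrix of the same shape (here k = 100 rows by the Word2Vec dimension), which a CNN can consume like a single-channel image.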

Key words: malware, classification, Word2Vec, CNN
