Netinfo Security ›› 2024, Vol. 24 ›› Issue (9): 1409-1421. DOI: 10.3969/j.issn.1671-1122.2024.09.009


Lightweight Malicious Code Detection Architecture Based on Vision Transformer

HUANG Baohua1, YANG Chanjuan1, XIONG Yu2, PANG Si1

  1. School of Computer and Electronic Information, Guangxi University, Nanning 530004, China
  2. Wuhan Digital Engineering Institute, Wuhan 430070, China
  • Received: 2024-06-01 Online: 2024-09-10 Published: 2024-09-27

Abstract:

With the rapid development of the information society, the number of malware variants keeps growing, posing challenges to existing detection methods. To improve the accuracy and efficiency of malware variant detection, this paper proposes a new hybrid architecture called FasterMalViT. The architecture enhances the Vision Transformer (ViT) by integrating partial convolution structures, significantly improving its performance in malware detection. To offset the increase in parameter count introduced by the convolutional operations, the architecture replaces traditional multi-head attention with a separable self-attention mechanism, effectively reducing the number of parameters and the computational cost. To address the imbalanced class distribution in malware datasets, a class-balanced focal loss function guides the model to pay more attention to under-represented classes during training, improving performance on hard-to-classify categories. Experimental results on the Microsoft BIG, Malimg, and MalwareBazaar datasets demonstrate that FasterMalViT achieves good detection performance and generalization ability.
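The abstract does not give the exact form of the class-balanced focal loss used in FasterMalViT; a common formulation (following Cui et al.'s effective-number weighting combined with the standard focal term) can be sketched as follows. The function name, the NumPy implementation, and the default values of `beta` and `gamma` are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def class_balanced_focal_loss(logits, labels, samples_per_class,
                              beta=0.999, gamma=2.0):
    """Hedged sketch of a class-balanced focal loss.

    logits: (N, C) raw scores; labels: (N,) integer class ids;
    samples_per_class: (C,) training-sample counts per class.
    """
    # Effective number of samples per class: (1 - beta^n) / (1 - beta).
    # Rarer classes get larger weights.
    effective_num = 1.0 - np.power(beta, samples_per_class)
    weights = (1.0 - beta) / effective_num
    # Normalize so the weights sum to the number of classes.
    weights = weights / weights.sum() * len(samples_per_class)

    # Numerically stable softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)

    # Probability assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]

    # Focal term (1 - p_t)^gamma down-weights easy, confident samples;
    # the class weight up-weights minority classes.
    per_sample = -weights[labels] * (1.0 - p_t) ** gamma * np.log(p_t)
    return per_sample.mean()
```

With `samples_per_class = [900, 100]` and `beta = 0.999`, the minority class receives roughly six times the weight of the majority class, so misclassifying a rare-family sample costs the model more than misclassifying a common one.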

Key words: malicious code, ViT, partial convolution, separable self-attention
