Netinfo Security ›› 2024, Vol. 24 ›› Issue (12): 1922-1932. doi: 10.3969/j.issn.1671-1122.2024.12.010


Research on Malicious URL Detection Using a Multi-Channel Neural Network that Integrates Adversarial Training with BERT-CNN-BiLSTM

LIU Zhuoxian1, WANG Jingya1, SHI Tuo2

  1. Information and Network Security College, People’s Public Security University of China, Beijing 100038, China
  2. Department of Public Security Management, Beijing Police College, Beijing 102202, China
  • Received: 2024-06-12 Online: 2024-12-10 Published: 2025-01-10

Abstract:

URLs are identifiers used to locate network resources, and malicious URLs are frequently exploited to carry out fraud, extortion, data theft, and other malicious activities. In recent years they have become a critical medium for numerous cyberattacks, causing significant harm to victims. Malicious URL attacks are increasingly prevalent, and the characteristics of malicious URLs are inherently complex, ambiguous, and deceptive; existing research also suffers from insufficient feature extraction and pays inadequate attention to model robustness and generalization. To address these problems, this paper proposes a malicious URL detection model that integrates adversarial training with a BERT-CNN-BiLSTM multi-channel neural network. The model treats URLs as text sequences: BERT preprocesses the input and extracts semantic features, a CNN layer then captures local features, and a BiLSTM layer extracts contextual sequential features. In addition, adversarial training with the Fast Gradient Method (FGM) adds perturbations at the embedding layer, improving the model’s accuracy and robustness. Experimental results on public datasets show that the model achieves a classification accuracy of 97.2% on the binary URL detection task. Ablation studies and comparative experiments further validate the model’s significant advantages across multiple evaluation metrics. The model also performs well on fine-grained classification of malicious URLs, achieving an accuracy of 98.25% on a five-class URL classification task.
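
The FGM step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear classifier, the function names (fgm_perturbation, logistic_loss, loss_gradient), the example values, and the choice of epsilon are all assumptions made here for illustration. The sketch shows only the core idea, which is to shift an embedding by epsilon times the L2-normalized gradient of the loss with respect to that embedding.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return epsilon * grad / ||grad||_2, or a zero vector if grad is zero."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(embedding, weights, label):
    """Binary cross-entropy of a toy linear classifier on one embedding."""
    p = _sigmoid(sum(e * w for e, w in zip(embedding, weights)))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def loss_gradient(embedding, weights, label):
    """Gradient of the loss above with respect to the embedding."""
    p = _sigmoid(sum(e * w for e, w in zip(embedding, weights)))
    return [(p - label) * w for w in weights]


# Hypothetical values standing in for one token embedding and a classifier.
embedding = [0.2, -0.1, 0.4]
weights = [1.0, -2.0, 0.5]
label = 1  # 1 = malicious, 0 = benign (labeling assumed for this sketch)

grad = loss_gradient(embedding, weights, label)
delta = fgm_perturbation(grad, epsilon=0.5)
adv_embedding = [e + d for e, d in zip(embedding, delta)]

clean_loss = logistic_loss(embedding, weights, label)
adv_loss = logistic_loss(adv_embedding, weights, label)
print(f"clean loss = {clean_loss:.4f}, adversarial loss = {adv_loss:.4f}")
```

In a typical FGM training loop, the loss on the perturbed embedding is backpropagated together with the clean loss and the perturbation is then removed before the optimizer step; whether the paper follows exactly this schedule is not stated in the abstract.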

Key words: adversarial training, BERT, multi-channel neural network, malicious URL detection
