Netinfo Security ›› 2020, Vol. 20 ›› Issue (9): 6-11. doi: 10.3969/j.issn.1671-1122.2020.09.002

Generation of Malicious Domain Training Data Based on Improved Char-RNN Model

WU Jing, LU Tianliang, DU Yanhui

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2020-07-16 Online: 2020-09-10 Published: 2020-10-15
  • Contact: Tianliang LU E-mail: lutianliang@ppsuc.edu.cn

Abstract:

In recent years, new botnets have begun to use domain generation algorithms (DGAs) to communicate with command and control (C&C) servers. Detection models based on deep learning lack the ability to recognize new DGA variants. To address this problem, this paper draws on the idea of text generation and improves the original character-level recurrent neural network (Char-RNN) with long short-term memory (LSTM) and an attention mechanism, so that the model can generate malicious domain names that simulate unknown DGA variants. Experimental results show that the domain names generated by this method are highly similar to real data in character composition and character frequency. In addition, detection models trained on the generated data maintain good performance. These results verify the validity of the generated data and the feasibility of using it as training data for predicting unknown DGA variants.
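
As an illustration of the character-frequency comparison mentioned in the abstract, the following minimal sketch measures how closely the character distribution of a set of generated domain names matches that of a set of real ones. The abstract does not state which similarity measure the authors used; the Jensen-Shannon divergence here is an assumption chosen for illustration, and the function names, the alphabet, and the toy samples are all hypothetical.

```python
import math
from collections import Counter

# Characters counted in a domain name (letters, digits, hyphen).
# This alphabet is an assumption for the sketch; dots and other symbols are skipped.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def char_distribution(domains):
    """Return the relative frequency of each ALPHABET character across all domains."""
    counts = Counter(ch for d in domains for ch in d.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid characters found")
    return {ch: counts[ch] / total for ch in ALPHABET}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions over ALPHABET.

    0.0 means the distributions are identical; 1.0 is the maximum.
    """
    def kl(a, b):
        # Terms with a[ch] == 0 contribute nothing and are skipped.
        return sum(a[ch] * math.log2(a[ch] / b[ch]) for ch in ALPHABET if a[ch] > 0)

    # Mixture distribution; nonzero wherever p or q is nonzero.
    m = {ch: 0.5 * (p[ch] + q[ch]) for ch in ALPHABET}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy samples, not data from the paper.
real_sample = ["xjkqpwvzrt", "qwmznxbvcl", "plokmijnuh"]
generated_sample = ["zxqjkwvprt", "mnbvcxzlqw", "uhbygvtfcr"]

p = char_distribution(real_sample)
q = char_distribution(generated_sample)
print(f"JS divergence: {js_divergence(p, q):.4f}")
```

A value near 0 would suggest that the generated names resemble the real ones in character frequency. This sketch checks only single-character frequencies and says nothing about character order or structure.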

Key words: malicious domains, DGA, text generation, deep learning

CLC Number: