Netinfo Security ›› 2020, Vol. 20 ›› Issue (9): 6-11. doi: 10.3969/j.issn.1671-1122.2020.09.002

• Outstanding Papers •

Generation of Malicious Domain Training Data Based on Improved Char-RNN Model

WU Jing, LU Tianliang, DU Yanhui

  1. College of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2020-07-16 Online: 2020-09-10 Published: 2020-10-15
  • Contact: LU Tianliang E-mail: lutianliang@ppsuc.edu.cn
  • About the authors: WU Jing (born 1996), male, from Jiangsu, master's student; research interests: network information security, network attack and defense | LU Tianliang (born 1985), male, from Hebei, associate professor, Ph.D.; research interests: network information security, malicious code analysis and detection | DU Yanhui (born 1969), male, from Shanxi, professor, Ph.D.; research interests: network information security, artificial intelligence
  • Supported by:
    National Natural Science Foundation of China (61602489); National Cryptography Development Fund of the 13th Five-Year Plan, Key Project on Cryptographic Theory Research (MMJJ20180108)

Abstract:

In recent years, new botnets have begun to use domain generation algorithms (DGAs) to communicate with command and control (C&C) servers. Detection models based on deep learning lack the ability to recognize domain names produced by newly emerging DGA variants. To address this problem, this paper draws on the idea of text generation and improves the original character-level recurrent neural network (Char-RNN) model by building it with a long short-term memory (LSTM) network and introducing an attention mechanism, so that it can generate malicious domain names that simulate unknown DGA variants. Experimental results show that the generated domain names are highly similar to real data in character composition structure and character frequency, and that detection models trained on the generated data maintain good performance. These results verify the validity of the data produced by the text generation model and the feasibility of using it as training data to predict unknown DGA variants.

Key words: malicious domains, DGA, text generation, deep learning
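
The abstract reports that generated and real domain names are similar in character frequency, but it does not say which similarity measure was used. As an illustration only, the sketch below assumes Jensen-Shannon divergence over per-character relative frequencies; the alphabet, the function names, and the sample labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import math
import string

# Assumed alphabet for domain labels: lowercase letters, digits, hyphen.
ALPHABET = string.ascii_lowercase + string.digits + "-"


def char_distribution(domains):
    """Relative frequency of each ALPHABET character across all given domains."""
    counts = Counter(ch for d in domains for ch in d.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no countable characters in input")
    return {ch: counts[ch] / total for ch in ALPHABET}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions that share no characters.
    """
    def kl(a, m):
        # Terms with a[ch] == 0 contribute nothing; m[ch] > 0 whenever a[ch] > 0.
        return sum(a[ch] * math.log2(a[ch] / m[ch]) for ch in ALPHABET if a[ch] > 0)

    m = {ch: 0.5 * (p[ch] + q[ch]) for ch in ALPHABET}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical second-level labels, invented for illustration only.
real_sample = ["xkqzvbnmtr", "plqwxzjkvd", "qzmvkxwtrb"]
generated_sample = ["zkqhvbtmna", "wlqpxzvkye", "bzmqkxvtro"]

score = js_divergence(char_distribution(real_sample),
                      char_distribution(generated_sample))
print(f"JS divergence between samples: {score:.4f}")
```

A score near 0 would indicate that the two sets use characters at similar rates. This measure ignores character order, so it speaks only to frequency and not to the character composition structure that the abstract also mentions.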

CLC Number: