Netinfo Security, 2023, Vol. 23, Issue (10): 8-15. doi: 10.3969/j.issn.1671-1122.2023.10.002
Received: 2023-05-09
Online: 2023-10-10
Published: 2023-10-11
Corresponding author: JIANG Bo, E-mail: jiangbo@iie.ac.cn
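About the authors: YE Huanrong (born 1988), male, from Sichuan, master's degree candidate, CCF member; research interests: network attack and defense technology, network threat situational awareness | LI Muyuan (born 1988), male, from Shandong, master's degree candidate; research interests: intelligent information processing, natural language processing | JIANG Bo (born 1985), male, from Anhui, associate research fellow, Ph.D., CCF member; research interests: situational awareness, behavior analysis, intelligent information processing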
YE Huanrong¹,², LI Muyuan¹,³, JIANG Bo⁴,⁵
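Abstract:
Domain generation algorithms (DGAs) are widely used in many kinds of network attacks. Because DGA samples change quickly, come in many variants, and are hard to obtain, existing traditional models achieve low detection accuracy and provide poor early warning. To address this, this paper proposes a DGA malicious domain name detection method based on transfer learning and threat intelligence. A combined model of a bidirectional long short-term memory (BiLSTM) network and a Transformer is built to extract the contextual and semantic-relationship features of malicious domain names. The model is pre-trained on a large public malicious domain dataset, and the trained parameters are then transferred to new, previously unseen small-sample malicious domain sets to test detection performance. Experimental results show that the model reaches an average detection accuracy of 96.14% on small-sample datasets of malicious domains used by multiple APT groups, indicating good detection performance.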
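The abstract describes feeding domain names into a BiLSTM-Transformer model. Sequence models of this kind usually consume each domain as a sequence of character indices. The sketch below shows one common way to prepare that input, assuming a lowercase character vocabulary, index 0 for padding, index 1 for unknown characters, and a hypothetical `max_len` of 64. This excerpt does not state the paper's actual encoding, so `encode_domain`, `VOCAB`, and these choices are placeholders rather than the authors' method.

```python
import string

# Hypothetical character vocabulary for domain names; the paper's actual
# encoding is not given in this excerpt.
VOCAB = string.ascii_lowercase + string.digits + "-._"
PAD, UNK = 0, 1  # reserved indices for padding and out-of-vocabulary characters
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(VOCAB)}


def encode_domain(domain: str, max_len: int = 64) -> list[int]:
    """Map a domain name to a fixed-length sequence of character indices."""
    indices = [CHAR_TO_INDEX.get(ch, UNK) for ch in domain.lower()[:max_len]]
    # Right-pad with PAD so that every sequence in a batch has the same length.
    return indices + [PAD] * (max_len - len(indices))


batch = [encode_domain(d, max_len=32) for d in ("google.com", "isrfbvs.info")]
print([len(seq) for seq in batch])  # [32, 32]
```

Fixed-length padded sequences like these are what an embedding layer followed by a bidirectional LSTM would typically receive; longer domains are simply truncated at `max_len`.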
YE Huanrong, LI Muyuan, JIANG Bo. Research on DGA Malicious Domain Name Detection Method Based on Transfer Learning and Threat Intelligence[J]. Netinfo Security, 2023, 23(10): 8-15.
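Table 3
Description of the sample datasets
Type | Description | Sample examples | Count |
---|---|---|---|
Legitimate domains | Alexa | google.com, amazon.com | 1000000 |
Pre-training malicious domain set | 31 families, including banjori, rovnix, tinba, pykspa_v1, simda, flubot, bazardoor, ramnit, ranbyus, gameover, mydoom, virut, murofet, and necurs | isrfbvs.info, vhpkiktk.net, agfsfafsuf.net | 1037812 |
Parameter-tuning malicious domain set | 20 families, including matsnu, vawtrak, pykspa_v2_fake, dircrypt, tordwm, and enviserv | 9c8e924f.top, ufkkuxxedanldohyjyae.com | 7289 |
Threat-intelligence small-sample malicious domain set | DGA domains used for C&C by tools such as Moqhao and VileRAT, employed by 19 APT groups including Donot and APT35 | nms***dis.com, eu***tek.info | 3236 |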
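Table 3 separates the malicious data into a large pre-training set, a smaller parameter-tuning set, and a small threat-intelligence set, mirroring the pre-train-then-transfer procedure in the abstract. The sketch below illustrates only the parameter-transfer idea: it deliberately swaps the paper's BiLSTM-Transformer for a plain logistic regression over hashed character bigrams, and all names (`bigram_features`, `train`, `N_FEATURES`) and hyperparameters are hypothetical. "Transfer" here just means fine-tuning starts from the pretrained weights instead of zeros. The toy data uses only the example domains from Table 3, which is far too little to say anything about accuracy.

```python
import math
import zlib

N_FEATURES = 256  # hypothetical size of the hashed bigram feature space


def bigram_features(domain: str) -> list[float]:
    """Count character bigrams of the leftmost label, hashed into N_FEATURES buckets."""
    label = domain.lower().split(".")[0]
    vec = [0.0] * N_FEATURES
    for a, b in zip(label, label[1:]):
        # crc32 is deterministic across runs, unlike Python's salted hash() for str.
        vec[zlib.crc32((a + b).encode()) % N_FEATURES] += 1.0
    return vec


def predict(weights: list[float], bias: float, x: list[float]) -> float:
    """Logistic-regression probability that the domain is DGA-generated."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, weights=None, bias=0.0, epochs=100, lr=0.1):
    """Plain SGD on log-loss. Passing pretrained weights/bias warm-starts training,
    which is the parameter-transfer step in this sketch."""
    weights = list(weights) if weights is not None else [0.0] * N_FEATURES
    for _ in range(epochs):
        for domain, label in samples:
            x = bigram_features(domain)
            err = predict(weights, bias, x) - label  # d(log-loss)/dz
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy data: only the example domains listed in Table 3 (1 = DGA, 0 = legitimate).
pretrain_set = [("google.com", 0), ("amazon.com", 0), ("isrfbvs.info", 1),
                ("vhpkiktk.net", 1), ("agfsfafsuf.net", 1)]
small_set = [("9c8e924f.top", 1), ("ufkkuxxedanldohyjyae.com", 1), ("amazon.com", 0)]

w_pre, b_pre = train(pretrain_set)                      # pre-training on the "large" set
w_ft, b_ft = train(small_set, w_pre, b_pre, epochs=20)  # fine-tuning from transferred weights
```

In a neural model the same step is usually expressed by loading the pretrained layer weights before training on the small set, optionally freezing the lower layers.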
Table 4
Number of domains in each of the 31 DGA families
DGA family | Count | DGA family | Count |
---|---|---|---|
banjori | 483028 | wauchos | 5940 |
rovnix | 179996 | ngioweb | 5250 |
tinba | 102108 | qakbot | 5000 |
pykspa_v1 | 44588 | symmi | 4256 |
simda | 30275 | necro | 2962 |
flubot | 30000 | tempedreve | 2786 |
bazardoor | 28410 | shifu | 2537 |
ramnit | 20064 | monerominer | 2495 |
ranbyus | 16120 | suppobox | 2251 |
gameover | 12000 | qadars | 2200 |
mydoom | 9928 | locky | 1178 |
virut | 9734 | bigviktor | 1000 |
murofet | 8560 | dyre | 1000 |
necurs | 8190 | chinad | 1000 |
shiotob | 8004 | cryptolocker | 1000 |
emotet | 5952 | — | — |
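The per-family counts in Table 4 can be checked against Table 3. The sketch below transcribes the table into a dictionary (the name `TABLE4_COUNTS` is illustrative) and sums it; the total reproduces the 1037812 domains listed for the pre-training set.

```python
# Per-family domain counts transcribed from Table 4.
TABLE4_COUNTS = {
    "banjori": 483028, "rovnix": 179996, "tinba": 102108, "pykspa_v1": 44588,
    "simda": 30275, "flubot": 30000, "bazardoor": 28410, "ramnit": 20064,
    "ranbyus": 16120, "gameover": 12000, "mydoom": 9928, "virut": 9734,
    "murofet": 8560, "necurs": 8190, "shiotob": 8004, "emotet": 5952,
    "wauchos": 5940, "ngioweb": 5250, "qakbot": 5000, "symmi": 4256,
    "necro": 2962, "tempedreve": 2786, "shifu": 2537, "monerominer": 2495,
    "suppobox": 2251, "qadars": 2200, "locky": 1178, "bigviktor": 1000,
    "dyre": 1000, "chinad": 1000, "cryptolocker": 1000,
}

total = sum(TABLE4_COUNTS.values())
print(len(TABLE4_COUNTS), total)  # 31 1037812

# Share of the largest family, to show how imbalanced the pre-training set is.
largest = max(TABLE4_COUNTS, key=TABLE4_COUNTS.get)
print(largest, round(TABLE4_COUNTS[largest] / total, 3))  # banjori 0.465
```

banjori alone accounts for roughly 46.5% of the pre-training set, while the four smallest families contribute 1000 domains each.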
Table 7
Detection performance on the 31 DGA families with sufficient samples
Family | Accuracy | Precision | FPR | FNR | Family | Accuracy | Precision | FPR | FNR |
---|---|---|---|---|---|---|---|---|---|
banjori | 97.66% | 98.59% | 2.08% | 2.87% | wauchos | 95.62% | 94.46% | 5.91% | 6.48% |
rovnix | 97.72% | 95.49% | 3.22% | 2.54% | ngioweb | 94.48% | 93.79% | 4.08% | 5.13% |
tinba | 95.08% | 97.27% | 4.88% | 5.46% | qakbot | 92.31% | 93.53% | 7.38% | 8.06% |
pykspa_v1 | 94.81% | 95.92% | 5.12% | 5.71% | symmi | 97.82% | 96.29% | 2.59% | 2.02% |
simda | 90.14% | 92.73% | 8.18% | 7.91% | necro | 94.19% | 95.47% | 6.03% | 6.94% |
flubot | 94.42% | 94.87% | 6.40% | 6.25% | tempedreve | 97.65% | 96.43% | 2.23% | 2.97% |
bazardoor | 95.46% | 97.28% | 5.45% | 4.15% | shifu | 95.71% | 96.98% | 6.05% | 5.49% |
ramnit | 96.35% | 97.46% | 4.67% | 4.73% | monerominer | 94.44% | 92.28% | 5.7% | 4.48% |
ranbyus | 96.53% | 95.59% | 4.93% | 4.23% | suppobox | 94.27% | 94.96% | 6.38% | 5.12% |
gameover | 95.95% | 96.15% | 4.14% | 4.54% | qadars | 95.81% | 95.32% | 3.43% | 5.47% |
mydoom | 91.78% | 93.47% | 7.54% | 6.82% | locky | 95.82% | 95.73% | 5.91% | 5.01% |
virut | 97.84% | 96.67% | 2.33% | 2.64% | bigviktor | 94.97% | 92.66% | 5.74% | 6.53% |
murofet | 95.96% | 97.03% | 3.34% | 4.28% | dyre | 97.99% | 96.35% | 2.91% | 3.89% |
necurs | 93.51% | 95.08% | 6.32% | 5.19% | chinad | 97.16% | 96.25% | 2.36% | 3.37% |
shiotob | 96.58% | 97.28% | 2.43% | 3.16% | cryptolocker | 95.86% | 96.22% | 6.63% | 6.42% |
emotet | 94.13% | 94.29% | 6.14% | 6.55% | — | — | — | — | — |
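Tables 7 and 8 report Accuracy, Precision, FPR, and FNR. The sketch below computes them with the standard confusion-matrix definitions, assuming malicious (DGA) domains are the positive class; this excerpt does not show how each per-family test set was built, and the `Confusion` type and `detection_metrics` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # DGA domains correctly flagged as malicious
    fp: int  # legitimate domains wrongly flagged as malicious
    tn: int  # legitimate domains correctly passed
    fn: int  # DGA domains missed


def _ratio(num: int, den: int) -> float:
    """Safe division: return 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def detection_metrics(c: Confusion) -> dict[str, float]:
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fpr": _ratio(c.fp, c.fp + c.tn),  # share of legitimate domains flagged
        "fnr": _ratio(c.fn, c.fn + c.tp),  # share of DGA domains missed (1 - recall)
    }


print(detection_metrics(Confusion(tp=90, fp=5, tn=95, fn=10)))
```

Because FPR and FNR are normalized by different class totals, Accuracy equals 1 - (FPR + FNR) / 2 only when the test set contains equally many legitimate and DGA domains, as in the example above.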
Table 8
Detection performance on small-sample DGA malicious domains used by APT groups
APT group | Accuracy | Precision | FPR | FNR | APT group | Accuracy | Precision | FPR | FNR |
---|---|---|---|---|---|---|---|---|---|
APT35 | 97.37% | 94.47% | 4.71% | 6.49% | ColdRiver | 96.49% | 95.03% | 5.51% | 4.06% |
DeathStalker | 97.76% | 96.84% | 3.56% | 4.30% | Donot | 94.14% | 94.88% | 6.04% | 6.26% |
DomantColor | 96.22% | 95.37% | 4.94% | 3.67% | Evilnum | 94.67% | 92.71% | 5.66% | 7.15% |
FIN7 | 97.71% | 96.89% | 2.44% | 2.46% | Gamaredon | 94.74% | 95.45% | 3.92% | 3.38% |
GoolPJAR | 97.56% | 97.96% | 2.01% | 2.62% | GreenSpot | 96.54% | 95.79% | 6.44% | 5.93% |
Hagga | 95.62% | 94.86% | 4.78% | 4.44% | Kimsuky | 95.18% | 94.63% | 6.51% | 5.48% |
Lazarus | 96.79% | 95.11% | 4.06% | 3.72% | Pig Butchering | 97.78% | 96.69% | 2.04% | 2.93% |
RomCom | 94.89% | 95.57% | 4.76% | 6.61% | SideWinder | 92.44% | 91.52% | 8.89% | 9.42% |
Transparent Tribe | 96.46% | 97.37% | 3.07% | 4.56% | Void Balaur | 96.48% | 97.22% | 3.02% | 2.54% |
Water Labbu | 97.73% | 95.55% | 2.17% | 2.29% | — | — | — | — | — |
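As a consistency check, averaging the 19 Accuracy values in Table 8 reproduces the 96.14% average detection accuracy quoted in the abstract, assuming that figure is an unweighted mean over APT groups. The values below are transcribed from the table; `TABLE8_ACCURACY` is an illustrative name.

```python
# Per-group accuracy (%) transcribed from Table 8.
TABLE8_ACCURACY = {
    "APT35": 97.37, "ColdRiver": 96.49, "DeathStalker": 97.76, "Donot": 94.14,
    "DomantColor": 96.22, "Evilnum": 94.67, "FIN7": 97.71, "Gamaredon": 94.74,
    "GoolPJAR": 97.56, "GreenSpot": 96.54, "Hagga": 95.62, "Kimsuky": 95.18,
    "Lazarus": 96.79, "Pig Butchering": 97.78, "RomCom": 94.89,
    "SideWinder": 92.44, "Transparent Tribe": 96.46, "Void Balaur": 96.48,
    "Water Labbu": 97.73,
}

# Unweighted mean over groups, ignoring how many domains each group contributed.
mean_acc = sum(TABLE8_ACCURACY.values()) / len(TABLE8_ACCURACY)
print(f"{len(TABLE8_ACCURACY)} groups, mean accuracy {mean_acc:.2f}%")  # 19 groups, mean accuracy 96.14%

weakest = min(TABLE8_ACCURACY, key=TABLE8_ACCURACY.get)
print(weakest, TABLE8_ACCURACY[weakest])  # SideWinder 92.44
```

The group count also matches the 19 APT groups listed for the threat-intelligence set in Table 3.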