Netinfo Security, 2024, Vol. 24, Issue (12): 1922-1932. doi: 10.3969/j.issn.1671-1122.2024.12.010
LIU Zhuoxian, WANG Jingya, SHI Tuo
Received: 2024-06-12
Online: 2024-12-10
Published: 2025-01-10
LIU Zhuoxian, WANG Jingya, SHI Tuo. Research on Malicious URL Detection Using a Multi-Channel Neural Network that Integrates Adversarial Training with BERT-CNN-BiLSTM[J]. Netinfo Security, 2024, 24(12): 1922-1932.
URL: http://netinfo-security.org/EN/10.3969/j.issn.1671-1122.2024.12.010
Malicious URL Content | Type |
---|---|
| Defacement |
| Malware |
| Phishing |
| Spam |
Model | Accuracy | F1-score | Recall | Precision | Loss |
---|---|---|---|---|---|
BERT-CNN-BiLSTM | 96.25% | 96.26% | 96.25% | 96.25% | 0.1221 |
ROBERTA-CNN-BiLSTM | 95.92% | 95.92% | 95.92% | 95.93% | 0.1338 |
SPANBERT-CNN-BiLSTM | 95.83% | 95.83% | 95.83% | 95.83% | 0.1884 |
Word2Vec-CNN-BiLSTM(C) | 83.25% | 45.43% | 50.00% | 41.63% | 0.6747 |
Word2Vec-CNN-BiLSTM(S) | 83.25% | 45.43% | 50.00% | 41.63% | 0.6707 |
TF-IDF-CNN-BiLSTM | 83.13% | 45.39% | 50.00% | 41.56% | 0.6427 |
CNN-BiLSTM (no preprocessing model) | 53.58% | 40.67% | 51.56% | 50.87% | 0.6749 |
Model | Accuracy | F1-score | Recall | Precision | Loss |
---|---|---|---|---|---|
BERT-CNN-BiLSTM | 97.00% | 97.48% | 97.00% | 96.62% | 0.1866 |
ROBERTA-CNN-BiLSTM | 94.58% | 93.35% | 94.58% | 94.85% | 0.1852 |
SPANBERT-CNN-BiLSTM | 96.83% | 96.91% | 96.83% | 97.32% | 0.3756 |
Word2Vec-CNN-BiLSTM(C) | 83.54% | 76.05% | 83.54% | 69.79% | 1.6094 |
Word2Vec-CNN-BiLSTM(S) | 82.42% | 74.47% | 82.42% | 67.93% | 0.8434 |
TF-IDF-CNN-BiLSTM | 77.50% | 74.82% | 77.50% | 72.34% | 0.9292 |
CNN-BiLSTM (no preprocessing model) | 51.08% | 59.18% | 51.08% | 71.98% | 1.5233 |