Netinfo Security ›› 2023, Vol. 23 ›› Issue (10): 77-82. doi: 10.3969/j.issn.1671-1122.2023.10.011
ZHAO Xinhe1,2, XIE Yongheng3,4, WAN Yueliang3,4, WANG Jinmiao3,4
Received: 2023-06-26
Online: 2023-10-10
Published: 2023-10-11
Corresponding author: XIE Yongheng
E-mail: yongheng@bjrun.com
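Abstract:
This paper proposes a gambling website detection and identification model based on multi-modal data. It first builds a Bert feature extraction model for text features and a VGG19 feature extraction model for image features. It then improves gambling website classification through feature fusion and a modified loss function. The model is validated on self-built datasets with positive-to-negative sample ratios of 1:5, 1:10, and 1:20. The experimental results show that the more pronounced the imbalance between positive and negative samples, the greater the model's advantage and the more effectively it detects and identifies gambling websites.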
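The abstract says only that the loss function is changed; it does not name the replacement. The "-FL" suffix on the Bert-VGG19-FL model in Table 3 suggests focal loss, so the sketch below assumes a binary focal loss. The function names, the optional class weight `alpha`, and the value `gamma=2.0` are illustrative assumptions, not settings reported in the paper.

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one sample.

    p: predicted probability of the positive (gambling) class.
    y: true label, 1 for gambling, 0 for normal.
    """
    p_t = p if y == 1 else 1.0 - p       # probability given to the true class
    return -math.log(min(max(p_t, eps), 1.0))

def binary_focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Binary focal loss for one sample (assumed form, not the paper's code).

    gamma: focusing parameter (assumed value); gamma=0 gives cross-entropy.
    alpha: optional weight for the positive class (assumption); None
           disables class weighting.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)        # guard against log(0)
    weight = 1.0
    if alpha is not None:
        weight = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of samples already classified well
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical predictions: an easy normal page and a hard gambling page.
samples = {"easy negative": (0.05, 0), "hard positive": (0.30, 1)}
for name, (p, y) in samples.items():
    print(name, cross_entropy(p, y), binary_focal_loss(p, y))
```

Under these assumptions the easy negative contributes far less loss than under cross-entropy, while the hard positive keeps most of its weight. With many more normal sites than gambling sites, this keeps the large number of easy negatives from dominating training.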
Cite this article:
ZHAO Xinhe, XIE Yongheng, WAN Yueliang, WANG Jinmiao. Detection and Identification Model of Gambling Websites Based on Multi-Modal Data[J]. Netinfo Security, 2023, 23(10): 77-82.
Table 3
Comparison of experimental results under different sample distributions
Positive-to-negative sample ratio | Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|
1:1 | Bert-VGG19 | 0.951 | 0.954 | 0.948 | 0.951 |
1:1 | Bert-VGG19-FL | 0.956 | 0.959 | 0.954 | 0.956 |
1:5 | Bert-VGG19 | 0.939 | 0.821 | 0.813 | 0.817 |
1:5 | Bert-VGG19-FL | 0.976 | 0.938 | 0.919 | 0.928 |
1:10 | Bert-VGG19 | 0.937 | 0.615 | 0.808 | 0.698 |
1:10 | Bert-VGG19-FL | 0.978 | 0.857 | 0.909 | 0.882 |
1:20 | Bert-VGG19 | 0.928 | 0.389 | 0.700 | 0.500 |
1:20 | Bert-VGG19-FL | 0.985 | 0.818 | 0.900 | 0.857 |
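Accuracy alone can mislead as the ratio becomes more skewed. As a worked check (an illustration, not a result from the paper), a hypothetical classifier that labels every site as normal reaches accuracy n/(1+n) at ratio 1:n while detecting no gambling sites at all. The helper name `all_negative_accuracy` is an assumption.

```python
def all_negative_accuracy(negatives_per_positive):
    """Accuracy of a classifier that predicts 'normal' for every site
    when the positive-to-negative ratio is 1:negatives_per_positive."""
    n = negatives_per_positive
    return n / (1 + n)

for n in (1, 5, 10, 20):
    print(f"1:{n}  all-negative accuracy = {all_negative_accuracy(n):.3f}")
```

At 1:20 this trivial baseline scores about 0.952, above the 0.928 accuracy reported for Bert-VGG19 in Table 3, so the Precision, Recall, and F1 columns are the more informative comparison.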
Table 4
Comparison of confusion matrices
Model | Predicted result | Actual: positive (gambling) | Actual: negative (normal) |
---|---|---|---|
Bert | Positive (gambling) | TP: 160 | FP: 80 |
Bert | Negative (normal) | FN: 38 | TN: 910 |
Bert-VGG19 | Positive (gambling) | TP: 161 | FP: 35 |
Bert-VGG19 | Negative (normal) | FN: 37 | TN: 955 |
VGG19 | Positive (gambling) | TP: 155 | FP: 101 |
VGG19 | Negative (normal) | FN: 43 | TN: 889 |
Bert-VGG19-FL | Positive (gambling) | TP: 182 | FP: 12 |
Bert-VGG19-FL | Negative (normal) | FN: 16 | TN: 978 |
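The metrics in Table 3 can be recomputed from the counts in Table 4. Each matrix sums to 198 positive and 990 negative samples, a 1:5 ratio, so the values should correspond to the 1:5 rows of Table 3. The sketch below uses the standard definitions of accuracy, precision, recall, and F1; the function name `metrics` and the dictionary `table4` are assumptions for illustration.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted gambling sites that are gambling
    recall = tp / (tp + fn)      # share of actual gambling sites that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Counts copied from Table 4 as (TP, FP, FN, TN).
table4 = {
    "Bert": (160, 80, 38, 910),
    "VGG19": (155, 101, 43, 889),
    "Bert-VGG19": (161, 35, 37, 955),
    "Bert-VGG19-FL": (182, 12, 16, 978),
}

for model, counts in table4.items():
    acc, prec, rec, f1 = metrics(*counts)
    print(f"{model}: acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```

For Bert-VGG19 this reproduces the 1:5 row of Table 3 (0.939, 0.821, 0.813, 0.817). For Bert-VGG19-FL the accuracy, precision, and recall match (0.976, 0.938, 0.919), while F1 computed from the raw counts is about 0.929 against 0.928 in the table, a difference consistent with rounding.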