Detection and Identification Model of Gambling Websites Based on Multi-Modal Data

doi:10.3969/j.issn.1671-1122.2023.10.011

Abstract

This paper proposes a gambling website detection and identification model based on multimodal data. It first builds a BERT feature extraction model for text features and a VGG19 feature extraction model for image features, then improves classification performance through feature fusion and a modified loss function. The model is validated on self-built datasets with positive-to-negative sample ratios of 1:5, 1:10, and 1:20. The experimental results show that the more pronounced the imbalance between positive and negative samples, the greater the model's advantage and the more effectively it detects and identifies gambling websites.

Key words: multi-modal, gambling website, feature extraction
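The abstract mentions feature fusion but does not say how the text and image features are combined. The sketch below illustrates one common option, late fusion by concatenation followed by a single linear layer with a sigmoid. The fusion strategy, the toy vector sizes, the random untrained weights, and the names `fuse` and `linear_sigmoid` are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def fuse(text_vec, image_vec):
    """Late fusion by concatenation: text features first, image features second."""
    return list(text_vec) + list(image_vec)


def linear_sigmoid(features, weights, bias):
    """One linear layer followed by a sigmoid, giving P(gambling) for one page."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


random.seed(0)

# Toy stand-ins for the two extractors' outputs (assumed sizes, kept small here;
# real BERT and VGG19 feature vectors are much larger).
text_vec = [random.gauss(0.0, 1.0) for _ in range(8)]
image_vec = [random.gauss(0.0, 1.0) for _ in range(16)]

fused = fuse(text_vec, image_vec)

# Untrained random weights, only to show the shapes involved.
weights = [random.gauss(0.0, 0.1) for _ in fused]
prob = linear_sigmoid(fused, weights, bias=0.0)

print(len(fused), prob)
```

In a trained system the classifier weights would be learned jointly over both modalities, which is what lets the fused model correct errors that either single-modality model makes on its own.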

CLC number: TP309

ZHAO Xinhe, XIE Yongheng, WAN Yueliang, WANG Jinmiao. Detection and Identification Model of Gambling Websites Based on Multi-Modal Data[J]. Netinfo Security, 2023, 23(10): 77-82.

Item	Configuration
Operating system	Windows 11
CPU	11th Gen Intel(R) Core(TM) i7-11800H @2.30 GHz (16 CPUs), 2.3 GHz
GPU	NVIDIA GeForce RTX 3070 Laptop GPU
Memory	32 GB
Framework and IDE	PyTorch 1.12.1+cu113, PyCharm 2023.1.1
Language and version	Python 3.8
Third-party packages	torch, nn, numpy, sklearn, etc.

Model	Accuracy	Precision	Recall	F1
Bert	0.932	0.932	0.931	0.931
VGG16	0.850	0.849	0.851	0.850
VGG19	0.873	0.870	0.879	0.874
Bert-VGG16	0.942	0.944	0.940	0.942
Bert-VGG19	0.951	0.954	0.948	0.951

Positive:negative sample ratio	Model	Accuracy	Precision	Recall	F1
1:1	Bert-VGG19	0.951	0.954	0.948	0.951
1:1	Bert-VGG19-FL	0.956	0.959	0.954	0.956
1:5	Bert-VGG19	0.939	0.821	0.813	0.817
1:5	Bert-VGG19-FL	0.976	0.938	0.919	0.928
1:10	Bert-VGG19	0.937	0.615	0.808	0.698
1:10	Bert-VGG19-FL	0.978	0.857	0.909	0.882
1:20	Bert-VGG19	0.928	0.389	0.700	0.500
1:20	Bert-VGG19-FL	0.985	0.818	0.900	0.857
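The FL suffix in the table above most likely denotes a focal-loss variant of the fused model, which would be consistent with the "modified loss function" mentioned in the abstract; this reading is an assumption. The sketch below shows the standard binary focal loss next to plain cross-entropy for a single sample. The values `gamma=2.0` and `alpha=0.25` are the commonly used defaults from the original focal loss formulation, not settings reported for this model.

```python
import math


def binary_cross_entropy(p, y):
    """Cross-entropy for one sample; p is the predicted P(positive), y is 0 or 1."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p     # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct (easy) sample versus a badly misclassified (hard) one,
# both with true label 1.
for name, p in [("easy", 0.9), ("hard", 0.1)]:
    ce = binary_cross_entropy(p, 1)
    fl = binary_focal_loss(p, 1)
    print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

The modulating factor `(1 - p_t) ** gamma` shrinks the contribution of easy samples far more than that of hard ones, so the abundant, easily classified normal sites dominate training less. That matches the pattern in the table, where the gap between the two models widens as the ratio moves from 1:1 to 1:20.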

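Confusion matrix: Bert
	Actual positive (gambling)	Actual negative (normal)
Predicted positive (gambling)	TP: 160	FP: 80
Predicted negative (normal)	FN: 38	TN: 910

Confusion matrix: Bert-VGG19
	Actual positive (gambling)	Actual negative (normal)
Predicted positive (gambling)	TP: 161	FP: 35
Predicted negative (normal)	FN: 37	TN: 955

Confusion matrix: VGG19
	Actual positive (gambling)	Actual negative (normal)
Predicted positive (gambling)	TP: 155	FP: 101
Predicted negative (normal)	FN: 43	TN: 889

Confusion matrix: Bert-VGG19-FL
	Actual positive (gambling)	Actual negative (normal)
Predicted positive (gambling)	TP: 182	FP: 12
Predicted negative (normal)	FN: 16	TN: 978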
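Each confusion matrix above contains 198 gambling sites and 990 normal sites, which suggests (as an inference from the counts, not a statement from the source) that they come from the 1:5 test set. The sketch below derives Accuracy, Precision, Recall, and F1 from the four counts using the standard definitions; the helper name `metrics` and the dictionary layout are illustrative.

```python
def metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted gambling sites that are gambling
    recall = tp / (tp + fn)      # share of gambling sites that were caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# (TP, FP, FN, TN) copied from the confusion matrices above.
confusion = {
    "Bert": (160, 80, 38, 910),
    "VGG19": (155, 101, 43, 889),
    "Bert-VGG19": (161, 35, 37, 955),
    "Bert-VGG19-FL": (182, 12, 16, 978),
}

for model, counts in confusion.items():
    m = metrics(*counts)
    print(model, {name: round(value, 3) for name, value in m.items()})
```

For Bert-VGG19 and Bert-VGG19-FL the computed values agree with the 1:5 rows of the results table to within about 0.001. The Bert-VGG19 row also shows why accuracy alone is a weak indicator on imbalanced data: accuracy stays near 0.94 while precision and recall are both close to 0.82.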