Mining Traffic Detection Method Based on Global Feature Learning

doi:10.3969/j.issn.1671-1122.2024.10.004

Abstract:

Mining traffic detection is a variable-length data classification task. Existing detection schemes, such as keyword matching and N-gram feature signatures, classify traffic using local features and fail to fully exploit its global features. Modeling mining traffic with deep learning makes it possible to extract these global features and thereby improve detection accuracy. The proposed traffic classification model first uses a Transformer encoder to extract global features of the traffic and then applies a sequence summarizer to the encoded output, producing a fixed-length representation for classification. Because mining samples account for less than 3% of the dataset, accuracy is a strongly biased measure of classification performance; the paper therefore considers both precision and recall and evaluates the model with the F1 score. With sinusoidal positional encoding in the encoder, the model achieves an F1 score of 99.84% on the test set with a precision of 100%.

Key words: mining malware, traffic classification, deep learning, sequence processing
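The pipeline described in the abstract (a Transformer encoder over the token sequence, then a sequence summarizer that reduces the variable-length encoding to a fixed-length vector for classification) can be illustrated with the NumPy sketch below. It is not the authors' model: the single-layer, single-head encoder, the dimensions, the untrained random weights, the byte-level vocabulary of 256, and above all the masked mean pooling used as the "summarizer" are assumptions made for illustration, since the abstract does not describe the summarizer's internals. Only the sinusoidal positional encoding follows a published formulation (ref. [21]).

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sine/cosine position encoding (ref. [21]); d_model must be even."""
    positions = np.arange(seq_len)[:, None]                   # (L, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)    # (L, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                              # even dimensions
    pe[:, 1::2] = np.cos(angles)                              # odd dimensions
    return pe


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each token vector to zero mean, unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def self_attention(x, mask, wq, wk, wv):
    """Single-head scaled dot-product attention; padded keys (mask False) get zero weight."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])                   # (L, L)
    scores = np.where(mask[None, :], scores, -1e9)            # block padded keys
    scores = scores - scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v


def encoder_layer(x, mask, p):
    """One post-norm encoder layer: attention + residual, then feed-forward + residual."""
    h = layer_norm(x + self_attention(x, mask, p["wq"], p["wk"], p["wv"]))
    ff = np.maximum(0.0, h @ p["w1"]) @ p["w2"]               # ReLU feed-forward
    return layer_norm(h + ff)


def masked_mean_summary(h, mask):
    """Hypothetical sequence summarizer: average the encodings of real (non-padding) tokens."""
    m = mask[:, None].astype(h.dtype)
    return (h * m).sum(axis=0) / m.sum()


def init_params(vocab_size=256, d_model=32, d_ff=64, seed=0):
    """Random, untrained weights; the byte-level vocabulary is an assumption."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {"embed": w(vocab_size, d_model), "wq": w(d_model, d_model),
            "wk": w(d_model, d_model), "wv": w(d_model, d_model),
            "w1": w(d_model, d_ff), "w2": w(d_ff, d_model),
            "w_out": w(d_model), "b_out": 0.0}


def mining_probability(token_ids, mask, p):
    """Variable-length tokens -> encoder -> fixed-length summary -> sigmoid score."""
    d_model = p["embed"].shape[1]
    x = p["embed"][token_ids] + sinusoidal_positional_encoding(len(token_ids), d_model)
    z = masked_mean_summary(encoder_layer(x, mask, p), mask)  # shape (d_model,) for any length
    logit = float(z @ p["w_out"]) + p["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))


params = init_params()
short_flow = np.array([22, 3, 1, 0, 4])                       # toy byte tokens, not real traffic
long_flow = np.array([22, 3, 1, 0, 4, 87, 120, 7, 255, 16])
print(mining_probability(short_flow, np.ones(5, dtype=bool), params))
print(mining_probability(long_flow, np.ones(10, dtype=bool), params))
```

Whatever the flow length, `masked_mean_summary` returns a vector of size `d_model`, which is what lets a fixed-size output layer classify variable-length traffic. Padding positions are excluded both from attention and from the average, so padding a flow does not change its score.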

CLC number: TP309

WEI Jinxia, HUANG Xizhang, FU Yuhao, LI Jing, LONG Chun. Mining Traffic Detection Method Based on Global Feature Learning[J]. Netinfo Security, 2024, 24(10): 1506-1514.

Figures/Tables: 9 (Figures 1-5, Tables 1-4)

References (26)

[1]	NAKAMOTO S. Bitcoin: A Peer-to-Peer Electronic Cash System[J]. Bitcoin, 2008, 4(2): 15-23.
[2]	WANG Weibing, SUN Xiulan. Research on the Money Laundering of Digital Currency and Analysis of the Difficulties in Cracking Down on It[J]. Cyberspace Security, 2021, 12(2): 1-7.
[3]	LIU Feng, JIANG Jiaqi, HUANG Hao. Security Overview of Cryptocurrency Trading Media and Processes[J]. Netinfo Security, 2024, 24(3): 330-351.
[4]	MA Jingyu, LI Quanlin. Mining Incentive and Reward Analysis in Blockchain Systems[EB/OL]. [2024-05-23]. https://sysmath.cjoe.ac.cn/jweb_xtkxysx/CN/10.12341/jssms23673.
[5]	ZHENG Rui, WANG Qiuyun, LIN Zhuopang, et al. Cryptojacking Malware Hunting: A Method Based on Ensemble Learning of Hierarchical Threat Intelligence Feature[J]. Acta Electronica Sinica, 2022, 50(11): 2707-2715. doi: 10.12263/DZXB.20211333
[6]	SHI Boxuan, LIN Shenwen, MAO Hongliang. Research on Mining Behavior Detection and Identification Technology Based on Network Traffic[J]. Application Research of Computers, 2022, 39(7): 1956-1960.
[7]	RAYNOR D B. Bitcoin Energy Consumption Worldwide from February 2017 to June 20, 2024[EB/OL]. (2024-06-21)[2024-06-21]. https://www.statista.com/statistics/881472/worldwide-bitcoin-energy-consumption/.
[8]	CNCERT. 2021 Malicious Mining Report[EB/OL]. (2022-05-07)[2024-05-23]. https://www.cert.org.cn/publish/main/upload/File/2021MaliciousMiningReport.pdf.
[9]	WU Haiyan, HU Jinkun, CHEN Yaliang. Research on the Characteristics of Mining Trojan Gang Situations Based on Network Traffic[J]. New Telecommunications, 2023, 25(12): 13-15.
[10]	CAO Chuanbo, GUO Chun, SHEN Guowei, et al. Cryptomining Malware Early Detection Method in Behavioral Diversity Period[J]. Acta Electronica Sinica, 2023, 51(7): 1850-1858. doi: 10.12263/DZXB.20220926
[11]	ZHONG Kai, GUO Chun, LI Xianchao, et al. Cryptomining Malware Early Detection Method Based on SDR[EB/OL]. [2024-05-23]. https://www.cnki.com.cn/Article/CJFDTotal-JSJA20240510006.htm.
[12]	XIN Yi, GAO Zelin, HUANG Weiqiang. Detection and Protection Technological Analysis of Crypto-Mining Trojan[J]. Cyberspace Security, 2022, 13(1): 41-46.
[13]	ZHOU Jingying, LI Yu, HUANG Kun, et al. Research and Implementation of Cryptocurrency Miner Detection Based on DNS Traffic Analysis[J]. Designing Techniques of Posts and Telecommunications, 2023(8): 48-52.
[14]	ZHANG Shize, WANG Zhiliang, YANG Jiahai, et al. MineHunter: A Practical Cryptomining Traffic Detection Algorithm Based on Time Series Tracking[C]// ACM. Annual Computer Security Applications Conference. New York: ACM, 2021: 1051-1063.
[15]	TONG Ruiqian, HU Xianan, LIU Youran, et al. Mining Traffic Detection Based on Automated Private Protocol Identification[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(7): 2304-2313.
[16]	ZHAO Ruitao, SONG Jinjie. An Abnormal Traffic Detection Method Based on Deep Learning[J]. Microcomputer Applications, 2024, 40(3): 11-14.
[17]	TU Xiaohan, ZHANG Chuanhao, LIU Mengran. Design and Implementation of Malicious Traffic Detection Model[J]. Netinfo Security, 2024, 24(4): 520-533.
[18]	TIAN Aibao, WEI Jiaojiao, XIAO Junbi. Research on Network Traffic Prediction Based on Transformer[J]. Information Technology, 2024, 48(4): 156-160.
[19]	SHI Boxuan, MAO Hongliang, LIN Shenwen. Analysis and Research on Monero Active Mining and Passive Mining[J]. Cyberspace Security, 2024, 15(1): 56-61.
[20]	FU Jihan, SHEN Wei. Characteristic Analysis Method of Web-Based Cryptojacking Based on Chrome DevTools Protocol[J]. Software Guide, 2022, 21(11): 110-115.
[21]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is All You Need[J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
[22]	LIN Tianyang, WANG Yuxin, LIU Xiangyang, et al. A Survey of Transformers[J]. AI Open, 2022, 3: 111-132.
[23]	HUA Weizhe, DAI Zihang, LIU Hanxiao, et al. Transformer Quality in Linear Time[C]// PMLR. International Conference on Machine Learning. New York: PMLR, 2022: 9099-9117.
[24]	BHOJANAPALLI S, YUN C, RAWAT A S, et al. Low-Rank Bottleneck in Multi-Head Attention Models[C]// PMLR. International Conference on Machine Learning. New York: PMLR, 2020: 864-873.
[25]	SHAZEER N, LAN Zhenzhong, CHENG Youlong, et al. Talking-Heads Attention[EB/OL]. (2020-03-05)[2024-05-23]. https://arxiv.org/abs/2003.02436.
[26]	SU Jianlin, AHMED M, LU Yu, et al. RoFormer: Enhanced Transformer with Rotary Position Embedding[EB/OL]. (2023-11-24)[2024-05-23]. https://www.sciencedirect.com/science/article/abs/pii/S0925231223011864?via%3Dihub.

Group	0	1	2	3	4	5	6	7
Training samples	70227	11708	4371	2259	971	916	597	3089
Training mining samples	2720	678	237	30	12	10	3	41
Test samples	17548	2965	1097	532	251	288	144	797
Test mining samples	678	163	68	7	6	4	3	19
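The per-group counts in the dataset table above can be aggregated per split to see how few positives there are, and to score a trivial baseline that labels every sample benign. This is a small illustrative calculation on the table's numbers; the function name `split_summary` and the baseline itself are introduced here for illustration and are not part of the paper.

```python
# Per-group counts copied from the dataset table above (groups 0-7).
TRAIN_SAMPLES = [70227, 11708, 4371, 2259, 971, 916, 597, 3089]
TRAIN_MINING = [2720, 678, 237, 30, 12, 10, 3, 41]
TEST_SAMPLES = [17548, 2965, 1097, 532, 251, 288, 144, 797]
TEST_MINING = [678, 163, 68, 7, 6, 4, 3, 19]


def split_summary(samples: list[int], mining: list[int]) -> dict:
    """Aggregate one split and score a hypothetical all-benign baseline on it."""
    total, positives = sum(samples), sum(mining)
    true_positives = 0  # the all-benign baseline never predicts "mining"
    return {
        "total": total,
        "mining": positives,
        "mining_ratio": positives / total,
        # Correct on every benign sample, wrong on every mining sample.
        "all_benign_accuracy": (total - positives) / total,
        "all_benign_recall": true_positives / positives,
    }


for name, samples, mining in (("train", TRAIN_SAMPLES, TRAIN_MINING),
                              ("test", TEST_SAMPLES, TEST_MINING)):
    s = split_summary(samples, mining)
    print(f"{name}: {s['total']} samples, {s['mining']} mining ({s['mining_ratio']:.2%}); "
          f"all-benign baseline: accuracy {s['all_benign_accuracy']:.2%}, "
          f"recall {s['all_benign_recall']:.0%}")
    print("  per-group mining ratio:",
          ", ".join(f"{g}:{m / n:.2%}" for g, (n, m) in enumerate(zip(samples, mining))))
```

On the test split (23622 samples, 948 of them mining) the all-benign baseline already reaches roughly 96% accuracy while its recall is 0, which illustrates why accuracy alone is a misleading metric on this kind of imbalanced data.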

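Experiment type	Sinusoidal positional encoding	Rotary positional encoding	Learnable positional encoding	No positional encoding
Full vocabulary	99.92%	99.93%	99.93%	99.95%
Full mixed	99.89%	99.73%	99.87%	99.88%
Common vocabulary	99.99%	100%	100%	99.92%
Common mixed	99.91%	99.91%	99.92%	99.81%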

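Experiment type	Sinusoidal positional encoding	Rotary positional encoding	Learnable positional encoding	No positional encoding
Full vocabulary	99.63%	99.84%	99.68%	99.58%
Full mixed	99.52%	99.47%	99.37%	99.36%
Common vocabulary	99.84%	99.84%	99.74%	99.53%
Common mixed	99.84%	99.63%	99.63%	99.68%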
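The two tables above compare four position-encoding choices. Sinusoidal and learnable encodings are typically added to the token embeddings, and the variant without positional encoding omits position information entirely; rotary encoding (RoPE, ref. [26]) instead rotates query and key vectors so that their dot product depends only on the relative offset between tokens. The NumPy sketch below shows only that rotary mechanism, using the consecutive-pair layout and base 10000 of the RoFormer formulation; the function names are illustrative and this is not the configuration used in the paper's experiments.

```python
import numpy as np


def rotary_embed(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotary position embedding (RoPE, ref. [26]).

    Each consecutive feature pair (x[2i], x[2i+1]) of the token at position m is
    rotated by the angle m * base**(-2i/d). Applied to queries and keys before the dot product.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[-1]
    if d % 2:
        raise ValueError("feature dimension must be even")
    inv_freq = base ** (-np.arange(0, d, 2) / d)                               # (d/2,)
    angles = np.asarray(positions, dtype=float)[:, None] * inv_freq[None, :]  # (L, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin   # 2-D rotation, first component
    out[..., 1::2] = x_even * sin + x_odd * cos   # 2-D rotation, second component
    return out


def rotary_score(q: np.ndarray, k: np.ndarray, m: int, n: int) -> float:
    """Dot product between query vector q at position m and key vector k at position n."""
    qm = rotary_embed(q[None, :], np.array([m]))
    kn = rotary_embed(k[None, :], np.array([n]))
    return float(np.sum(qm * kn))


rng = np.random.default_rng(0)
q, k = rng.normal(size=16), rng.normal(size=16)
# Same offset n - m = 4 at different absolute positions gives the same score.
print(rotary_score(q, k, 3, 7), rotary_score(q, k, 10, 14))
```

The two printed scores agree up to floating-point rounding because both pairs are four positions apart, and the rotation leaves each vector's norm unchanged.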

Method	Precision	Recall	F1 score
Proposed method	100%	99.68%	99.84%
Method of Ref. [14]	97.00%	99.70%	98.33%
Method of Ref. [15]	N/A	98.50%	≤99.20%
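The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). For the two rows of the comparison table that report both precision and recall, the F1 column can be reproduced as below. The helper `f1_score` is an illustrative function defined here; it is not scikit-learn's function of the same name, which takes label arrays rather than precision and recall values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (as fractions) taken from the comparison table above.
rows = {"Proposed method": (1.00, 0.9968), "Method of Ref. [14]": (0.97, 0.997)}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.2%}")  # 99.84% and 98.33% respectively
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot obtain a high F1 on imbalanced data by maximizing only precision or only recall.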