Dynamic Detection of Ransomware Based on Stacking Model Fusion

Netinfo Security, 2020, Vol. 20, Issue 2: 57-57. doi: 10.3969/j.issn.1671-1122.2020.02.008

LÜ Zongping¹, ZHAO Chundi¹,², GU Zhaojun¹, ZHOU Jingxian¹

1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China

Received: 2019-09-09 Online: 2020-02-10 Published: 2020-05-11

About the authors:
LÜ Zongping (born 1964), male, from Hubei, research fellow, master's degree; research interests: network and information security, civil aviation information systems. ZHAO Chundi (born 1993), female, from Shandong, master's student; research interests: network and information security. GU Zhaojun (born 1966), male, from Shandong, professor, Ph.D.; research interests: network and information security, civil aviation information systems. ZHOU Jingxian (born 1981), male, from Henan, associate research fellow, Ph.D.; research interests: big data and network security.

Funding:
National Natural Science Foundation of China [61601467, U1533104]; Civil Aviation Science and Technology Fund [MHRD20140205, MHRD20150233]; Civil Aviation Safety Capacity Building Fund [PESA170003, PESA2018079, PESA2018082, PESA2019073, PESA2019074]

Abstract

Because ransomware is polymorphic and changes form, detection based on dynamic behavior features is widely used, but existing methods rely on a single type of feature and the underlying machine learning algorithms tend to overfit. Building on the Stacking model fusion method, this paper proposes a ransomware detection algorithm, XRLStacking. First, all raw dynamic features of each ransomware sample are extracted and redundant ones are removed, keeping only three kinds of features for every call a sample makes: the API name, the thread number, and the sequence number. Second, a combination of N-gram and TF-IDF removes features that contribute little to classification, so that the API calls within each sample retain a valid call-sequence relationship. Finally, a classifier based on Stacking model fusion identifies ransomware from the combined features. Experiments on a large volume of real data generated with Cuckoo Sandbox show that the proposed algorithm achieves high recognition accuracy and, compared with XGBoost and random forest (RF), avoids overfitting to some extent.

Key words: ransomware, dynamic detection, XRLStacking algorithm, API, sequence relationship
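
The Stacking step described in the abstract can be illustrated with a minimal, dependency-free sketch. The abstract does not say which base learners XRLStacking combines or how it is trained, so everything here is an assumption: two deliberately simple stand-in base learners (nearest centroid and logistic regression), a logistic-regression meta-learner, 3-fold out-of-fold predictions, and a hypothetical toy dataset in which each sample has two made-up feature weights.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]                  # x -> score in [0, 1]
Fit = Callable[[List[Vector], List[int]], Model]   # training data -> model


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))                   # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))


def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fit_centroid(X: List[Vector], y: List[int]) -> Model:
    """Stand-in base learner 1: relative distance to the two class centroids."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)

    def predict(x: Vector) -> float:
        d0, d1 = dist(x, c0), dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict


def fit_logreg(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 500) -> Model:
    """Stand-in base learner 2 (also the meta-learner): logistic regression via SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def stack_fit(X: List[Vector], y: List[int], base_fits: List[Fit], k: int = 3) -> Model:
    """Train the meta-learner on out-of-fold base scores, then refit the bases on all data."""
    n = len(X)
    meta_X = [[0.0] * len(base_fits) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        for j, fit in enumerate(base_fits):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                meta_X[i][j] = model(X[i])         # score from a model that never saw sample i
    meta = fit_logreg(meta_X, y)
    bases = [fit(X, y) for fit in base_fits]
    return lambda x: meta([m(x) for m in bases])


# Hypothetical toy data: label 1 = ransomware, label 0 = benign.
X = [(0.1, 0.8), (0.9, 0.2), (0.2, 0.7), (0.8, 0.3), (0.15, 0.9), (0.85, 0.1),
     (0.3, 0.6), (0.7, 0.4), (0.05, 0.75), (0.95, 0.25), (0.25, 0.65), (0.75, 0.35)]
y = [0, 1] * 6
predict = stack_fit(X, y, [fit_centroid, fit_logreg])
```

Training the meta-learner only on out-of-fold scores is the usual way Stacking limits overfitting: the meta-learner never sees a base prediction made on data that base model was trained on.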

CLC number: TP309

LÜ Zongping, ZHAO Chundi, GU Zhaojun, ZHOU Jingxian. Dynamic Detection of Ransomware Based on Stacking Model Fusion[J]. Netinfo Security, 2020, 20(2): 57-57.

Figures and tables (9): Figure 1, Table 1, Figure 2, Figure 3, Table 2, Table 3, Figure 4, Figure 5, Table 4


Table 1
API name	Thread number	Sequence number
InternetOpenA	2332	47
InternetOpenUrlA	2332	48
OpenSCManagerA	2332	33
CreateServiceA	2332	49
StartServiceA	2332	50
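
As a rough illustration of how rows like these could feed the N-gram and TF-IDF step mentioned in the abstract, the sketch below orders each thread's calls by sequence number, forms API bigrams, and weights them with textbook TF-IDF. The paper's exact fusion of the two techniques is not given here, so the per-thread ordering, the bigram size, the TF-IDF formula, and the second (benign) trace are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Call = Tuple[str, int, int]          # (API name, thread number, sequence number)
Gram = Tuple[str, ...]


def api_ngrams(calls: List[Call], n: int = 2) -> List[Gram]:
    """Order calls per thread by sequence number, then slide an n-wide window."""
    by_thread: Dict[int, List[Call]] = {}
    for call in calls:
        by_thread.setdefault(call[1], []).append(call)
    grams: List[Gram] = []
    for thread_calls in by_thread.values():
        names = [c[0] for c in sorted(thread_calls, key=lambda c: c[2])]
        grams += [tuple(names[i:i + n]) for i in range(len(names) - n + 1)]
    return grams


def tf_idf(docs: List[List[Gram]]) -> List[Dict[Gram, float]]:
    """Textbook TF-IDF: tf = count / len(doc), idf = log(N / document frequency)."""
    df = Counter(g for doc in docs for g in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({g: (c / len(doc)) * math.log(len(docs) / df[g])
                        for g, c in counts.items()})
    return weights


# Rows copied from the table above.
trace_a: List[Call] = [
    ("InternetOpenA", 2332, 47), ("InternetOpenUrlA", 2332, 48),
    ("OpenSCManagerA", 2332, 33), ("CreateServiceA", 2332, 49),
    ("StartServiceA", 2332, 50),
]
# Hypothetical second trace, invented only to give TF-IDF something to compare against.
trace_b: List[Call] = [
    ("InternetOpenA", 100, 1), ("InternetOpenUrlA", 100, 2),
    ("InternetCloseHandle", 100, 3),
]

grams_a, grams_b = api_ngrams(trace_a), api_ngrams(trace_b)
weights = tf_idf([grams_a, grams_b])
```

In this toy corpus the bigram (InternetOpenA, InternetOpenUrlA) occurs in both traces and receives weight 0, while (CreateServiceA, StartServiceA) occurs only in the first and receives a positive weight. Dropping low-weight n-grams is one plausible reading of "removing features that contribute little to classification".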

Table 2
Actual class	Predicted ransomware	Predicted benign
Ransomware	TP	FN
Benign software	FP	TN
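
The four cells of this confusion matrix are what the usual evaluation metrics are computed from. The sketch below assumes the standard definitions of precision, recall, F1, and accuracy; the counts passed in are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Derive standard metrics from the four confusion-matrix cells."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0       # avoid division by zero on empty cells

    precision = ratio(tp, tp + fp)             # of predicted ransomware, how much is real
    recall = ratio(tp, tp + fn)                # of real ransomware, how much is caught
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
    }


# Hypothetical counts, for illustration only.
scores = confusion_metrics(tp=45, fn=5, fp=3, tn=47)
```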

Table 3
r	Ransomware samples that can be represented (%)	Benign samples that can be represented (%)
0.44	100	84.15
0.45	100	83.27
0.46	100	83.05
0.47	99.6	82.16

Table 4
Algorithm	P	R	F1	AUC	Logloss
XGBoost	0.9915	0.9375	0.967742	0.971772	0.030412
RF	0.9905	0.953125	0.976	0.974648	0.052767
XRLStacking	0.996	0.9375	0.967742	0.981643	0.026351
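
For readers unfamiliar with the last two columns, the sketch below shows how log loss and AUC are conventionally computed from predicted probabilities. It assumes the standard binary cross-entropy definition of log loss and the pairwise-ranking definition of AUC; the labels and probabilities are hypothetical and unrelated to the values reported above.

```python
import math
from typing import List


def log_loss(y_true: List[int], p_pred: List[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)        # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def auc(y_true: List[int], p_pred: List[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = ransomware) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.4]
loss = log_loss(labels, probs)
area = auc(labels, probs)
```

A lower log loss at a similar AUC indicates better-calibrated probabilities, which is one common way to compare classifiers that rank samples about equally well.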
