Netinfo Security, 2020, Vol. 20, Issue (2): 57-57. doi: 10.3969/j.issn.1671-1122.2020.02.008


Dynamic Detection of Ransomware Based on Stacking Model Fusion

LÜ Zongping1, ZHAO Chundi1,2, GU Zhaojun1, ZHOU Jingxian1

  1. Information Security Evaluation Center, Civil Aviation University of China, Tianjin 300300, China
  2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2019-09-09 Online: 2020-02-10 Published: 2020-05-11
  • About the authors:
    LÜ Zongping (born 1964), male, from Hubei, research fellow, M.S.; research interests: network and information security, civil aviation information systems. ZHAO Chundi (born 1993), female, from Shandong, master's student; research interests: network and information security. GU Zhaojun (born 1966), male, from Shandong, professor, Ph.D.; research interests: network and information security, civil aviation information systems. ZHOU Jingxian (born 1981), male, from Henan, associate research fellow, Ph.D.; research interests: big data and network security.
  • Supported by:
    National Natural Science Foundation of China [61601467, U1533104]; Civil Aviation Science and Technology Fund [MHRD20140205, MHRD20150233]; Civil Aviation Safety Capacity Building Fund [PESA170003, PESA2018079, PESA2018082, PESA2019073, PESA2019074]

Abstract:

Because ransomware is polymorphic and metamorphic, detection based on dynamic behavior features is widely used, but existing methods rely on a single type of detection feature and the machine learning algorithms they use tend to overfit. Building on the Stacking model fusion method, this paper proposes a ransomware detection algorithm, XRLStacking. First, all raw dynamic features of the ransomware samples are extracted and redundancy is removed, retaining only three kinds of features for each sample: the name of each API called, the thread number, and the sequence number. Then, a combination of N-gram and TF-IDF is used to remove features that contribute little to classification, so that the API calls in each sample retain a valid call-sequence relationship. Finally, classification is performed with the Stacking model fusion algorithm, using the combination of multiple features to identify ransomware. Experiments on a large amount of real data generated with Cuckoo Sandbox show that the proposed algorithm achieves high recognition accuracy and, compared with XGBoost and random forest (RF), avoids overfitting to some extent.

Key words: ransomware, dynamic detection, XRLStacking algorithm, API, sequence relationship
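
The feature-selection step described in the abstract (N-gram combined with TF-IDF over API name, thread number, and sequence number) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two techniques are fused, so the per-thread grouping, the max-weight selection rule, the threshold, and all function names and sample traces below are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def api_ngrams(calls, n=2):
    """Build API-name n-grams from (api_name, thread_id, seq_index) records.

    Assumption: calls are grouped by thread and ordered by sequence index,
    so an n-gram never spans two threads.
    """
    by_thread = defaultdict(list)
    for api, thread_id, seq in calls:
        by_thread[thread_id].append((seq, api))
    grams = []
    for thread_id in sorted(by_thread):
        apis = [api for _, api in sorted(by_thread[thread_id])]
        grams.extend(tuple(apis[i:i + n]) for i in range(len(apis) - n + 1))
    return grams


def tfidf(samples_grams):
    """Return one {ngram: tf-idf weight} dict per sample (plain tf * log(N/df))."""
    n_docs = len(samples_grams)
    doc_freq = Counter()
    for grams in samples_grams:
        doc_freq.update(set(grams))
    weights = []
    for grams in samples_grams:
        tf = Counter(grams)
        total = sum(tf.values()) or 1
        weights.append({g: (c / total) * math.log(n_docs / doc_freq[g])
                        for g, c in tf.items()})
    return weights


def select_features(weights, threshold):
    """Keep n-grams whose largest weight across samples reaches the threshold."""
    best = defaultdict(float)
    for sample_weights in weights:
        for gram, w in sample_weights.items():
            best[gram] = max(best[gram], w)
    return {g for g, w in best.items() if w >= threshold}


# Hypothetical traces: (api_name, thread_id, seq_index), deliberately unordered.
sample_a = [("WriteFile", 1, 2), ("CreateFileW", 1, 0), ("CryptEncrypt", 1, 1),
            ("LdrLoadDll", 2, 0), ("LdrGetProcedureAddress", 2, 1)]
sample_b = [("CreateFileW", 1, 0), ("ReadFile", 1, 1), ("NtClose", 1, 2),
            ("LdrLoadDll", 2, 0), ("LdrGetProcedureAddress", 2, 1)]

grams = [api_ngrams(s, n=2) for s in (sample_a, sample_b)]
kept = select_features(tfidf(grams), threshold=0.1)
```

In this toy input, the bigram shared by both samples receives an inverse document frequency of zero and is dropped, while bigrams specific to one sample are kept. The surviving n-grams would then serve as input features for the Stacking classifier.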

CLC Number: