Netinfo Security ›› 2023, Vol. 23 ›› Issue (5): 62-75. doi: 10.3969/j.issn.1671-1122.2023.05.007

• Technical Research •

Identification Method of Malicious Software Hidden Function Based on Siamese Architecture

CHEN Zitong, JIA Peng, LIU Jiayong

  1. School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China
  • Received: 2022-12-15 Online: 2023-05-10 Published: 2023-05-15
  • Contact: JIA Peng E-mail: pengjia@scu.edu.cn
  • About the authors: CHEN Zitong (born 1997), male, from Guangxi, master's student; main research interest: binary security | JIA Peng (born 1988), male, from Henan, associate professor, Ph.D.; main research interests: vulnerability discovery, static and dynamic software analysis | LIU Jiayong (born 1962), male, from Sichuan, professor, Ph.D.; main research interests: network application security, information content security
  • Supported by:
    National Natural Science Foundation of China (61902265)

Abstract:

Hiding techniques are now widely used in malware to evade detection by anti-virus engines and reverse analysis by researchers, so effectively identifying hidden functions in malware is important for malware code detection and in-depth analysis. Existing methods in this area, however, suffer to varying degrees from problems such as limited accuracy and poor robustness on datasets with few samples or imbalanced class distributions. To provide a practical detection method for hidden functions in malware, this paper proposes a novel identification method based on the Siamese architecture to detect the type of hidden function. The method effectively improves the accuracy of hidden function identification, and introducing the Siamese architecture mitigates the poor robustness on small-sample datasets. Experiments on a dataset of 15 common types of hidden functions extracted from malware show that the embedding vectors generated by this method are of better quality than those of the embedding neural network SAFE, and that the method achieves higher detection accuracy than several commonly used hidden function detection tools.

Key words: binary analysis, hidden function detection, neural network, instruction embedding
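
The abstract does not give implementation details, so the following is only a minimal sketch of the general Siamese idea, not the authors' model. It assumes a toy shared embedding (one linear layer with tanh), a standard contrastive loss, and classification by nearest support example. All names (`embed`, `contrastive_loss`, `classify`), the feature vectors, and the labels `type_a` and `type_b` are hypothetical. It illustrates why such an architecture can cope with few samples per class: a new function is labeled by comparing its embedding with a handful of labeled examples rather than by training a separate classifier output per class.

```python
import math


def embed(features, weights):
    """Toy shared embedding: one linear layer followed by tanh.

    Both branches of a Siamese network call this with the SAME weights.
    """
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def contrastive_loss(u, v, same_class, margin=1.0):
    """Standard contrastive loss on a pair of embeddings.

    Same-class pairs are pulled together; different-class pairs are pushed
    apart until their distance reaches the margin.
    """
    distance = math.dist(u, v)
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def classify(query, support, weights):
    """Return the label of the support example most similar to the query."""
    q = embed(query, weights)
    best = max(support, key=lambda item: cosine(q, embed(item[1], weights)))
    return best[0]


# Hypothetical data: one labeled example per class and one unlabeled query.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
support = [("type_a", [1.0, 0.0, 0.0]), ("type_b", [0.0, 1.0, 0.0])]
print(classify([0.9, 0.1, 0.0], support, weights))
```

In a real system the embedding would be a trained neural network over instruction sequences, and the weights would be learned by minimizing the pairwise loss over many function pairs; the hand-set weights here only keep the example self-contained.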

CLC Number: