Netinfo Security ›› 2021, Vol. 21 ›› Issue (10): 54-62. doi: 10.3969/j.issn.1671-1122.2021.10.008

• Selected Papers •

Malicious Code Visual Classification Algorithm Based on Behavior Knowledge Graph Sieve

ZHU Chaoyang1, ZHOU Liang1, ZHU Yayun1, LIN Qingwen2,3

  1. Institute of Information and Communication, China Electric Power Research Institute Co., Ltd., Beijing 100192, China
  2. Beijing HXIS Technology Co., Ltd., Beijing 100876, China
  3. National Engineering Laboratory of Mobile Internet Security Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2021-06-25 Online: 2021-10-10 Published: 2021-10-14
  • Contact: LIN Qingwen E-mail: 2019140856@bupt.edu.cn
  • About the authors: ZHU Chaoyang (born 1974), male, from Jiangxi, professor-level senior engineer, Ph.D.; main research interest: industrial control security for electric power systems | ZHOU Liang (born 1980), male, from Hubei, senior engineer, Ph.D.; main research interest: information security of electric power systems | ZHU Yayun (born 1990), male, from Shanxi, senior engineer, Ph.D.; main research interest: industrial control security for electric power systems | LIN Qingwen (born 1997), female, from Zhejiang, master's student; main research interests: network security and software security
  • Supported by:
    Science and Technology Project of State Grid Corporation of China Headquarters (521304190004)

Abstract:

In recent years, the malware industry chain has grown into a well-organized market involving large sums of money. The main challenge for anti-malware work is the need to evaluate large volumes of data and file samples to determine potential malicious intent. To address this, this paper proposes a visual classification algorithm for malicious code based on a behavior graph sieve. The algorithm analyzes the assembly instruction stream of malicious code samples, extracts program behavior fingerprints, and uses a knowledge graph to translate the fingerprint content, thereby generating a graph sieve for each given sample. By locating taint points in the graph sieve, the algorithm removes noise from the malware samples and generates the corresponding sieved fingerprints. The sieved fingerprints achieve a compression rate of 76.3% while retaining the features of the original fingerprints. Finally, the algorithm performs visual analysis and opcode sequence analysis on the sieved fingerprints and classifies them with the random forest algorithm, reaching an accuracy of 98.8%. Experiments show that the visual classification algorithm based on the behavior graph sieve achieves better results in malicious code classification.

Key words: knowledge graph, malicious code classification, visual classification algorithm
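
The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of three of its steps (sieving an opcode sequence, measuring compression, and turning the result into image and n-gram features), not the authors' implementation. Several things here are assumptions made for illustration: the fixed `NOISE_OPCODES` set stands in for the knowledge-graph-based taint location, compression rate is taken to mean the fraction of opcodes removed, and the gray-level mapping in `to_image` is one simple visualization choice among many.

```python
from collections import Counter

# Hypothetical noise set (an assumption): the paper locates noise through a
# behavior knowledge graph, not through a fixed list like this one.
NOISE_OPCODES = {"nop", "push", "pop", "mov"}


def sieve(opcodes, noise=NOISE_OPCODES):
    """Drop opcodes in the noise set, preserving the order of the rest."""
    return [op for op in opcodes if op not in noise]


def compression_rate(original, sieved):
    """Fraction of the original sequence removed by sieving (assumed definition)."""
    if not original:
        return 0.0
    return 1.0 - len(sieved) / len(original)


def ngram_counts(opcodes, n=2):
    """Count consecutive opcode n-grams, a common opcode-sequence feature."""
    return Counter(zip(*(opcodes[i:] for i in range(n))))


def to_image(opcodes, vocab, width):
    """Map each opcode to a gray level (0-255) and wrap rows at `width`."""
    step = 255 // max(len(vocab) - 1, 1)
    pixels = [vocab.index(op) * step for op in opcodes]
    pixels += [0] * (-len(pixels) % width)  # zero-pad the last row
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Toy opcode trace, invented for this example.
trace = ["push", "mov", "call", "mov", "xor", "nop", "jmp", "pop"]
sieved = sieve(trace)                    # ['call', 'xor', 'jmp']
rate = compression_rate(trace, sieved)   # 0.625
vocab = sorted(set(sieved))              # ['call', 'jmp', 'xor']
image = to_image(sieved, vocab, 2)       # [[0, 254], [127, 0]]
bigrams = ngram_counts(sieved)           # two bigrams, each seen once
```

Flattened image rows and n-gram counts of this kind could then be passed as feature vectors to a random forest classifier such as scikit-learn's `RandomForestClassifier`; that step is omitted here to keep the sketch dependency-free.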

CLC Number: