Netinfo Security, 2018, Vol. 18, Issue (8): 1-7. doi: 10.3969/j.issn.1671-1122.2018.08.001


Design and Implementation of a Malicious Document Detection Tool Based on Machine Learning

Weiping WEN1, Bozhi WU1, Yingnan JIAO2, Yongqiang HE1

  1. School of Software and Microelectronics, Peking University, Beijing 102600, China
  2. National Computer Network Emergency Response Technical Team / Coordination Center, Beijing 100029, China
  • Received: 2018-04-09 Online: 2018-08-20 Published: 2020-05-11
  • About the authors:

    Weiping WEN (born 1976), male, from Hunan, professor, Ph.D.; his main research interests are network attack and defense, malicious code research, reverse engineering of information systems, and trusted computing. Bozhi WU (born 1990), male, from Guangdong, master's student; his main research interests are vulnerability analysis and vulnerability discovery. Yingnan JIAO (born 1983), female, from Liaoning, engineer, master's degree; her main research interests are software engineering and information security. Yongqiang HE (born 1984), male, from Sichuan, master's student; his main research interest is software engineering.

  • Supported by:
    Joint Funds of the National Natural Science Foundation of China [U1736218]

Abstract:

As networking and informatization deepen, advanced persistent threat (APT) incidents keep increasing, posing serious threats to the security of nations and enterprises and causing heavy economic losses. An APT attack mounts a long-term, sustained network attack on a specific target through a series of steps: targeted intelligence collection, single-point attack breakthrough, control channel construction, internal lateral movement, and data collection and upload. In the single-point breakthrough stage, the most commonly used technique is a malicious document implanted with a remote-access Trojan, so detecting and identifying malicious documents effectively is essential. After a thorough survey of the current state of the field, this paper proposes a machine-learning-based method for detecting malicious documents. Using a virtual sandbox to analyze the dynamic behavior of unknown documents, the authors design and implement a malicious document identification tool. Experiments show that the tool, which relies on machine learning, can efficiently process and identify malicious documents at large scale.

Key words: malicious document, machine learning, feature vector, virtual sandbox
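The abstract names the ingredients (sandbox behavior analysis, feature vectors, a machine-learning classifier) but does not specify the features or the model. The following is only a minimal sketch of how those pieces could fit together. The behavior names in `BEHAVIORS`, the toy reports, and the nearest-centroid classifier are all assumptions chosen for illustration; they are not the features or the algorithm used in the paper.

```python
from collections import Counter
import math

# Hypothetical behaviors a sandbox report might record (an assumption,
# not the feature set used in the paper).
BEHAVIORS = ["process_create", "file_write", "registry_set", "network_connect"]


def to_feature_vector(events):
    """Count how often each tracked behavior appears in one sandbox report."""
    counts = Counter(events)
    return [counts[b] for b in BEHAVIORS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(reports, labels):
    """Nearest-centroid model: one mean feature vector per class label."""
    by_label = {}
    for events, label in zip(reports, labels):
        by_label.setdefault(label, []).append(to_feature_vector(events))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, events):
    """Return the label whose centroid is closest (Euclidean) to the report."""
    vec = to_feature_vector(events)
    return min(model, key=lambda label: math.dist(vec, model[label]))


# Toy, hand-made reports; real sandbox logs would be far richer.
train_reports = [
    ["file_write"],
    ["file_write", "file_write"],
    ["process_create", "network_connect", "registry_set"],
    ["process_create", "file_write", "network_connect", "network_connect"],
]
train_labels = ["benign", "benign", "malicious", "malicious"]

model = train(train_reports, train_labels)
print(predict(model, ["process_create", "network_connect"]))
print(predict(model, ["file_write"]))
```

A nearest-centroid rule is used here only because it fits in a few lines of standard-library Python; any classifier that accepts fixed-length feature vectors could take its place.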

CLC number: