Netinfo Security ›› 2018, Vol. 18 ›› Issue (8): 1-7. doi: 10.3969/j.issn.1671-1122.2018.08.001

• Original Article •

Design and Implementation on Malicious Documents Detection Tool Based on Machine Learning

Weiping WEN1, Bozhi WU1, Yingnan JIAO2, Yongqiang HE1

  1. School of Electronics Engineering and Computer Science, Peking University, Beijing 102600, China
  2. National Computer Network Emergency Response Technical Team / Coordination Center, Beijing 100029, China
  • Received: 2018-04-09  Online: 2018-08-20  Published: 2020-05-11

Abstract:

As networking and informatization deepen, advanced persistent threat (APT) incidents are increasing, posing a serious threat to national security and development and causing heavy economic losses for enterprises. An APT attack is a long-term, continuous network attack on a specific target, carried out through a series of steps that include targeted intelligence collection, single-point breakthrough, control channel construction, internal lateral movement, and data collection and upload. In the single-point breakthrough stage, the most common technique is to use malicious documents to implant remote-access Trojans, so detecting and identifying malicious documents is essential. After surveying the current state of research, this paper proposes a malicious document detection method based on machine learning. A malicious document recognition tool is designed and implemented that analyzes the dynamic behavior of unknown documents in a virtual sandbox. Experiments show that the tool can efficiently process and identify malicious documents at large scale.

Key words: malicious document, machine learning, feature vector, virtual sandbox
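
The pipeline summarized in the abstract (sandbox behavior, then a feature vector, then a learned classifier) can be illustrated with a minimal sketch. The abstract does not specify the feature set, the sandbox report format, or the learning algorithm, so everything below is an assumption made for illustration: the behavior names in `FEATURES`, the event-list report format, and the nearest-centroid classifier, which is a simple stand-in and not the authors' method.

```python
from collections import Counter

# Hypothetical behavior categories; the paper's real feature set is not
# given in the abstract.
FEATURES = ["process_create", "file_write", "registry_set", "network_connect"]


def to_feature_vector(events):
    """Count how often each tracked behavior appears in one sandbox report."""
    counts = Counter(events)
    return [counts[name] for name in FEATURES]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(reports, labels):
    """Build one centroid per label (nearest-centroid stand-in classifier)."""
    grouped = {}
    for events, label in zip(reports, labels):
        grouped.setdefault(label, []).append(to_feature_vector(events))
    return {label: centroid(vectors) for label, vectors in grouped.items()}


def predict(model, events):
    """Return the label whose centroid is closest to the report's vector."""
    vector = to_feature_vector(events)
    return min(model, key=lambda label: squared_distance(model[label], vector))


# Toy, made-up reports: each is the list of behaviors observed while a
# document was opened in a sandbox.
reports = [
    ["file_write"],
    ["file_write", "file_write"],
    ["process_create", "network_connect", "file_write"],
    ["process_create", "registry_set", "network_connect", "network_connect"],
]
labels = ["benign", "benign", "malicious", "malicious"]

model = train(reports, labels)
verdict_suspicious = predict(model, ["process_create", "network_connect"])
verdict_plain = predict(model, ["file_write"])
```

In this toy data, a document that spawns a process and opens a network connection falls closer to the malicious centroid, while one that only writes a file falls closer to the benign one. A real tool would use far richer features and a properly validated model.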
