Netinfo Security (信息网络安全) ›› 2016, Vol. 16 ›› Issue (9): 64-68. doi: 10.3969/j.issn.1671-1122.2016.09.013


恶意代码聚类中的特征选取研究

王毅, 唐勇, 卢泽新, 俞昕

  1. 国防科学技术大学计算机学院,湖南长沙 410073
  • Received: 2016-07-25 Published: 2016-09-20 Released online: 2020-05-13
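  • About the authors:

    Yi WANG (born 1992), male, from Hunan, master's student; main research interest: malware analysis. Yong TANG (born 1979), male, from Hunan, associate research fellow, Ph.D.; main research interests: network and information security, data mining. Zexin LU (born 1963), male, from Chongqing, research fellow, M.S.; main research interest: computer networks. Xin YU (born 1992), male, from Gansu, master's student; main research interest: network and information security.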

  • Supported by:
    National Natural Science Foundation of China [61472437]

Research on Features Selection in Malware Clustering

Yi WANG, Yong TANG, Zexin LU, Xin YU

  1. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2016-07-25 Published: 2016-09-20 Online: 2020-05-13

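Abstract (Chinese version):

In recent years, as the volume of malware has grown rapidly, applying clustering algorithms to the detection of new malware families has found increasing favor among security vendors. Malware clustering groups samples with similar behavior or structure into the same cluster, and the choice of features affects the quality of the clustering. The paper first discusses and compares the features commonly used in malware clustering research. Most existing studies cluster on a single feature vector, yet no single feature vector can fully describe all the properties of malware. To address this problem, the paper then proposes clustering malware with multi-feature-vector pairs and, based on the clustering results, defines specific metrics for evaluating the selected features. Finally, the paper runs experiments with the DBSCAN clustering algorithm on individual features and on combinations of features. The results show that clustering with multi-feature-vector pairs outperforms clustering with a single feature vector.

Keywords (Chinese version): feature selection, malware, cluster analysis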

Abstract:

The volume of malware has exploded in recent years. As a result, using clustering algorithms to detect malware families has found favor with security vendors. Malware clustering groups samples that have similar behavior or structure into the same cluster, and feature selection plays a vital role in the quality of that clustering. This paper first discusses the features commonly used in existing malware clustering studies and compares them with one another. Most existing work clusters on a single feature vector, but a single feature vector cannot describe all the characteristics of malware. To address this problem, the paper proposes clustering malware with multi-feature-vector pairs. It also defines specific indexes, based on the clustering results, for evaluating the selected feature vectors. Finally, several feature vectors and their combinations are tested with the DBSCAN clustering algorithm. The results show that multi-feature-vector pairs are superior to a single feature vector in identifying malware families.

Key words: features selection, malware, cluster analysis
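
The abstract does not say how the feature vectors are combined, so the following is only an illustrative sketch and not the authors' method. It assumes each sample is described by several feature sets, that the distance between two samples is the mean Jaccard distance over the chosen features, and that a small hand-written DBSCAN groups the samples. The feature names (`api`, `imports`), the toy samples, and the values of `eps` and `min_pts` are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set

NOISE = -1  # label for samples that belong to no cluster

def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def combined_distance(x: Dict[str, Set[str]], y: Dict[str, Set[str]],
                      features: Sequence[str]) -> float:
    """Mean Jaccard distance over the chosen features (an assumed rule)."""
    return sum(jaccard_distance(x[f], y[f]) for f in features) / len(features)

def dbscan(samples: List[Dict[str, Set[str]]], features: Sequence[str],
           eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN on pairwise distances; returns one label per sample."""
    n = len(samples)
    # A sample is its own neighbor (distance 0), as in standard DBSCAN.
    neighbors = [
        [j for j in range(n)
         if combined_distance(samples[i], samples[j], features) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                stack.extend(neighbors[j])
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]

# Hypothetical toy samples: two made-up families plus one outlier.
SAMPLES = [
    {"api": {"CreateFileW", "WriteFile", "RegSetValueExW"}, "imports": {"kernel32", "advapi32"}},
    {"api": {"CreateFileW", "WriteFile", "RegSetValueExW", "Sleep"}, "imports": {"kernel32", "advapi32"}},
    {"api": {"CreateFileW", "WriteFile"}, "imports": {"kernel32", "advapi32", "user32"}},
    {"api": {"socket", "connect", "send"}, "imports": {"ws2_32", "kernel32"}},
    {"api": {"socket", "connect", "send", "recv"}, "imports": {"ws2_32", "kernel32"}},
    {"api": {"socket", "connect", "recv"}, "imports": {"ws2_32"}},
    {"api": {"VirtualAlloc"}, "imports": {"ntdll"}},
]

if __name__ == "__main__":
    print(dbscan(SAMPLES, ("imports",), eps=0.45, min_pts=2))        # [0, 0, 0, 1, 1, -1, -1]
    print(dbscan(SAMPLES, ("api", "imports"), eps=0.45, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

On this toy data, clustering on `imports` alone leaves the sixth sample as noise, while the combined distance places it with the other two network-oriented samples. This only illustrates why combining features can change cluster membership; it does not reproduce the paper's experiments or its evaluation indexes.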

Chinese Library Classification (CLC) number: