Malware Classification Method Based on Multi-Scale Convolutional Neural Network

Netinfo Security, 2022, Vol. 22, Issue (10): 31-38. doi: 10.3969/j.issn.1671-1122.2022.10.005

LIU Jiayin¹,²,³, LI Fujuan¹,²,³,⁴, MA Zhuo¹,²,³, XIA Lingling¹,²,³

1. Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, China
2. Jiangsu Electronic Data Forensics and Analysis Engineering Research Center, Nanjing 210031, China
3. Key Laboratory of Digital Forensics of Jiangsu Provincial Public Security Department, Nanjing 210031, China
4. State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210093, China

Received: 2022-08-12 Online: 2022-10-10 Published: 2022-11-15
Contact: LI Fujuan E-mail: lifujuan@jspi.cn
About the authors: LIU Jiayin (born 1986), male, from Chongqing, lecturer, Ph.D.; main research interests: information security and machine learning | LI Fujuan (born 1974), female, from Shaanxi, professor, master's degree; main research interest: information security | MA Zhuo (born 1993), female, from Shanxi, lecturer, Ph.D.; main research interests: privacy protection and time series analysis | XIA Lingling (born 1988), female, from Jiangsu, associate professor, Ph.D.; main research interests: network security technology and network propagation dynamics
Supported by:
National Natural Science Foundation of China (62272203); Science and Technology Program of the Jiangsu Provincial Administration for Market Regulation (KJ21125027); Science and Technology Research Project of the Jiangsu Provincial Public Security Department (2020KX008); Science and Technology Research Project of the Jiangsu Provincial Public Security Department (2021KX011); Natural Science Foundation of the Jiangsu Higher Education Institutions (21KJD520003); Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (KFKT2022B23)

Abstract

Malware files vary enormously in size, so training a conventional convolutional neural network on their visualization images requires resizing every image to a common resolution, which discards a large amount of information. To address this, the paper proposes a malware classification method based on a multi-scale convolutional neural network. The method first renders malware files of different sizes as images at several specific resolutions. It then extracts features with a DenseNet network, which avoids the information loss caused by resizing all images to a single resolution. Finally, it processes the multi-scale features with a spatial pyramid model and trains the classification model on the result. Experimental results show that the method effectively improves malware classification performance.

Key words: malware classification, spatial pyramid, multi-scale, convolutional neural network
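The first step of the method, rendering files of different sizes as images at a few fixed resolutions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the side lengths in `SIDES`, the one-byte-per-pixel mapping, and the zero-padding and truncation rules are all assumptions, and the helper names `pick_side` and `bytes_to_image` are hypothetical.

```python
import numpy as np

# Hypothetical square side lengths; the paper's actual resolutions are not given here.
SIDES = (64, 128, 256, 512)

def pick_side(n_bytes, sides=SIDES):
    """Return the smallest side whose square image holds n_bytes, else the largest."""
    for side in sides:
        if side * side >= n_bytes:
            return side
    return sides[-1]

def bytes_to_image(data, sides=SIDES):
    """Map each byte to one gray-scale pixel (0-255) in a side x side image."""
    side = pick_side(len(data), sides)
    # Files larger than the largest image are truncated (an assumption).
    buf = np.frombuffer(data, dtype=np.uint8)[: side * side]
    # Files smaller than the chosen image are zero-padded (an assumption).
    img = np.zeros(side * side, dtype=np.uint8)
    img[: buf.size] = buf
    return img.reshape(side, side)

small = bytes_to_image(bytes(range(256)) * 10)    # 2,560 bytes
large = bytes_to_image(bytes(range(256)) * 300)   # 76,800 bytes
print(small.shape, large.shape)
```

Because each file keeps a resolution close to its own size, small files are not stretched and large files are not shrunk as aggressively as they would be under a single shared resolution.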

CLC Number: TP309

LIU Jiayin, LI Fujuan, MA Zhuo, XIA Lingling. Malware Classification Method Based on Multi-Scale Convolutional Neural Network[J]. Netinfo Security, 2022, 22(10): 31-38.

Figures/Tables (8): Fig. 1, Fig. 2, Fig. 3, Fig. 4, Table 1, Table 2, Fig. 5, Table 3

References (20)

[1]	CNCERT/CC. Analysis Report on China's Internet Network Security Monitoring Data in the First Half of 2021[EB/OL]. (2021-07-31)[2022-06-15]. https://www.cert.org.cn/publish/main/upload/File/first-half%20%20year%20cyberseurity%20report%202021.pdf.
	国家计算机网络应急技术处理协调中心. 2021年上半年我国互联网网络安全监测数据分析报告[EB/OL]. (2021-07-31)[2022-06-15]. https://www.cert.org.cn/publish/main/upload/File/first-half%20%20year%20cyberseurity%20report%202021.pdf.
[2]	KIM D, SHIN G, HAN M. Analysis of Feature Importance and Interpretation for Malware Classification[J]. Computers, Materials & Continua, 2020, 65(3): 1891-1904.
[3]	LIU Liu, WANG Baosheng, YU Bo, et al. Automatic Malware Classification and New Malware Detection Using Machine Learning[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(9): 1336-1347.
[4]	RAFF E, SYLVESTER J, NICHOLAS C. Learning the PE Header, Malware Detection with Minimal Domain Knowledge[C]// ACM. The 10th ACM Workshop on Artificial Intelligence and Security(AISec’17). New York: ACM, 2017: 121-132.
[5]	DAI Yusheng, LI Hui, QIAN Yekui, et al. SMASH: A Malware Detection Method Based on Multi-Feature Ensemble Learning[J]. IEEE Access, 2019, 7: 112588-112597.
[6]	FUJINO A, MURAKAMI J, MORI T. Discovering Similar Malware Samples Using API Call Topics[C]// IEEE. 12th Annual IEEE Consumer Communications and Networking Conference(CCNC). New York: IEEE, 2015: 1-8.
[7]	LIM H, YAMAGUCHI Y, SHIMADA H, et al. Malware Classification Method Based on Sequence of Traffic Flow[C]// ACM. International Conference on Information Systems Security and Privacy(ICISSP). New York: ACM, 2015: 1-8.
[8]	TOBIYAMA S, YAMAGUCHI Y, SHIMADA H, et al. Malware Detection with Deep Neural Network Using Process Behavior[C]// IEEE. 40th IEEE Annual Computer Software and Applications Conference. New York: IEEE, 2016: 577-582.
[9]	NATARAJ L, KARTHIKEYAN S, JACOB G, et al. Malware Images: Visualization and Automatic Classification[C]// ACM. The 8th International Symposium on Visualization for Cyber Security. New York: ACM, 2011: 1-7.
[10]	HAN Xiaoguang, QU Wu, YAO Xuanxia, et al. Research on Malicious Code Variants Detection Based on Texture Fingerprint[J]. Journal on Communications, 2014, 35(8): 125-136.
[11]	GUPTA S, BANSAL P, KUMAR S. ULBP-RF: A Hybrid Approach for Malware Image Classification[C]// IEEE. 2018 International Conference on Parallel, Distributed and Grid Computing(PDGC). New York: IEEE, 2018: 115-119.
[12]	ZHAO Yuntao, XU Chunyu, BO Bo, et al. Maldeep: A Deep Learning Classification Framework Against Malware Variants Based on Texture Visualization[EB/OL]. (2019-04-01)[2022-06-15]. https://doi.org/10.1155/2019/4895984.
[13]	YAKURA H, SHINOZAKI S, NISHIMURA R, et al. Malware Analysis of Imaged Binary Samples by Convolutional Neural Network with Attention Mechanism[C]// ACM. 8th ACM Conference on Data and Application Security and Privacy. New York: ACM, 2018: 127-134.
[14]	CHEN Bingcai, REN Zhongru, YU Chao, et al. Adversarial Examples for CNN-Based Malware Detectors[J]. IEEE Access, 2019, 7: 54360-54371.
[15]	FAN Zhipeng, LI Jun, LIU Yuqiang, et al. Classification of Malware Based on Gray Texture Fingerprint[J]. Science Technology and Engineering, 2020, 20(29): 12014-12020.
	范志鹏, 李军, 刘宇强, 等. 基于灰度纹理指纹的恶意代码分类[J]. 科学技术与工程, 2020, 20(29):12014-12020.
[16]	HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely Connected Convolutional Networks[C]// IEEE. IEEE Conference on Computer Vision and Pattern Recognition(CVPR). New York: IEEE, 2017: 4700-4708.
[17]	RONEN R, RADU M, FEUERSTEIN C, et al. Microsoft Malware Classification Challenge[EB/OL]. (2018-02-22)[2022-07-18]. https://arxiv.org/abs/1802.10135.
[18]	TAN Ruhan, ZUO Liming, LIU Ergen, et al. Malicious Code Detection Based on Image Feature Fusion[J]. Netinfo Security, 2021, 21(10): 90-95.
	谭茹涵, 左黎明, 刘二根, 等. 基于图像特征融合的恶意代码检测[J]. 信息网络安全, 2021, 21(10): 91-95.
[19]	QIAO Yanchen, JIANG Qingshan, GU Liang, et al. Malware Classification Method Based on Word Vector of Assembly Instruction and CNN[J]. Netinfo Security, 2019, 19(4): 20-28.
	乔延臣, 姜青山, 古亮, 等. 基于汇编指令词向量与卷积神经网络的恶意代码分类方法研究[J]. 信息网络安全, 2019, 19(4):20-28.
[20]	CHEN Xiaohan, WEI Shuning, QIN Zhengze. Malware Family Classification Based on Deep Learning Visualization[J]. Computer Engineering and Applications, 2021, 57(22): 131-138.
	陈小寒, 魏书宁, 覃正泽. 基于深度学习可视化的恶意软件家族分类[J]. 计算机工程与应用, 2021, 57(22):131-138.

Family No.	Malware family	Number of samples	Malware type
1	Ramnit	1541	Worm
2	Lollipop	2478	Adware
3	Kelihos ver 3	2942	Backdoor
4	Vundo	475	Trojan
5	Simda	42	Backdoor
6	Tracur	751	Trojan Downloader
7	Kelihos ver 1	398	Backdoor
8	Obfuscator.ACY	1228	Any Kind of Obfuscated Malware
9	Gatak	1013	Backdoor

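Model	Macro-average precision	Macro-average recall	Macro-average F1	Weighted-average precision	Weighted-average recall	Weighted-average F1
Lookup-table method + DenseNet	97.80%	96.58%	97.09%	98.15%	98.10%	98.08%
Proposed method + DenseNet	96.93%	97.12%	96.93%	97.66%	97.55%	97.57%
Proposed method + DenseNet-SPP	98.35%	97.31%	97.75%	98.98%	98.96%	98.96%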
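The table above reports macro and weighted averages of three per-class scores. The sketch below shows how the two kinds of average are usually computed from a confusion matrix. It assumes that the first score in each group is per-class precision and that the weighted average uses each class's true sample count as its weight; the function name `averaged_scores` and the toy confusion matrix are hypothetical and do not come from the paper.

```python
import numpy as np

def averaged_scores(cm):
    """Macro and support-weighted precision, recall and F1 from a confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    Returns two arrays, each ordered as (precision, recall, F1).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)      # true samples per class
    predicted = cm.sum(axis=0)    # predictions per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    per_class = np.stack([precision, recall, f1])     # shape (3, n_classes)
    macro = per_class.mean(axis=1)                    # every class counts equally
    weighted = per_class @ (support / support.sum())  # large classes count more
    return macro, weighted

# Toy, imbalanced three-class confusion matrix (hypothetical data).
toy_cm = [[90, 10, 0],
          [5, 45, 0],
          [0, 2, 8]]
macro, weighted = averaged_scores(toy_cm)
print("macro   :", np.round(macro, 4))
print("weighted:", np.round(weighted, 4))
```

With support weights, the weighted-average recall equals overall accuracy, while the macro average gives a 42-sample family such as Simda the same influence as a family with thousands of samples. This is why the two groups of columns can rank models slightly differently.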

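Algorithm	Features	Classifier	Accuracy
Method of Ref. [18]	HOG+Dense SIFT	SVM	94.5%
Method of Ref. [15]	Gray-scale texture image	CNN	96.2%
Method of Ref. [19]	Gray-scale image of assembly-instruction word vectors	LeNet5	98.56%
Method of Ref. [20]	Opcode SimHash gray-scale image	RNN+CNN	98.8%
Proposed method	Multi-scale gray-scale images	DenseNet-SPP	98.98%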
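The DenseNet-SPP classifier in the last row combines convolutional features with a spatial pyramid so that images of different resolutions yield feature vectors of the same length. A minimal NumPy sketch of spatial pyramid pooling is given below. The pyramid levels `(1, 2, 4)`, the use of max pooling, and the function name `spatial_pyramid_pool` are assumptions made for illustration, and random arrays stand in for DenseNet feature maps.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over n x n grids and concatenate.

    The output length is C * sum(n * n for n in levels) regardless of H and W,
    provided H and W are both at least max(levels).
    """
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        # Integer bin edges that cover the whole map even if H or W is not divisible by n.
        rows = [(k * h) // n for k in range(n + 1)]
        cols = [(k * w) // n for k in range(n + 1)]
        for i in range(n):
            for j in range(n):
                cell = fmap[:, rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))   # one value per channel
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
vec_a = spatial_pyramid_pool(rng.random((8, 16, 16)))   # stand-in for a small image's features
vec_b = spatial_pyramid_pool(rng.random((8, 37, 23)))   # stand-in for a larger image's features
print(vec_a.shape, vec_b.shape)
```

Both calls return vectors of length 8 × (1 + 4 + 16) = 168, so a single fully connected classifier can follow the pooling layer without resizing the input images to a common resolution.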
