Netinfo Security, 2019, Vol. 19, Issue (4): 20-28. doi: 10.3969/j.issn.1671-1122.2019.04.003
Received: 2018-12-10; Online: 2019-04-10; Published: 2020-05-11
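About the authors: Yanchen QIAO (b. 1988), male, from Shandong, assistant research fellow, Ph.D.; main research interests: network security and malicious code. Qingshan JIANG (b. 1962), male, from Hebei, research fellow, Ph.D.; main research interests: network security, data mining, and big data analysis and applications. Liang GU (b. 1982), male, from Sichuan, senior engineer, Ph.D.; main research interests: network security and cloud computing. Xiaoming WU (b. 1959), male, from Liaoning, M.S.; main research interests: communication network management, computer communication, and computer network management.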
Yanchen QIAO (1, 2), Qingshan JIANG (1), Liang GU (2), Xiaoming WU (3)
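Abstract: Current malware classification methods rely heavily on expert experience to design their feature sets, and their high feature dimensionality leads to high computational complexity. To address both problems, this paper proposes a malware classification method based on word vectors of assembly instructions and a convolutional neural network (CNN). First, each malware executable is reverse-engineered to obtain its assembly code; assembly instructions are treated as words and functions as sentences, so that each malware sample becomes a document. Next, the Word2Vec algorithm is applied to each document to obtain word vectors for its assembly instructions. Finally, each document is converted into a matrix according to the Top-100 assembly instruction sequence derived from statistics over the training set. A CNN classification model is trained on the training set, and the results show that the method achieves an average accuracy of 98.56%.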
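The preprocessing pipeline described in the abstract (instructions as words, functions as sentences, one Word2Vec model per sample, then a matrix keyed by the most frequent training-set instructions) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The input is assumed to be already-disassembled mnemonics grouped by function. `train_word2vec` is a simplified skip-gram with negative sampling standing in for a real Word2Vec library. The matrix layout (row i holds the sample's vector for the i-th most frequent training-set instruction, or zeros if the sample never uses it) is our reading of the abstract. All function names and hyperparameters (`dim`, `window`, `negative`, `epochs`, `lr`) are hypothetical, and the toy example uses k=8 instead of 100.

```python
import math
import random
from collections import Counter

# Assumed input format (hypothetical): one sample = list of functions,
# one function = list of instruction mnemonics produced by a disassembler.


def top_k_instructions(train_samples, k=100):
    """Return the k most frequent mnemonics over the whole training set."""
    counts = Counter(ins for sample in train_samples for func in sample for ins in func)
    return [ins for ins, _ in counts.most_common(k)]


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_word2vec(sentences, dim=16, window=2, negative=3, epochs=50, lr=0.05, seed=0):
    """Simplified skip-gram with negative sampling; a stand-in for real Word2Vec."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for sent in sentences:  # each function is one "sentence"
            for pos, word in enumerate(sent):
                center = w_in[idx[word]]  # updated in place below
                lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
                for ctx in range(lo, hi):
                    if ctx == pos:
                        continue
                    # One true context word plus `negative` random noise words.
                    targets = [(idx[sent[ctx]], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0) for _ in range(negative)]
                    grad = [0.0] * dim
                    for t, label in targets:
                        out = w_out[t]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(center, out))))
                        for d in range(dim):
                            grad[d] += g * out[d]     # uses out before its update
                            out[d] += g * center[d]
                    for d in range(dim):
                        center[d] += grad[d]
    return {w: w_in[i] for w, i in idx.items()}


def sample_to_matrix(sample, top_instructions, dim=16, seed=0):
    """Train vectors on one sample only, then stack rows in Top-K order (K x dim)."""
    vectors = train_word2vec(sample, dim=dim, seed=seed)
    # Instructions the sample never uses get an all-zero row (our assumption).
    return [list(vectors.get(ins, [0.0] * dim)) for ins in top_instructions]


# Toy training set of two disassembled samples (made-up mnemonics).
train = [
    [["push", "mov", "call", "pop", "ret"], ["xor", "mov", "cmp", "jne", "ret"]],
    [["push", "mov", "sub", "call", "add", "ret"], ["mov", "test", "je", "call", "ret"]],
]
top = top_k_instructions(train, k=8)
matrix = sample_to_matrix(train[0], top)
print(len(matrix), len(matrix[0]))  # 8 16
```

The resulting K x d matrix (100 x d in the paper's setting) is what a CNN would then consume as a single-channel input. The abstract does not describe the CNN architecture, so it is not sketched here.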
Yanchen QIAO, Qingshan JIANG, Liang GU, Xiaoming WU. Malware Classification Method Based on Word Vector of Assembly Instruction and CNN[J]. Netinfo Security, 2019, 19(4): 20-28.