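Netinfo Security, 2020, Vol. 20, Issue 12: 72-82. doi: 10.3969/j.issn.1671-1122.2020.12.010
Authors: TAN Yang, LIU Jiayong, ZHANG Lei
Received: 2020-09-19
Online: 2020-12-10
Published: 2021-01-12
Corresponding author: ZHANG Lei
E-mail: zhanglei2018@scu.edu.cn
About the authors: TAN Yang (born 1993), female, from Chongqing, master's student; research interest: malicious code detection. LIU Jiayong (born 1962), male, from Sichuan, professor, Ph.D.; research interests: information security, network communication, and network security. ZHANG Lei (born 1983), male, from Sichuan, assistant research fellow, Ph.D.; research interest: malicious code analysis.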
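Abstract:
Malware authors usually keep evolving their code from version to version, which gives rise to malware families. Existing malware family classification methods still leave room for improvement in the robustness of their feature selection and in the effectiveness and accuracy of their classification algorithms. This paper therefore proposes a malware classification method that applies a deep auto-encoder to mixed features. First, dynamic API-sequence features and static byte-entropy features are extracted from each malicious sample and combined into mixed features that capture the sample's global structure. Next, a deep auto-encoder reduces the dimensionality of these high-dimensional features. Finally, the resulting low-dimensional features are fed into an eXtreme Gradient Boosting (XGBoost) classifier, which assigns each sample to a malware family. Experimental results show that the method distinguishes different malware families correctly and effectively, reaching a micro-average AUC (Area Under the ROC Curve) of 98.3% and a macro-average AUC of 97.9%.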
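The abstract names the two feature views but not how they are computed. The sketch below is one plausible reading under stated assumptions, not the authors' implementation. The static view is a byte-entropy histogram in the style popularized by Saxe and Berlin; the 1024-byte window, 256-byte step, and 16 × 16 binning are assumptions. The dynamic view is a hashed count of API-call bigrams, which assumes the API names have already been extracted in call order from a sandbox report such as Cuckoo's. The function names (`byte_entropy_histogram`, `api_ngram_features`, `mixed_features`) are hypothetical.

```python
import math
import zlib
from collections import Counter
from typing import List, Sequence

BINS = 16  # assumption: 16 entropy bins x 16 coarse byte-value bins


def byte_entropy_histogram(data: bytes, window: int = 1024, step: int = 256) -> List[float]:
    """Sliding-window joint histogram of (window entropy, coarse byte value)."""
    hist = [[0] * BINS for _ in range(BINS)]
    # Window start offsets; a file shorter than `window` yields a single window.
    # Bytes past the last full stride are not scanned (simple strided scan).
    for start in range(0, max(len(data) - window, 0) + 1, step):
        chunk = data[start:start + window]
        if not chunk:
            continue
        # Coarsen each byte to its high nibble (0..15).
        counts = Counter(b >> 4 for b in chunk)
        n = len(chunk)
        # Shannon entropy of the nibble distribution, in [0, 4] bits.
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        e_bin = min(int(entropy * BINS / 4), BINS - 1)
        for nibble, c in counts.items():
            hist[e_bin][nibble] += c
    total = sum(map(sum, hist)) or 1
    # Flatten row-major and normalise so the 256 values sum to 1.
    return [cell / total for row in hist for cell in row]


def api_ngram_features(api_calls: Sequence[str], n: int = 2, dim: int = 256) -> List[float]:
    """Hashed, normalised counts of API-call n-grams (the 'hashing trick')."""
    vec = [0.0] * dim
    for i in range(len(api_calls) - n + 1):
        gram = "|".join(api_calls[i:i + n])
        # crc32 is stable across runs, unlike the built-in hash() of str.
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def mixed_features(data: bytes, api_calls: Sequence[str]) -> List[float]:
    """Concatenate the static and dynamic views into one 512-dimensional vector."""
    return byte_entropy_histogram(data) + api_ngram_features(api_calls)


sample = bytes(range(256)) * 8
calls = ["NtCreateFile", "NtWriteFile", "NtClose", "NtCreateFile", "NtWriteFile"]
x = mixed_features(sample, calls)
print(len(x))  # 512
```

Normalising each view separately keeps the 256 static and 256 dynamic dimensions on a comparable scale before the concatenated vector is passed to the auto-encoder for dimensionality reduction.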
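The abstract reports micro- and macro-averaged AUC without defining them. Under the usual one-vs-rest definitions (an assumption here, since this section does not show how the paper computes them), macro-AUC is the unweighted mean of the per-family AUCs, while micro-AUC pools every (sample, family) score into a single binary problem. A minimal pure-Python sketch with hypothetical helper names:

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative examples")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def one_vs_rest(y_true: Sequence[int], proba: Sequence[Sequence[float]], k: int):
    """Binary labels and scores for class k against all other classes."""
    return [1 if y == k else 0 for y in y_true], [row[k] for row in proba]


def macro_auc(y_true, proba) -> float:
    """Unweighted mean of the per-class one-vs-rest AUCs."""
    n_classes = len(proba[0])
    return sum(binary_auc(*one_vs_rest(y_true, proba, k)) for k in range(n_classes)) / n_classes


def micro_auc(y_true, proba) -> float:
    """Single AUC over all (sample, class) pairs pooled together."""
    labels, scores = [], []
    for k in range(len(proba[0])):
        lk, sk = one_vs_rest(y_true, proba, k)
        labels += lk
        scores += sk
    return binary_auc(labels, scores)


y_true = [0, 1, 2, 0]
proba = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
print(micro_auc(y_true, proba))  # 0.96875
print(macro_auc(y_true, proba))  # 1.0
```

In this toy data every family is perfectly separated within its own column, so the macro average is 1.0. When all scores are pooled, however, the positive score 0.4 of the last sample falls below the negative score 0.5 it received for another family, which pulls the micro average down to 31/32. The pairwise loop is quadratic in the number of scores; for realistic data set sizes, a rank-based computation is the practical choice.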
Cite this article: TAN Yang, LIU Jiayong, ZHANG Lei. Malware Familial Classification of Deep Auto-encoder Based on Mixed Features[J]. Netinfo Security, 2020, 20(12): 72-82.
Table 1
Basic experimental environment for family classification
Item | Feature extraction: Cuckoo server | Feature extraction: analysis guest | Feature dimensionality reduction | Classification |
---|---|---|---|---|
CPU | Intel Core i5-3210 | Virtual machine | Intel Core i7-9700 | Intel Xeon E3-1231 v3 |
Memory | 8 GB | 2 GB | 32 GB | 16 GB |
Disk | 240 GB SSD | 40 GB | 256 GB SSD | 256 GB SSD + 1 TB HDD |
Operating system | Ubuntu 18.04 | Windows 7 | Ubuntu 18.04 LTS | Windows 10 |
Software environment | VirtualBox + Python 2.7 | Python 2.7 | TensorFlow, Keras, Python 3.6 | Python 3.6 |
GPU | / | / | GeForce RTX 2070 Super | / |