Netinfo Security, 2025, Vol. 25, Issue (1): 159-172. doi: 10.3969/j.issn.1671-1122.2025.01.014

• Technical Research •

A Dynamic Malware Detection Method Based on Ensemble Learning

LIU Qiang (1,2), WANG Jian (1), WANG Yanan (1, corresponding author), WANG Shan (3)

  1. School of Air Defense and Antimissile, Air Force Engineering University, Xi’an 710051, China
  2. Graduate School of Air Force Engineering University, Xi’an 710051, China
  3. 94789 Troop of PLA, Nanjing 210018, China
  • Received: 2024-09-25 Online: 2025-01-10 Published: 2025-02-14
  • Corresponding author: WANG Yanan, E-mail: wyn1988814@163.com
  • About the authors: LIU Qiang (1993–), male, born in Shaanxi, assistant engineer and master's student; research interests: cyberspace security and malware detection | WANG Jian (1982–), male, born in Shaanxi, associate professor, M.S.; research interests: intelligent information processing and network security protection | WANG Yanan (1988–), female, born in Shaanxi, lecturer, Ph.D.; research interests: network information security and artificial intelligence | WANG Shan (1989–), female, born in Jiangsu, engineer, M.S.; research interests: information and communication technology
  • Funding: National Natural Science Foundation of China (61806219, 61703426, 61876189); Young Talent Support Program of the Shaanxi Provincial University Association for Science and Technology (20190108, 20220106); Innovation Capability Support Program of Shaanxi Province (2020KJXX-065)

Abstract:

In the current network environment, constantly evolving malware variants pose serious challenges to network security. Existing artificial intelligence models are effective at malware detection, but they still have two shortcomings that cannot be ignored. First, they generalize poorly: they perform well on training data, but concept drift degrades their performance in real-world testing. Second, they lack robustness and are vulnerable to adversarial examples. To address these problems, this paper proposes a dynamic malware detection method based on ensemble learning. Because API call sequences carry different kinds of features, the method builds three modules: a statistical feature analysis module, a semantic feature analysis module, and a structural feature analysis module. Each module performs targeted malware detection, and the method fuses their results to reach the final verdict. Experimental results on the Speakeasy dataset show that the proposed method clearly outperforms existing methods on every performance metric evaluated. It is also robust and effectively resists two adversarial attacks that target API sequences.

Key words: malware detection, n-gram algorithm, Transformer encoder, graph neural network, adversarial attack
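The abstract does not specify the exact features or the fusion rule. The following Python sketch shows how the non-learned parts of such a pipeline might look, under three assumptions. First, the statistical module consumes n-gram counts over the API call sequence. Second, the structural module consumes a directed graph of consecutive API calls. Third, module outputs are combined by weighted soft voting. The function names (`ngram_counts`, `transition_graph`, `fuse_scores`), the toy trace, the default equal weights and the 0.5 threshold are all hypothetical and are not the authors' implementation. The Transformer encoder and the graph neural network themselves are not shown; fixed placeholder scores stand in for their outputs.

```python
from collections import Counter
from typing import Dict, List, Mapping, Optional, Tuple


def ngram_counts(api_seq: List[str], n: int = 2) -> Counter:
    """Count contiguous API n-grams (hypothetical input for a statistical module)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(tuple(api_seq[i:i + n]) for i in range(len(api_seq) - n + 1))


def transition_graph(
    api_seq: List[str],
) -> Tuple[Dict[str, int], Dict[Tuple[int, int], int]]:
    """Build a directed API transition graph (hypothetical input for a GNN module).

    Nodes are distinct API names, indexed in first-seen order. An edge (u, v)
    with weight w means API v immediately followed API u w times.
    """
    node_index: Dict[str, int] = {}
    for api in api_seq:
        node_index.setdefault(api, len(node_index))
    edges = Counter(
        (node_index[a], node_index[b]) for a, b in zip(api_seq, api_seq[1:])
    )
    return node_index, dict(edges)


def fuse_scores(
    module_probs: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Weighted soft voting over per-module P(malicious) scores (assumed fusion rule).

    Modules without an explicit weight get 1.0, i.e. plain averaging.
    Returns the fused probability and the final malicious/benign verdict.
    """
    if not module_probs:
        raise ValueError("at least one module score is required")
    weights = weights or {}
    acc = total_w = 0.0
    for name, p in module_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability {p} outside [0, 1]")
        w = weights.get(name, 1.0)
        if w < 0:
            raise ValueError(f"{name}: negative weight {w}")
        acc += w * p
        total_w += w
    if total_w == 0:
        raise ValueError("weights must not all be zero")
    fused = acc / total_w
    return fused, fused >= threshold


if __name__ == "__main__":
    # Hypothetical API trace; not taken from the Speakeasy dataset.
    trace = ["CreateFileA", "WriteFile", "WriteFile", "CloseHandle"]
    print(ngram_counts(trace, n=2))
    print(transition_graph(trace))
    # Placeholder scores standing in for the three trained modules.
    print(fuse_scores({"statistical": 0.9, "semantic": 0.6, "structural": 0.3}))
```

Soft voting is only one plausible way to fuse module results. Training a meta-classifier on the module outputs (stacking) is another common choice, and the fusion rule used in the paper may differ from both.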

CLC number: