Netinfo Security (信息网络安全), 2025, Vol. 25, Issue (10): 1579-1588. doi: 10.3969/j.issn.1671-1122.2025.10.009

• Theoretical Research •

Multi-Feature Fusion for Malicious PDF Document Detection Based on CNN-BiLSTM-CBAM

WANG Youhe, SUN Yi

  1. School of Cryptography Engineering, Information Engineering University, Zhengzhou 450001, China
  • Received: 2025-05-25 Online: 2025-10-10 Published: 2025-11-07
  • Contact: SUN Yi E-mail: 11112072@bjtu.edu.cn
  • About the authors: WANG Youhe (b. 1998), male, from Henan, master's student; main research interests: network and information security, malicious content detection | SUN Yi (b. 1979), female, from Henan, professor, Ph.D.; main research interests: network and information security, secure data exchange
  • Supported by:
    Natural Science Foundation of Henan Province (242300420297)

Abstract:

Existing methods for detecting malicious PDF documents tend to ignore the semantic relationships between features and are often limited to a single type of feature analysis. To address these problems, this paper proposes a detection scheme that applies a CNN-BiLSTM-CBAM model and multi-feature fusion to malicious PDF document detection. The method fuses the general and structural information extracted by static analysis with the API sequence information captured by dynamic analysis to build a comprehensive, multi-dimensional feature set. First, the model uses a convolutional neural network (CNN) to extract local features from the feature set. Then, a bidirectional long short-term memory (BiLSTM) network captures the dependencies and contextual semantic relationships between features, and a convolutional block attention module (CBAM) assigns different weights to different features to select the more discriminative key features. Finally, a Softmax classifier computes the detection result. Experimental results show that, compared with existing methods, the proposed model has significant advantages in key performance metrics such as accuracy, recall and F1 score, and effectively improves the detection of malicious PDF documents.

Key words: malicious PDF document detection, multi-feature fusion, convolutional block attention module, bidirectional long short-term memory (BiLSTM) network
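The abstract's final stage (Softmax classification) and its evaluation metrics (accuracy, recall, F1 score) can be illustrated with a minimal sketch. This is not the authors' implementation. The two-class label convention (0 = benign, 1 = malicious), the function names `softmax`, `predict` and `binary_metrics`, and the example scores are all assumptions made for illustration; in the paper the scores would come from the CNN-BiLSTM-CBAM network.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max score for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the index of the most probable class.

    Assumed convention (not from the paper): 0 = benign, 1 = malicious.
    """
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical per-document scores [benign, malicious]; illustrative only.
    scores = [[2.0, 0.5], [0.1, 1.2], [0.3, 0.9], [1.5, 1.4]]
    labels = [0, 1, 0, 1]
    predictions = [predict(s) for s in scores]
    print(binary_metrics(labels, predictions))
```

Recall is the metric that reflects missed malicious documents (false negatives), and F1 balances it against precision, which is presumably why the abstract reports both alongside accuracy.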

CLC Number: