Netinfo Security (信息网络安全) ›› 2025, Vol. 25 ›› Issue (8): 1231-1239. doi: 10.3969/j.issn.1671-1122.2025.08.005

• Theoretical Research •

WebShell Detection Method Based on Multi-Dimensional Features and LightGBM-AdaBoost

GAO Jian1, HE Junpeng1,2, MIAO Qingqing1

  1. School of Information Network Security, People's Public Security University of China, Beijing 100038, China
  2. Zigong Municipal Public Security Bureau, Zigong 643002, China
  • Received: 2025-06-05 Online: 2025-08-10 Published: 2025-09-09
  • Corresponding author: HE Junpeng E-mail: ku3n@qq.com
  • About the authors: GAO Jian (born 1982), male, from Shandong, associate professor, Ph.D., main research interest: network security | HE Junpeng (born 2000), male, from Sichuan, master's student, main research interests: network security and APT attacks | MIAO Qingqing (born 2003), female, from Shandong, master's student, main research interest: network security
  • Funding:
    National Key Research and Development Program of China (2022YFC3301101); Fundamental Research Funds for the Central Universities (2024JKF17)

Abstract:

Traditional text-based detection methods achieve low accuracy on WebShell files, and existing machine learning and deep learning approaches focus mainly on PHP WebShells and rely on a limited selection of features. To address these problems, this paper constructs a high-dimensional feature space that combines file-intrinsic features, official standard features, and BERT-based semantic features, and designs a LightGBM-AdaBoost ensemble detection model. The model targets the difficulty of separating benign files from WebShells in complex languages, where simple features fall short, and it distinguishes both PHP and JSP WebShells efficiently. Experimental results show that the proposed method reaches accuracies of 99.81% and 98.93% on the PHP and JSP WebShell detection tasks, respectively. Compared with existing methods, it significantly improves detection accuracy and extends the range of WebShell types that can be detected.

Key words: WebShell detection, multi-dimensional features, LightGBM algorithm, AdaBoost algorithm
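
The idea of a multi-dimensional feature space can be illustrated with a minimal sketch in which file-level statistics are concatenated with a precomputed semantic embedding. The abstract does not list the individual features, so everything concrete here is an assumption: the three statistics (file length, longest line, character entropy) are common choices in WebShell detection rather than the authors' feature set, and the function names `shannon_entropy`, `file_features`, and `build_feature_vector` are hypothetical. The official standard features and the LightGBM-AdaBoost classifier are left out because the abstract gives no detail on them.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_features(source: str) -> List[float]:
    """Hypothetical file-level features (assumed, not taken from the paper):
    total length, longest line length, and character entropy."""
    lines = source.splitlines() or [""]
    return [
        float(len(source)),
        float(max(len(line) for line in lines)),
        shannon_entropy(source),
    ]


def build_feature_vector(source: str, semantic_embedding: List[float]) -> List[float]:
    """Concatenate file-level features with a precomputed semantic embedding.

    The embedding is assumed to come from a BERT-style encoder run elsewhere;
    this sketch only shows how the dimensions are joined into one vector.
    """
    return file_features(source) + list(semantic_embedding)
```

A vector built this way could then be passed to any tabular classifier. Obfuscated or encoded payloads tend to show long lines and high entropy, which is why such statistics are often paired with semantic features.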

CLC number: