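Netinfo Security ›› 2025, Vol. 25 ›› Issue (8): 1231-1239. doi: 10.3969/j.issn.1671-1122.2025.08.005
Received: 2025-06-05
Online: 2025-08-10
Published: 2025-09-09
Corresponding author: HE Junpeng, E-mail: ku3n@qq.com
About the authors: GAO Jian (born 1982), male, Shandong, associate professor, Ph.D., main research interest: network security | HE Junpeng (born 2000), male, Sichuan, master's student, main research interests: network security and APT attacks | MIAO Qingqing (born 2003), female, Shandong, master's student, main research interest: network security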
GAO Jian1, HE Junpeng1,2, MIAO Qingqing1
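Abstract:
Traditional text-based detection methods achieve low accuracy on WebShell files, existing machine learning and deep learning algorithms focus mostly on PHP WebShells, and their feature selection is limited. To address these problems, this paper constructs a high-dimensional feature space covering file intrinsic features, official standard library features, and BERT semantic features, and designs a LightGBM-AdaBoost ensemble detection model. The approach targets the difficulty of separating normal files from WebShells with simple features in complex languages, and it distinguishes both PHP and JSP WebShells efficiently. Experimental results show that the WebShell detection method based on multi-dimensional features and LightGBM-AdaBoost reaches accuracies of 99.81% and 98.93% on the PHP and JSP WebShell detection tasks, respectively. Compared with existing methods, the proposed method significantly improves detection accuracy and extends the range of detectable WebShell types.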
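The three feature groups named in the abstract can be pictured as one concatenated vector per file. The following is only a minimal sketch under stated assumptions: the concrete file intrinsic features (length, longest line, entropy), the function list `SENSITIVE_FUNCS`, and the placeholder embedding are all hypothetical, since the abstract does not list the paper's actual features. How LightGBM and AdaBoost are combined is also not described here, so only the feature-assembly step is shown.

```python
import math
from collections import Counter

# Hypothetical function names; the paper's real standard-library
# feature list is not given in the abstract.
SENSITIVE_FUNCS = ("eval", "system", "exec", "base64_decode", "assert")


def shannon_entropy(text):
    """Shannon entropy in bits per character (0.0 for empty text)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_features(source):
    """Assumed file intrinsic features: size, longest line, entropy."""
    lines = source.splitlines() or [""]
    longest = max(len(line) for line in lines)
    return [float(len(source)), float(longest), shannon_entropy(source)]


def stdlib_features(source):
    """Naive substring count of each name followed by '(' (no parsing)."""
    return [float(source.count(name + "(")) for name in SENSITIVE_FUNCS]


def build_feature_vector(source, embedding):
    """Concatenate the three groups; `embedding` stands in for a BERT vector."""
    return file_features(source) + stdlib_features(source) + list(embedding)


sample = "<?php eval(base64_decode($_POST['x'])); ?>"
placeholder_embedding = [0.1, 0.2, 0.3, 0.4]  # a real BERT vector is much longer
vector = build_feature_vector(sample, placeholder_embedding)
print(len(vector))
```

Keeping each group in its own function makes it straightforward to drop one group at a time, which is the kind of comparison an ablation study needs.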
GAO Jian, HE Junpeng, MIAO Qingqing. WebShell Detection Method Based on Multi-Dimensional Features and LightGBM-AdaBoost[J]. Netinfo Security, 2025, 25(8): 1231-1239.
Table 4. Performance comparison of different detection models on the PHP and JSP datasets

| Model | Data type | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| RNN | PHP | 99.14% | 99.14% | 99.13% | 99.13% |
| LSTM | PHP | 99.23% | 99.26% | 99.20% | 99.23% |
| BiLSTM | PHP | 99.33% | 99.35% | 99.31% | 99.33% |
| XGBoost | PHP | 99.62% | 99.64% | 99.59% | 99.62% |
| LightGBM | PHP | 99.81% | 99.82% | 99.80% | 99.81% |
| CatBoost | PHP | 99.81% | 99.82% | 99.79% | 99.81% |
| RNN | JSP | 98.56% | 98.35% | 98.06% | 98.31% |
| LSTM | JSP | 98.79% | 97.82% | 98.33% | 98.06% |
| BiLSTM | JSP | 98.36% | 97.26% | 97.50% | 97.36% |
| XGBoost | JSP | 97.84% | 96.92% | 96.10% | 96.51% |
| LightGBM | JSP | 98.45% | 98.33% | 96.65% | 97.47% |
| CatBoost | JSP | 96.12% | 96.39% | 94.54% | 95.37% |
| Proposed method | PHP | 99.81% | 99.80% | 99.81% | 99.81% |
| Proposed method | JSP | 98.93% | 98.49% | 98.82% | 98.62% |
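For reference, the four metrics reported in Table 4 have standard definitions in terms of confusion-matrix counts. The sketch below uses the textbook binary formulas with WebShell as the positive class. The counts passed to `binary_metrics` are invented for illustration, and the table does not state whether the paper uses binary or per-class averaged scores, so this is an assumption, not the authors' evaluation code.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the paper.
print(binary_metrics(tp=90, fp=10, fn=5, tn=95))

# LightGBM on PHP in Table 4: precision 99.82%, recall 99.80%.
print(round(f1_from(99.82, 99.80), 2))  # 99.81
```

The LightGBM PHP row reproduces its reported F1 this way. Some rows do not: for RNN on JSP the harmonic mean of 98.35% and 98.06% is about 98.20%, not the reported 98.31%, which may indicate per-class averaging that the table does not specify.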
Table 5. Ablation experiment results

| Feature combination | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| All features (file intrinsic + standard library + BERT) | 98.93% | 98.49% | 98.82% | 98.62% |
| Without BERT semantic features | 91.81% | 91.33% | 89.32% | 90.14% |
| Without official standard library features | 97.41% | 97.22% | 96.71% | 96.92% |
| Without file intrinsic features | 97.58% | 97.42% | 96.95% | 97.16% |
| BERT semantic features only | 97.32% | 97.42% | 96.31% | 96.79% |
| Official standard library features only | 87.92% | 90.16% | 81.27% | 84.01% |
| File intrinsic features only | 84.47% | 81.89% | 81.28% | 81.53% |
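The seven rows of Table 5 are exactly the non-empty subsets of the three feature groups. The short sketch below enumerates those subsets and computes how much accuracy falls when a single group is removed from the full set. The accuracy values are copied from Table 5; the group labels `file`, `stdlib`, and `bert` and the helper names are shorthand introduced here for illustration.

```python
from itertools import combinations

GROUPS = ("file", "stdlib", "bert")  # shorthand labels, not the paper's names

# Accuracy (%) per feature subset, copied from Table 5.
ACCURACY = {
    frozenset({"file", "stdlib", "bert"}): 98.93,
    frozenset({"file", "stdlib"}): 91.81,  # BERT removed
    frozenset({"file", "bert"}): 97.41,    # standard library removed
    frozenset({"stdlib", "bert"}): 97.58,  # file intrinsic removed
    frozenset({"bert"}): 97.32,
    frozenset({"stdlib"}): 87.92,
    frozenset({"file"}): 84.47,
}


def all_subsets(groups):
    """Every non-empty subset of the feature groups, largest first."""
    return [frozenset(combo)
            for size in range(len(groups), 0, -1)
            for combo in combinations(groups, size)]


def accuracy_drop(removed):
    """Percentage points lost when one group is removed from the full set."""
    full = frozenset(GROUPS)
    return round(ACCURACY[full] - ACCURACY[full - {removed}], 2)


for group in GROUPS:
    print(group, accuracy_drop(group))
```

Removing the BERT semantic features costs 7.12 percentage points of accuracy, against 1.52 for the standard library features and 1.35 for the file intrinsic features, so by these figures the semantic features contribute the most.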