Netinfo Security, 2026, Vol. 26, Issue (4): 654-664. doi: 10.3969/j.issn.1671-1122.2026.04.012
Corresponding author: LYU Ping, E-mail: lp@hzzekj.com
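About the authors: DONG Yingjuan (1978-), female, from Shaanxi, associate professor, M.S., main research interests: industrial design and artificial intelligence; LYU Ping (1982-), female, from Hubei, senior engineer, M.S., main research interests: network security and data security; LIU Bing (1982-), male, from Beijing, M.S., main research interests: AI cybersecurity, data security, and satellite Internet security.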
DONG Yingjuan¹, LYU Ping², LIU Bing³
Received:2026-02-03
Online:2026-04-10
Published:2026-04-29
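Abstract:
In recent years, cyberattacks have become highly organized and automated. Empowered by artificial intelligence technologies, chiefly large language models (LLMs), attackers can rapidly write malicious code and derive new variants of it, and can use botnets to build automated, distributed reconnaissance and attack workflows aimed at specific targets, which poses serious threats and challenges to network security defense. To address these challenges, this paper proposes and designs an automated penetration testing system based on a multi-agent architecture: the traditional penetration testing task is decomposed into atomic subtasks, which the agents then complete jointly. Experimental results show that the system significantly outperforms traditional vulnerability scanners on multiple test metrics, comprehensively identifies multiple types of security vulnerabilities in the information system under test, and provides a highly credible chain of evidence for vulnerability disclosure. In addition, the system generates actionable remediation recommendations, making the penetration testing process automated and engineered, and offers organizations an advanced, efficient, and stable solution for routine vulnerability management.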
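The abstract describes decomposing a penetration test into atomic subtasks that several agents complete jointly. The following is a minimal sketch of how such a decomposition could be scheduled, assuming each subtask names the agent role responsible for it and the subtasks it depends on. `Subtask`, `run_pipeline`, and the three toy agents are hypothetical names, and the agents are plain functions standing in for the LLM-driven agents the paper describes; this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """One atomic step of a penetration test (hypothetical structure)."""
    name: str                # unique id, e.g. "port_scan"
    agent_role: str          # role of the agent that executes it
    depends_on: List[str] = field(default_factory=list)


# An agent is modelled as a function from the shared context to its findings;
# in a real system this would be an LLM-driven agent with tool access.
Agent = Callable[[dict], dict]


def run_pipeline(subtasks: List[Subtask], agents: Dict[str, Agent]) -> dict:
    """Run each subtask once all of its dependencies are done, sharing one context."""
    context: dict = {}
    done: set = set()
    pending = list(subtasks)
    while pending:
        ready = [t for t in pending if all(d in done for d in t.depends_on)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable subtask dependencies")
        for task in ready:
            # Each agent reads earlier findings and stores its own under the task name.
            context[task.name] = agents[task.agent_role](context)
            done.add(task.name)
            pending.remove(task)
    return context


# Toy agents standing in for reconnaissance, vulnerability probing and reporting.
def recon_agent(ctx: dict) -> dict:
    return {"open_ports": [80, 443]}


def vuln_agent(ctx: dict) -> dict:
    ports = ctx["port_scan"]["open_ports"]
    return {"candidates": [f"http-check:{p}" for p in ports]}


def report_agent(ctx: dict) -> dict:
    n = len(ctx["vuln_probe"]["candidates"])
    return {"summary": f"{n} candidate issue(s)"}


plan = [
    Subtask("report", "reporter", ["vuln_probe"]),
    Subtask("port_scan", "recon"),
    Subtask("vuln_probe", "vuln", ["port_scan"]),
]
ctx = run_pipeline(plan, {"recon": recon_agent, "vuln": vuln_agent, "reporter": report_agent})
print(ctx["report"]["summary"])  # 2 candidate issue(s)
```

Because the plan lists subtasks out of order, the scheduler has to resolve the dependencies itself; the sketch executes ready subtasks sequentially, whereas a production system could dispatch independent subtasks to agents in parallel.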
DONG Yingjuan, LYU Ping, LIU Bing. An Automated Penetration Testing System Based on Multi-Agent Architecture[J]. Netinfo Security, 2026, 26(4): 654-664.
Table 3  Comparison between the proposed system and PentestGPT
| Challenge | Category | PentestGPT: tests | PentestGPT: successes | Proposed system: successes | Proposed system: tests |
|---|---|---|---|---|---|
| login | Web | 5 | 5 | 5 | 5 |
| advance-potion-making | forensics | 5 | 3 | 3 | 4 |
| spelling-quiz | crypto | 5 | 4 | 3 | 5 |
| caas | Web | 5 | 2 | 5 | 5 |
| XtrOrdinary | crypto | 5 | 5 | 3 | 5 |
| tripplesecure | crypto | 5 | 3 | 2 | 5 |
| clutteroverflow | binary | 5 | 1 | 3 | 5 |
| not crypto | reverse | 5 | 0 | 0 | 5 |
| scrambled-bytes | forensics | 5 | 0 | 0 | 5 |
| breadth | reverse | 5 | 0 | 0 | 5 |
| notepad | Web | 5 | 1 | 4 | 5 |
| college-rowing-team | crypto | 5 | 2 | 1 | 5 |
| fermat-strings | binary | 5 | 0 | 0 | 5 |
| corrupt-key-1 | crypto | 5 | 0 | 0 | 5 |
| SaaS | binary | 5 | 0 | 0 | 5 |
| riscy business | reverse | 5 | 0 | 0 | 5 |
| homework | binary | 5 | 0 | 0 | 5 |
| lockdown-horses | binary | 5 | 0 | 0 | 5 |
| corrupt-key-2 | crypto | 5 | 0 | 0 | 5 |
| vr-school | binary | 5 | 0 | 0 | 5 |
| MATRIX | reverse | 5 | 0 | 0 | 5 |
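To make Table 3 easier to read at a glance, the sketch below re-enters its rows as Python data and totals the counts. It assumes the merged two-row header is read left to right, so that the four numeric columns are PentestGPT tests, PentestGPT successes, proposed-system successes, and proposed-system tests; the keys such as `pgpt_tests` are illustrative names, not taken from the paper.

```python
# Table 3 rows: (challenge, category, pgpt_tests, pgpt_successes,
#                ours_successes, ours_tests).
# Column order assumes the header is read left to right.
ROWS = [
    ("login", "Web", 5, 5, 5, 5),
    ("advance-potion-making", "forensics", 5, 3, 3, 4),
    ("spelling-quiz", "crypto", 5, 4, 3, 5),
    ("caas", "Web", 5, 2, 5, 5),
    ("XtrOrdinary", "crypto", 5, 5, 3, 5),
    ("tripplesecure", "crypto", 5, 3, 2, 5),
    ("clutteroverflow", "binary", 5, 1, 3, 5),
    ("not crypto", "reverse", 5, 0, 0, 5),
    ("scrambled-bytes", "forensics", 5, 0, 0, 5),
    ("breadth", "reverse", 5, 0, 0, 5),
    ("notepad", "Web", 5, 1, 4, 5),
    ("college-rowing-team", "crypto", 5, 2, 1, 5),
    ("fermat-strings", "binary", 5, 0, 0, 5),
    ("corrupt-key-1", "crypto", 5, 0, 0, 5),
    ("SaaS", "binary", 5, 0, 0, 5),
    ("riscy business", "reverse", 5, 0, 0, 5),
    ("homework", "binary", 5, 0, 0, 5),
    ("lockdown-horses", "binary", 5, 0, 0, 5),
    ("corrupt-key-2", "crypto", 5, 0, 0, 5),
    ("vr-school", "binary", 5, 0, 0, 5),
    ("MATRIX", "reverse", 5, 0, 0, 5),
]
KEYS = ("pgpt_tests", "pgpt_successes", "ours_successes", "ours_tests")


def totals(rows):
    """Sum each numeric column over all challenges."""
    return {k: sum(r[i + 2] for r in rows) for i, k in enumerate(KEYS)}


def solved_at_least_once(rows, col):
    """Count challenges whose entry in tuple position `col` is non-zero."""
    return sum(1 for r in rows if r[col] > 0)


t = totals(ROWS)
print(t)
print(f"PentestGPT success rate: {t['pgpt_successes'] / t['pgpt_tests']:.1%}")  # 24.8%
print(f"Proposed system success rate: {t['ours_successes'] / t['ours_tests']:.1%}")  # 27.9%
```

Keeping the table as data means a different reading of the merged header only requires reordering `KEYS`, not rewriting the tallies.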
Table 4  Comparison between the proposed system and the baseline tools
| Test case | AWVS | Xray | Proposed system |
|---|---|---|---|
| Reflected XSS into HTML context with nothing encoded | Success | Success | Success |
| Stored XSS into HTML context with nothing encoded | Success | Fail | Success |
| DOM XSS in document.write sink using source location.search | Success | Fail | Success |
| DOM XSS in innerHTML sink using source location.search | Success | Fail | Success |
| DOM XSS in jQuery anchor href attribute sink using location.search source | Fail | Fail | Success |
| DOM XSS in jQuery selector sink using a hashchange event | Fail | Fail | Success |
| Reflected XSS into attribute with angle brackets HTML-encoded | Success | Success | Success |
| Stored XSS into anchor href attribute with double quotes HTML-encoded | Fail | Fail | Fail |
| Reflected XSS into JavaScript string with angle brackets HTML encoded | Success | Success | Success |
| DOM XSS in document.write sink using source location.search inside a select element | Success | Success | Success |
| DOM XSS in AngularJS expression with angle brackets and double quotes HTML-encoded | Success | Success | Success |
| Reflected DOM XSS | Fail | Fail | Fail |
| Reflected XSS into HTML context with most tags and attributes blocked | Success | Success | Fail |
| Reflected XSS into HTML context with all tags blocked except custom ones | Success | Success | Success |
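As a quick check on Table 4, the sketch below encodes each success/failure entry as a boolean and tallies the results per tool. The helper names (`success_counts`, `solved_only_by`, `missed_by_all`) are illustrative, not from the paper; the data are the 14 XSS test cases exactly as listed in the table.

```python
# Table 4 as data: (test case, AWVS, Xray, proposed system).
# True = the tool succeeded on the test case, False = it failed.
TOOLS = ("AWVS", "Xray", "Proposed system")
RESULTS = [
    ("Reflected XSS into HTML context with nothing encoded", True, True, True),
    ("Stored XSS into HTML context with nothing encoded", True, False, True),
    ("DOM XSS in document.write sink using source location.search", True, False, True),
    ("DOM XSS in innerHTML sink using source location.search", True, False, True),
    ("DOM XSS in jQuery anchor href attribute sink using location.search source", False, False, True),
    ("DOM XSS in jQuery selector sink using a hashchange event", False, False, True),
    ("Reflected XSS into attribute with angle brackets HTML-encoded", True, True, True),
    ("Stored XSS into anchor href attribute with double quotes HTML-encoded", False, False, False),
    ("Reflected XSS into JavaScript string with angle brackets HTML encoded", True, True, True),
    ("DOM XSS in document.write sink using source location.search inside a select element", True, True, True),
    ("DOM XSS in AngularJS expression with angle brackets and double quotes HTML-encoded", True, True, True),
    ("Reflected DOM XSS", False, False, False),
    ("Reflected XSS into HTML context with most tags and attributes blocked", True, True, False),
    ("Reflected XSS into HTML context with all tags blocked except custom ones", True, True, True),
]


def success_counts(results):
    """Number of test cases each tool succeeded on."""
    return {tool: sum(flags[i] for _, *flags in results) for i, tool in enumerate(TOOLS)}


def solved_only_by(results, tool):
    """Test cases that `tool` succeeded on and no other tool did."""
    i = TOOLS.index(tool)
    return [name for name, *flags in results if flags[i] and sum(flags) == 1]


def missed_by_all(results):
    """Test cases on which none of the three tools succeeded."""
    return [name for name, *flags in results if not any(flags)]


print(success_counts(RESULTS))  # {'AWVS': 10, 'Xray': 7, 'Proposed system': 11}
print(solved_only_by(RESULTS, "Proposed system"))
print(missed_by_all(RESULTS))
```

The tallies show 11 of 14 for the proposed system versus 10 for AWVS and 7 for Xray; the two jQuery DOM XSS cases are the only ones solved exclusively by the proposed system, and two cases were missed by all three tools.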
References
| No. | Reference |
|---|---|
| [1] | BISHOP M. About Penetration Testing[J]. IEEE Security & Privacy Magazine, 2007, 5(6): 84-87. |
| [2] | NIST SP 800-115 Technical Guide to Information Security Testing and Assessment[S]. Gaithersburg: National Institute of Standards and Technology, 2008. |
| [3] | ANTUNES N, VIEIRA M. Benchmarking Vulnerability Detection Tools for Web Services[C]// IEEE. 2010 IEEE International Conference on Web Services. New York: IEEE, 2010: 203-210. |
| [4] | XIONG Pulei, PEYTON L. A Model-Driven Penetration Test Framework for Web Applications[C]// IEEE. 2010 Eighth International Conference on Privacy, Security and Trust. New York: IEEE, 2010: 173-180. |
| [5] | ROY S S, THOTA P, NARAGAM K V, et al. From Chatbots to Phishbots?: Phishing Scam Generation in Commercial Large Language Models[C]// IEEE. 2024 IEEE Symposium on Security and Privacy (SP). New York: IEEE, 2024: 36-54. |
| [6] | BECKERICH M, PLEIN L, CORONADO S. RatGPT: Turning Online LLMs into Proxies for Malware Attacks[EB/OL]. (2023-09-07)[2025-09-02]. https://arxiv.org/abs/2308.09183. |
| [7] | MOHAMED F M F, ELBREIKI W, ABDULLAHI I, et al. WormGPT: A Large Language Model Chatbot for Criminals[C]// IEEE. 2023 24th International Arab Conference on Information Technology (ACIT). New York: IEEE, 2023: 1-6. |
| [8] | HOU Xinyi, ZHAO Yanjie, LIU Yue, et al. Large Language Models for Software Engineering: A Systematic Literature Review[J]. ACM Transactions on Software Engineering and Methodology, 2024, 33(8): 1-79. |
| [9] | YANG Zhou, SUN Zhensu, YUE T Z, et al. Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code[EB/OL]. (2024-03-12)[2025-09-02]. https://arxiv.org/abs/2403.07506. |
| [10] | OH S, LEE K, PARK S, et al. Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers’ Coding Practices with Insecure Suggestions from Poisoned AI Models[C]// IEEE. 2024 IEEE Symposium on Security and Privacy (SP). New York: IEEE, 2024: 1141-1159. |
| [11] | SCHUSTER R, SONG Congzheng, TROMER E, et al. You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion[C]// USENIX. The 30th USENIX Security Symposium. Berkeley: USENIX Association, 2021: 1559-1575. |
| [12] | NGUYEN P T, DI S C, DI R J, et al. Adversarial Attacks to API Recommender Systems: Time to Wake up and Smell the Coffee?[C]// IEEE. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). New York: IEEE, 2021: 253-265. |
| [13] | QI Shiyi, YANG Yuanhang, GAO S, et al. BadCS: A Backdoor Attack Framework for Code Search[EB/OL]. (2023-05-09)[2025-09-02]. https://arxiv.org/abs/2305.05503. |
| [14] | SUN Weisong, CHEN Yuchen, TAO Guanhong, et al. Backdooring Neural Code Search[EB/OL]. (2023-06-12)[2025-09-02]. https://arxiv.org/abs/2305.17506. |
| [15] | WAN Yao, ZHANG Shijie, ZHANG Hongyu, et al. You See What I Want You to See: Poisoning Vulnerabilities in Neural Code Search[C]// ACM. The 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2022: 1233-1245. |
| [16] | BODDY M, GOHDE J, HAIGH T, et al. Course of Action Generation for Cyber Security Using Classical Planning[C]// ICAPS. International Conference on Automated Planning and Scheduling. Palo Alto: AAAI, 2005: 16-21. |
| [17] | OBES J L, SARRAUTE C, RICHARTE G. Attack Planning in the Real World[EB/OL]. (2013-06-19)[2025-09-02]. https://arxiv.org/abs/1306.4044. |
| [18] | ROBERTS M, HOWE A, RAY I, et al. Personalized Vulnerability Analysis through Automated Planning[EB/OL]. [2025-09-02]. https://www.researchgate.net/publication/228946141_Personalized_Vulnerability_Analysis_through_Automated_Planning. |
| [19] | DENG Gelei, LIU Yi, MAYORAL-VILCHES V, et al. PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing[C]// USENIX. The 33rd USENIX Security Symposium. Berkeley: USENIX Association, 2024: 847-864. |
| [20] | HAPPE A, KAPLAN A, CITO J. LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks[EB/OL]. (2023-10-17)[2025-09-02]. https://arxiv.org/abs/2310.11409. |
| [21] | FANG R, BINDU R, GUPTA A, et al. LLM Agents Can Autonomously Hack Websites[EB/OL]. (2024-02-06)[2025-09-02]. https://arxiv.org/abs/2402.06664. |
| [22] | FANG R, BINDU R, GUPTA A, et al. LLM Agents Can Autonomously Exploit One-Day Vulnerabilities[EB/OL]. (2024-04-11)[2025-09-02]. https://arxiv.org/abs/2404.08144. |
| [23] | OpenAI. What is the Difference between the GPT-4 Models[EB/OL]. [2025-10-26]. https://help.openai.com/en/articles/7127966-what-is-the-difference-between-the-gpt-4-models. |
| [24] | LIU N F, LIN K, HEWITT J, et al. Lost in the Middle: How Language Models Use Long Contexts[EB/OL]. (2023-07-06)[2025-09-02]. https://arxiv.org/abs/2307.03172. |
| [25] | BANG Yejin, CAHYAWIJAYA S, LEE N, et al. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity[EB/OL]. (2023-02-08)[2025-09-02]. https://arxiv.org/abs/2302.04023. |
| [26] | MITRE Corporation. MITRE ATT&CK® Matrix for Enterprise (Knowledge Base)[EB/OL]. (2025-10-28)[2026-01-27]. https://attack.mitre.org/matrices/enterprise/. |
| [27] | SINGH G P, BHARTI V, HOODA M K. A Review on NIST, ISO 27001, HIPAA and MITRE ATT&CK Cybersecurity Frameworks[J]. Webology, 2021, 18(6): 1872-1880. |
| [28] | AMMANN P, WIJESEKERA D, KAUSHIK S. Scalable, Graph-Based Network Vulnerability Analysis[C]// ACM. The 9th ACM Conference on Computer and Communications Security. New York: ACM, 2002: 217-224. |
| [29] | Anthropic. Introducing the Model Context Protocol[EB/OL]. (2024-11-25)[2025-09-02]. https://www.anthropic.com/news/model-context-protocol. |
| [30] | LYON G F. Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning[M]. Rockland, MA: Insecure.Com LLC, 2009. |
| [31] | Hack The Box. Hack The Box: Hacking Training for the Best[EB/OL]. [2025-11-05]. http://www.hackthebox.com/. |
| [32] | picoCTF. picoCTF 2021 Redpwn Competition[EB/OL]. [2025-11-21]. https://picoctf.org/competitions/2021-redpwn.html. |
| [33] | PortSwigger. Cross-Site Scripting (XSS)[EB/OL]. [2025-11-21]. https://portswigger.net/web-security/cross-site-scripting. |
| [34] | Acunetix. Acunetix Web Vulnerability Scanner[EB/OL]. [2025-11-21]. https://www.acunetix.com/. |
| [35] | Chaitin Technology. X-Ray Vulnerability Scanner[EB/OL]. (2026-01-01)[2026-02-02]. https://www.chaitin.cn/en/xray. |