An Automated Penetration Testing System Based on Multi-Agent Architecture

doi:10.3969/j.issn.1671-1122.2026.04.012

Abstract

In recent years, cyberattacks have become increasingly organized and automated. Aided by artificial intelligence, particularly large language models, attackers can rapidly write and derive malicious code and use botnets to build automated, distributed reconnaissance and attack workflows against specific targets, which poses threats and challenges to cybersecurity defense. To address these challenges, this paper proposes and designs an automated penetration testing system based on a multi-agent architecture. The system decomposes traditional penetration testing tasks into atomic subtasks that multiple agents complete jointly. Experimental results show that the system significantly outperforms traditional vulnerability scanning tools on multiple test metrics, comprehensively identifies multiple types of security vulnerabilities in the tested information system, and provides a highly credible evidence chain for vulnerability disclosure. The system also generates actionable remediation recommendations, automating and engineering the penetration testing workflow and offering organizations an advanced, efficient, and stable solution for routine vulnerability management.

Keywords: penetration testing system, multi-agent architecture, autonomous task planning, system and network security

CLC number: TP309

DONG Yingjuan, LYU Ping, LIU Bing. An Automated Penetration Testing System Based on Multi-Agent Architecture[J]. Netinfo Security, 2026, 26(4): 654-664.

Figures and tables (7): Figure 1, Table 1, Figure 2, Figure 3, Table 2, Table 3, Table 4

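MITRE ATT&CK stage	Corresponding tool modules
Reconnaissance	Reconnaissance, fingerprinting
Resource development	Payload generation
Initial access	Exploit scripts, black-box testing
Execution	Local shell, remote control
Persistence / lateral movement	Remote control, local file operations
Collection	Web crawler, headless Chromium, data structuring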
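
The stage-to-module mapping above can be read as a simple lookup structure. The following is a minimal sketch of that idea, assuming the mapping is held in a dictionary; the identifiers `STAGE_MODULES`, `modules_for`, and `stages_using` are hypothetical and are not taken from the authors' implementation.

```python
# Hypothetical encoding of the stage-to-module table. The keys and module
# names are illustrative labels, not the identifiers used by the authors.
STAGE_MODULES = {
    "reconnaissance": ["reconnaissance", "fingerprinting"],
    "resource_development": ["payload_generation"],
    "initial_access": ["exploit_scripts", "black_box_testing"],
    "execution": ["local_shell", "remote_control"],
    "persistence_lateral_movement": ["remote_control", "local_file_operations"],
    "collection": ["web_crawler", "headless_chromium", "data_structuring"],
}


def modules_for(stage: str) -> list[str]:
    """Return the tool modules listed for a stage, or an empty list."""
    return list(STAGE_MODULES.get(stage, []))


def stages_using(module: str) -> list[str]:
    """Return every stage whose row in the table lists the given module."""
    return [stage for stage, mods in STAGE_MODULES.items() if module in mods]


print(modules_for("execution"))
print(stages_using("remote_control"))
```

In this sketch the remote control module appears under two stages, which matches the table: it is listed for both execution and persistence / lateral movement.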

Target	Difficulty	PentestGPT attempts	PentestGPT successes	This system successes	This system attempts
Sau	Easy	5	5	5	5
Pilgrimage	Easy	5	3	4	5
Topology	Easy	5	0	2	5
PC	Easy	5	4	3	5
MonitorsTwo	Easy	5	3	5	5
Authority	Medium	5	0	2	5
Sandworm	Medium	5	0	3	5
Jupiter	Medium	5	0	2	5
Agile	Medium	5	2	4	5
OnlyForYou	Medium	5	0	2	5
Total	—	50	17	32	50
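
The totals row can be turned into overall success rates with simple arithmetic. The sketch below recomputes the totals from the per-target rows and derives the rates; the variable and function names are illustrative assumptions, and the only data used are the counts in the table.

```python
# Successes out of 5 attempts per target, copied from the table:
# (PentestGPT successes, this-system successes).
ATTEMPTS_PER_TARGET = 5
RESULTS = {
    "Sau": (5, 5),
    "Pilgrimage": (3, 4),
    "Topology": (0, 2),
    "PC": (4, 3),
    "MonitorsTwo": (3, 5),
    "Authority": (0, 2),
    "Sandworm": (0, 3),
    "Jupiter": (0, 2),
    "Agile": (2, 4),
    "OnlyForYou": (0, 2),
}


def success_rate(successes: int, attempts: int) -> float:
    """Fraction of attempts that succeeded."""
    return successes / attempts


total_attempts = ATTEMPTS_PER_TARGET * len(RESULTS)
pentestgpt_total = sum(baseline for baseline, _ in RESULTS.values())
system_total = sum(ours for _, ours in RESULTS.values())

print(f"PentestGPT: {pentestgpt_total}/{total_attempts} "
      f"({success_rate(pentestgpt_total, total_attempts):.0%})")
print(f"This system: {system_total}/{total_attempts} "
      f"({success_rate(system_total, total_attempts):.0%})")
```

The recomputed totals agree with the table's last row: 17 of 50 attempts (34%) for PentestGPT and 32 of 50 (64%) for the proposed system.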

Challenge	Category	PentestGPT attempts	PentestGPT successes	This system successes	This system attempts
login	Web	5	5	5	5
advanced-potion-making	forensics	5	3	3	4
spelling-quiz	crypto	5	4	3	5
caas	Web	5	2	5	5
XtraORdinary	crypto	5	5	3	5
triple-secure	crypto	5	3	2	5
clutter-overflow	binary	5	1	3	5
not-crypto	reverse	5	0	0	5
scrambled-bytes	forensics	5	0	0	5
breadth	reverse	5	0	0	5
notepad	Web	5	1	4	5
college-rowing-team	crypto	5	2	1	5
fermat-strings	binary	5	0	0	5
corrupt-key-1	crypto	5	0	0	5
SaaS	binary	5	0	0	5
riscy business	reverse	5	0	0	5
homework	binary	5	0	0	5
lockdown-horses	binary	5	0	0	5
corrupt-key-2	crypto	5	0	0	5
vr-school	binary	5	0	0	5
MATRIX	reverse	5	0	0	5
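
Because this table has no totals row, grouping the success counts by challenge category makes the comparison easier to read. The sketch below performs that aggregation; the names `ROWS` and `successes_by_category` are hypothetical, and the numbers are copied from the table without modification.

```python
from collections import defaultdict

# (challenge, category, PentestGPT successes, this-system successes)
ROWS = [
    ("login", "Web", 5, 5),
    ("advanced-potion-making", "forensics", 3, 3),
    ("spelling-quiz", "crypto", 4, 3),
    ("caas", "Web", 2, 5),
    ("XtraORdinary", "crypto", 5, 3),
    ("triple-secure", "crypto", 3, 2),
    ("clutter-overflow", "binary", 1, 3),
    ("not-crypto", "reverse", 0, 0),
    ("scrambled-bytes", "forensics", 0, 0),
    ("breadth", "reverse", 0, 0),
    ("notepad", "Web", 1, 4),
    ("college-rowing-team", "crypto", 2, 1),
    ("fermat-strings", "binary", 0, 0),
    ("corrupt-key-1", "crypto", 0, 0),
    ("SaaS", "binary", 0, 0),
    ("riscy business", "reverse", 0, 0),
    ("homework", "binary", 0, 0),
    ("lockdown-horses", "binary", 0, 0),
    ("corrupt-key-2", "crypto", 0, 0),
    ("vr-school", "binary", 0, 0),
    ("MATRIX", "reverse", 0, 0),
]


def successes_by_category(rows):
    """Sum (PentestGPT, this-system) successes for each category."""
    totals = defaultdict(lambda: [0, 0])
    for _name, category, baseline, ours in rows:
        totals[category][0] += baseline
        totals[category][1] += ours
    return {category: tuple(pair) for category, pair in totals.items()}


BY_CATEGORY = successes_by_category(ROWS)
for category, (baseline, ours) in BY_CATEGORY.items():
    print(f"{category}: PentestGPT {baseline}, this system {ours}")
```

On these counts the proposed system leads on Web and binary challenges, PentestGPT leads on crypto challenges, the two tie on forensics, and neither solves any reverse-engineering challenge.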

Test item	AWVS	Xray	This system
Reflected XSS into HTML context with nothing encoded	Success	Success	Success
Stored XSS into HTML context with nothing encoded	Success	Failure	Success
DOM XSS in document.write sink using source location.search	Success	Failure	Success
DOM XSS in innerHTML sink using source location.search	Success	Failure	Success
DOM XSS in jQuery anchor href attribute sink using location.search source	Failure	Failure	Success
DOM XSS in jQuery selector sink using a hashchange event	Failure	Failure	Success
Reflected XSS into attribute with angle brackets HTML-encoded	Success	Success	Success
Stored XSS into anchor href attribute with double quotes HTML-encoded	Failure	Failure	Failure
Reflected XSS into JavaScript string with angle brackets HTML-encoded	Success	Success	Success
DOM XSS in document.write sink using source location.search inside a select element	Success	Success	Success
DOM XSS in AngularJS expression with angle brackets and double quotes HTML-encoded	Success	Success	Success
Reflected DOM XSS	Failure	Failure	Failure
Reflected XSS into HTML context with most tags and attributes blocked	Success	Success	Failure
Reflected XSS into HTML context with all tags blocked except custom ones	Success	Success	Success
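
The per-tool outcomes above can be tallied to compare coverage across the 14 XSS test items. The sketch below encodes each row as three letters (Y for success, N for failure) in the order AWVS, Xray, this system; the encoding and the names `OUTCOMES`, `TOOLS`, and `detected_count` are illustrative assumptions, not part of the original evaluation.

```python
TOOLS = ("AWVS", "Xray", "This system")

# One string per test item, in table order; each character is the outcome
# for AWVS, Xray, and this system respectively (Y = success, N = failure).
OUTCOMES = [
    "YYY", "YNY", "YNY", "YNY", "NNY", "NNY", "YYY",
    "NNN", "YYY", "YYY", "YYY", "NNN", "YYN", "YYY",
]


def detected_count(tool_index: int) -> int:
    """Number of test items the tool at the given column index detected."""
    return sum(1 for row in OUTCOMES if row[tool_index] == "Y")


COUNTS = {tool: detected_count(i) for i, tool in enumerate(TOOLS)}

# 1-based row numbers found only by this system, and rows it alone missed.
ONLY_THIS_SYSTEM = [n for n, row in enumerate(OUTCOMES, start=1) if row == "NNY"]
MISSED_ONLY_BY_THIS_SYSTEM = [n for n, row in enumerate(OUTCOMES, start=1) if row == "YYN"]

for tool, count in COUNTS.items():
    print(f"{tool}: {count}/{len(OUTCOMES)}")
```

By this tally AWVS detects 10 of 14 items, Xray 7, and the proposed system 11. The two jQuery-based DOM XSS items (rows 5 and 6) are found only by the proposed system, while the item with most tags and attributes blocked (row 13) is the one case it misses that both scanners find.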