Netinfo Security (信息网络安全), 2025, Vol. 25, Issue (8): 1254-1262. doi: 10.3969/j.issn.1671-1122.2025.08.007

• Theoretical Research •

基于多智能体对抗学习的攻击路径发现方法

张国敏, 张俊峰, 屠智鑫, 王梓澎

  1. 陆军工程大学指挥控制工程学院,南京 210001
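  • Received: 2024-09-13 Online: 2025-08-10 Published: 2025-09-09
  • Corresponding author: ZHANG Junfeng, E-mail: zhjf0317@163.com
  • About the authors: ZHANG Guomin (born 1979), male, from Shandong, associate professor, Ph.D.; main research interests: software-defined networking, network security, network measurement, and distributed systems | ZHANG Junfeng (born 1995), male, from Shandong, master's student; main research interest: network security | TU Zhixin (born 1997), male, from Jiangsu, master's student; main research interest: network security | WANG Zipeng (born 2000), male, from Liaoning, master's student; main research interest: network security
  • Funding:
    National Natural Science Foundation of China (62172432)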

An Attack Path Discovery Method Based on Multi-Agent Adversarial Learning

ZHANG Guomin, ZHANG Junfeng, TU Zhixin, WANG Zipeng

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210001, China
  • Received: 2024-09-13 Online: 2025-08-10 Published: 2025-09-09

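Abstract:

Attack path discovery is an important technique in intelligent penetration testing. Because security mechanisms are triggered and security personnel intervene, among other reasons, the target network is often in a dynamically changing state. Existing methods, however, are trained in static virtual network environments, so agents have difficulty adapting to environmental changes because their learned experience becomes invalid. To address this, the article designs AGF, a fully competitive agent adversarial game framework that simulates the adversarial game between red and blue agents as the red side discovers attack paths in a dynamically defended network. Building on the PPO algorithm, it proposes PPODRP, an improved algorithm with a defense response perception (DRP) mechanism that plans and processes states and actions, giving the agent the ability to adapt to dynamic environments. Experimental results show that, compared with the traditional PPO algorithm, PPODRP converges more efficiently in dynamically defended networks and completes the attack path discovery task at a lower cost.

Keywords: automated penetration testing, PPO algorithm, attack path discovery, adversarial reinforcement learning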

Abstract:

Attack path discovery is a key technology in intelligent penetration testing. Because of factors such as triggered security mechanisms and intervention by security personnel, target networks are often in a dynamically changing state. However, existing methods are trained in static virtual network environments, and agents struggle to adapt to environmental changes because their learned experience becomes invalid. To address this issue, this paper designs a fully competitive agent adversarial game framework (AGF) that simulates the adversarial game between red and blue agents during the red team's attack path discovery in dynamic defense networks. Building on the proximal policy optimization (PPO) algorithm, it also proposes PPODRP, an improved algorithm with a defense response perception (DRP) mechanism that plans and processes states and actions, thereby enabling agents to adapt to dynamic environments. Experimental results show that, compared with the traditional PPO algorithm, PPODRP achieves higher convergence efficiency in dynamic defense networks and completes the attack path discovery task at a lower cost.

Key words: automated penetration testing, PPO algorithm, attack path discovery, adversarial reinforcement learning

CLC number: