Netinfo Security ›› 2023, Vol. 23 ›› Issue (9): 47-57. doi: 10.3969/j.issn.1671-1122.2023.09.005

• Technical Research •

Discovery and Optimization Method of Attack Paths Based on PPO Algorithm

ZHANG Guomin, ZHANG Shaoyong(), ZHANG Jinwei

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2023-05-22  Online: 2023-09-10  Published: 2023-09-18
  • Contact: ZHANG Shaoyong  E-mail: 1345150105@qq.com
  • About the authors: ZHANG Guomin (1979—), male, from Jiangsu, associate professor, Ph.D., CCF member; research interests: software-defined networking, network security, network measurement, and distributed systems. ZHANG Shaoyong (1999—), male, from Hebei, master's student; research interests: deep reinforcement learning and penetration testing. ZHANG Jinwei (1998—), male, from Sichuan, master's student; research interests: deep reinforcement learning, malicious-traffic trapping, and honeypot-based active defense.
  • Supported by: National Natural Science Foundation of China (62172432)



Abstract:

Selecting penetration actions with a policy network to discover the optimal attack path is a key technique in automated penetration testing. However, existing methods suffer from excessive ineffective actions and slow convergence during training. To address these problems, this paper applied the proximal policy optimization (PPO) algorithm to the attack path optimization problem and proposed an improved version, improved PPO with penetration action selection (IPPOPAS), which incorporates a penetration action selection module. This module enables the algorithm to filter actions according to the penetration testing scenario while collecting rollout experience. The paper designed and implemented the components of the IPPOPAS algorithm, including the policy network, the value network, and the penetration action selection module, improved the action selection process, and performed parameter tuning and algorithm optimization to raise the algorithm's performance and efficiency. Experimental results demonstrate that IPPOPAS converges faster than the traditional deep Q-network (DQN) algorithm and its variants in specific network scenarios, and that its convergence advantage grows as the number of vulnerabilities on the hosts increases. Furthermore, the effectiveness of IPPOPAS is validated in scenarios with expanded network scales.

Key words: automated penetration testing, policy network, PPO algorithm, attack path discovery
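The action-selection idea summarized in the abstract amounts to restricting the policy's action distribution to scenario-valid penetration actions before sampling, so that invalid actions can never be drawn during experience collection. A minimal sketch of such masking (an illustration under assumed interfaces, not the paper's IPPOPAS implementation; the function name is hypothetical):

```python
import math

def masked_action_probs(logits, valid_mask):
    """Softmax over policy logits with invalid actions masked out.

    logits     -- raw policy-network outputs, one per penetration action.
    valid_mask -- booleans; False marks actions that an action-selection
                  module has filtered out for the current scenario.
    """
    # Invalid actions get -inf, so exp() drives their probability to 0.
    masked = [l if ok else float("-inf") for l, ok in zip(logits, valid_mask)]
    m = max(masked)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# 4 candidate actions; the selection module rules out actions 1 and 3
probs = masked_action_probs([1.0, 2.0, 0.5, 3.0], [True, False, True, False])
# invalid actions receive probability 0, so sampling never picks them
```

Sampling from the resulting distribution keeps the PPO update unchanged while guaranteeing that rollout experience contains only scenario-feasible actions.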

CLC number: