Netinfo Security ›› 2026, Vol. 26 ›› Issue (1): 91-101. doi: 10.3969/j.issn.1671-1122.2026.01.008

• Special Topic Paper: Active Network Defense •

A Study on Autonomous Decision-Making for Network Defense Based on Hierarchical Reinforcement Learning

WANG Huanzhen1, XU Hongping2, LI Kuangdai1, LIU Yang1, YAO Linyuan1

  1. Beijing Institute of Astronautical System Engineering, Beijing 100076, China
  2. China Academy of Launch Vehicle Technology, Beijing 100076, China
  • Received: 2025-03-17 Online: 2026-01-10 Published: 2026-02-13
  • Corresponding author: YAO Linyuan, linyuan_yao@126.com
  • About the authors: WANG Huanzhen (b. 2000), male, from Jilin, master's student; main research interest: active network defense | XU Hongping (b. 1969), male, from Henan, research fellow, master's degree; main research interest: overall design of flight vehicles | LI Kuangdai (b. 1984), male, from Heilongjiang, research fellow, master's degree; main research interest: command information systems | LIU Yang (b. 1994), male, from Anhui, senior engineer, master's degree; main research interest: network security | YAO Linyuan (b. 1988), male, from Tianjin, senior engineer, Ph.D.; main research interest: network security
  • Supported by:
    National Key Research and Development Program of China (2021YFB3101900)

Abstract:

Traditional network defense decision-making methods struggle to cope with complex, dynamic network environments and diverse network attacks. To address this, the paper proposes an autonomous network defense decision-making method based on hierarchical reinforcement learning, built on a high-fidelity network attack-defense simulation environment. A Markov attack-defense game model with incomplete information is constructed to analyze the dynamic interaction between attacker and defender and to formally express the optimal defense strategy. A top-level control agent works together with bottom-level execution agents to decompose the complex defense decision-making task that arises because the attacker's type is unknown. Simulation results under different attack-defense scenarios show that the method responds flexibly and efficiently to two types of penetration attack patterns, maintains resilient defense, and generates interpretable action distributions. A comparative analysis with existing related work further confirms the advantage of the proposed method in defense effectiveness.

Keywords: network defense decision-making, Markov game, hierarchical reinforcement learning, autonomous cyber operations
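
The division of labor between a top-level control agent and bottom-level execution agents can be illustrated with a minimal sketch. It is not the paper's implementation: the action names, observation labels, reward rule, and the epsilon-greedy selector are all hypothetical stand-ins, and the toy observation reveals the attacker type outright, which the incomplete-information setting of the paper does not assume. The sketch only shows the control flow, in which the top level chooses which sub-policy to run and the chosen sub-policy chooses the concrete defense action.

```python
import random
from collections import defaultdict

# Hypothetical bottom-level policies, each tuned against one attacker type.
def policy_vs_type_a(obs):
    return "restore_host" if obs == "scan_burst" else "monitor"

def policy_vs_type_b(obs):
    return "deploy_decoy" if obs == "quiet_probe" else "monitor"

SUB_POLICIES = [policy_vs_type_a, policy_vs_type_b]

class TopLevelController:
    """Epsilon-greedy selector over sub-policies, keyed by observation (assumed design)."""

    def __init__(self, n_options, epsilon=0.1, lr=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0] * n_options)
        self.n_options = n_options
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)

    def select(self, obs):
        # Explore with probability epsilon, otherwise pick the best-valued option.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_options)
        values = self.q[obs]
        return max(range(self.n_options), key=values.__getitem__)

    def update(self, obs, option, reward):
        # Move the value estimate a fraction lr toward the observed reward.
        self.q[obs][option] += self.lr * (reward - self.q[obs][option])

def toy_step(attacker_type, action):
    """Toy reward: 1.0 if the action counters the hidden attacker type, else 0.0."""
    counters = {"A": "restore_host", "B": "deploy_decoy"}
    return 1.0 if action == counters[attacker_type] else 0.0

def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    ctrl = TopLevelController(len(SUB_POLICIES), seed=seed)
    signal = {"A": "scan_burst", "B": "quiet_probe"}  # toy: type fully visible
    for _ in range(episodes):
        attacker = rng.choice("AB")
        obs = signal[attacker]
        option = ctrl.select(obs)            # top level: which sub-policy to run
        action = SUB_POLICIES[option](obs)   # bottom level: concrete defense action
        ctrl.update(obs, option, toy_step(attacker, action))
    return ctrl
```

Calling `train()` returns a controller whose `q` table holds, for each observation, an estimated value per sub-policy. In a realistic setting the fixed sub-policies would themselves be learned, and the observation would carry only partial evidence about the attacker's type.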

CLC Number: