Netinfo Security ›› 2021, Vol. 21 ›› Issue (6): 26-35. doi: 10.3969/j.issn.1671-1122.2021.06.004

• Technical Research •

Automatic Intrusion Response Decision-making Method Based on Q-Learning

LIU Jing*, ZHANG Yuchen, ZHANG Hongqi

  1. Department of Cryptogram Engineering, Information Engineering University of PLA, Zhengzhou 450001, China
  • Received: 2021-01-21 Online: 2021-06-10 Published: 2021-07-01
  • Contact: LIU Jing* E-mail: cybersecuritys@163.com
  • About the authors: LIU Jing (born 1977), female, from Henan, doctoral candidate, main research interest: situational awareness | ZHANG Yuchen (born 1977), male, from Henan, professor, Ph.D., main research interest: information system security | ZHANG Hongqi (born 1962), male, from Hebei, professor, Ph.D., main research interest: network information security
  • Supported by:
    National Key Research and Development Program of China (2016YFF0204002); National Key Research and Development Program of China (2016YFF0204003); National Natural Science Foundation of China (61902427); National Natural Science Foundation of China (61471344)

Abstract:

To address the poor adaptability of existing automatic intrusion response decision-making, this paper proposes Q-AIRD, an automatic intrusion response decision-making method based on Q-Learning. Q-AIRD uses an attack graph to formally describe the states and actions of network attack and defense, and introduces an attack mode layer to identify attackers with different capabilities so that response actions can be targeted accordingly. To suit the characteristics of intrusion response, response strategies are selected with the Softmax algorithm, extended with a security threshold θ, a stable reward factor μ, and a penalty factor ν. A voting mechanism evaluates each strategy against multiple response purposes, so that the needs of multiple response purposes are met. On this basis, an automatic intrusion response decision algorithm based on Q-Learning is designed. Simulation experiments show that Q-AIRD adapts well and can make timely and effective intrusion response decisions.

Key words: reinforcement learning, automatic intrusion response, Softmax algorithm, multi-objective decision-making
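
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of generic tabular Q-Learning with Softmax (Boltzmann) action selection. It is not the Q-AIRD algorithm: the two-state toy environment, the action meanings, the function names, and the parameter values (`alpha`, `gamma`, `temperature`) are all assumptions made for illustration, and the security threshold θ, stable reward factor μ, and penalty factor ν are deliberately left out because the abstract does not define them.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Softmax (Boltzmann) distribution over actions given their Q-values."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, temperature, rng):
    """Sample an action index according to the Softmax probabilities."""
    weights = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=weights)[0]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-Learning update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def train(episodes=500, temperature=0.5, seed=0):
    # Hypothetical toy model (an assumption, not taken from the paper):
    #   state 0 = "under attack", state 1 = "contained" (terminal)
    #   action 0 = "do nothing"   -> reward -1, stay in state 0
    #   action 1 = "isolate host" -> reward +1, move to state 1
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(episodes):
        state = 0
        for _ in range(20):  # cap the episode length
            action = select_action(q[state], temperature, rng)
            if action == 1:
                reward, next_state = 1.0, 1
            else:
                reward, next_state = -1.0, 0
            q_update(q, state, action, reward, next_state)
            state = next_state
            if state == 1:
                break
    return q
```

Calling `train()` returns the learned Q-table; in this toy setting the value of the "isolate host" action in state 0 ends up higher than that of "do nothing", so the Softmax policy increasingly favors it while still exploring occasionally.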

CLC number: