Netinfo Security ›› 2023, Vol. 23 ›› Issue (11): 38-47. doi: 10.3969/j.issn.1671-1122.2023.11.005

• Technical Research •

An Adaptive IoT SSH Honeypot Strategy Based on Game Theory Opponent Modeling

SONG Lihua, ZHANG Jinwei, ZHANG Shaoyong

  1. Institute of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007, China
  • Received: 2023-06-08 Online: 2023-11-10 Published: 2023-11-10
  • Corresponding author: ZHANG Jinwei 2250675396@qq.com
  • About the authors: SONG Lihua (born 1976), female, from Jiangsu, professor, Ph.D., CCF member; main research interest: cyberspace security | ZHANG Jinwei (born 1998), male, from Sichuan, master's student; main research interests: deep reinforcement learning, malicious traffic trapping, and honeypot active defense techniques | ZHANG Shaoyong (born 1999), male, from Hebei, master's student; main research interests: deep reinforcement learning and penetration testing
  • Supported by:
    National Natural Science Foundation of China (62172432)

Abstract:

The rapid growth in the number of IoT devices has led to a rising number of attacks against the Internet of Things, and cybersecurity personnel urgently need proactive defense techniques to move from reactive to proactive defense. SSH (Secure Shell) honeypot technology allows defenders to capture attackers' interaction information, which is of great significance for IoT security. However, traditional honeypots are easily identified and exploited by attackers because of their fixed characteristics or behavioral patterns. From the perspective of game theory, this paper establishes an interaction model between the honeypot and the attacker and solves it with the SAC (Soft Actor-Critic) algorithm to obtain the defender's best-response strategy. Simulation results show that an adaptive honeypot combining reinforcement learning with game theory can quickly find the optimal interaction strategy in a variety of scenarios, and that reinforcement learning methods that incorporate a policy network achieve higher payoffs in interactions with the attacker than traditional reinforcement learning methods based on a value network alone.

Key words: Internet of Things, deception defense, honeypot, reinforcement learning, game theory

CLC Number: