Netinfo Security ›› 2026, Vol. 26 ›› Issue (5): 713-724. doi: 10.3969/j.issn.1671-1122.2026.05.004

Safe Optimization Algorithm for Zero-Sum Game of Nonlinear Cyber-Physical Systems Based on Model-Free Reinforcement Learning

XIE Xiangpeng1, ZHU Qi2

  1. School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  2. College of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2026-02-05 Online: 2026-05-10 Published: 2026-06-03

Abstract:

This paper proposes a safe optimization algorithm for the zero-sum game of nonlinear cyber-physical systems based on model-free reinforcement learning, targeting vehicle active suspension systems subjected to denial-of-service (DoS) attacks. The algorithm addresses the security control problem when the system model is unknown and network packets are lost. A Bernoulli random sequence characterizes the packet loss caused by DoS attacks, so that the attacked system is modeled as a stochastic nonlinear system. A discounted cost function incorporating a control-effort penalty and a disturbance penalty is defined, which transforms the security control problem into a zero-sum game. A model-free value iteration algorithm based on Q-learning is then designed; it constructs a Q-function of the state, control, and disturbance to avoid reliance on the system model. Furthermore, a neural-network-based critic-actor-disturbance architecture is adopted for function approximation: the critic network approximates the Q-function, while the actor network and the disturbance network generate the control policy and the disturbance policy, respectively. Theoretical analysis shows that the proposed algorithm guarantees monotonic convergence and uniform boundedness of the value function sequence. Simulation results indicate that the method maintains the stability and control performance of the suspension system under DoS attacks.
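
The packet-loss model and the discounted zero-sum cost described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar toy plant, the linear feedback gains, the discount factor `gamma`, the attenuation level `beta`, and the choice to penalize the computed control even when its packet is dropped. The sketch only shows how a Bernoulli variable gates the control input and how the stage cost rewards the controller and the disturbance in opposite directions; it does not implement the Q-learning value iteration or the critic-actor-disturbance networks.

```python
import math
import random


def rollout_cost(control_gain=1.0, disturbance_gain=0.1, loss_prob=0.2,
                 gamma=0.95, beta=2.0, steps=200, seed=0):
    """Discounted zero-sum cost of one rollout on a hypothetical scalar plant.

    loss_prob is the probability that a control packet is dropped
    (a stand-in for the effect of a DoS attack).
    """
    rng = random.Random(seed)
    x = 1.0          # initial state (assumed)
    total = 0.0
    for k in range(steps):
        u = -control_gain * x          # controller (minimizing player)
        w = disturbance_gain * x       # disturbance (maximizing player)

        # Bernoulli delivery indicator: 1 = packet delivered, 0 = packet lost.
        alpha = 0 if rng.random() < loss_prob else 1

        # Zero-sum stage cost: state and control are penalized,
        # the disturbance is credited with weight beta**2.
        stage = x * x + u * u - (beta ** 2) * w * w
        total += (gamma ** k) * stage

        # Hypothetical nonlinear dynamics; the control acts only if delivered.
        x = 0.9 * x + 0.1 * math.sin(x) + 0.5 * alpha * u + 0.1 * w
    return total


if __name__ == "__main__":
    for p in (0.0, 0.2, 0.5, 1.0):
        print(f"loss_prob={p:.1f}  cost={rollout_cost(loss_prob=p):.3f}")
```

In this toy setting, the cost with every packet delivered is lower than the cost with every packet lost, which is the qualitative effect a DoS-aware controller has to guard against.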

Key words: denial of service attack, adaptive dynamic programming, Q-learning, critic-actor-disturbance structure
