In recent years, incidents threatening network security have become more frequent, hackers’ attack methods have grown increasingly sophisticated, and network security protection has become steadily more difficult. To address the complex and changeable attack strategies and the bounded rationality of attackers in real network attack and defense environments, this article integrates the attack graph into an attack-defense game model and introduces a reinforcement learning algorithm to design a method for generating active network defense strategies. The article first proposes a network vulnerability assessment model based on an improved attack graph; this model compresses the strategy space and reduces the difficulty of modeling. It then builds a game model for network attack and defense, in which the attacker’s and defender’s decisions on attack and defense strategies are formulated as a multi-stage stochastic game. Next, the article introduces the Minimax-Q reinforcement learning algorithm to design a self-learning network defense strategy selection algorithm, through which the defender learns from a series of attack behaviors and derives the optimal defense strategy against the attacker. Finally, simulation experiments verify the effectiveness and advantages of the algorithm and show that the proposed method offers a degree of practical guidance for network defense.
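
As a rough illustration of the Minimax-Q idea named above, the following minimal sketch learns a mixed defense strategy in a toy two-state stochastic game. It is not the article's algorithm or its attack-graph model: the states, actions, rewards, the function names (`solve_maximin`, `minimax_q`, `toy_step`), and all parameter values are hypothetical. To stay dependency-free, the defender is assumed to have exactly two actions, so the maximin step can be solved exactly by checking line intersections instead of a linear program, and the attacker is simulated as choosing uniformly at random.

```python
import random
from itertools import combinations


def solve_maximin(q_rows):
    """Exact maximin mixed strategy for a defender with exactly two actions.

    q_rows[d][a] is the defender's payoff for defender action d and attack a.
    Returns (p, value), where p is the probability of defender action 0.
    """
    r0, r1 = q_rows
    n = len(r0)

    def worst_case(p):
        # Payoff guaranteed by mixing p / (1 - p) against the worst attack.
        return min(p * r0[a] + (1 - p) * r1[a] for a in range(n))

    # The worst-case payoff is concave and piecewise linear in p, so its
    # maximum lies at an endpoint or where two attack lines intersect.
    candidates = {0.0, 1.0}
    for a, b in combinations(range(n), 2):
        slope_a, slope_b = r0[a] - r1[a], r0[b] - r1[b]
        if slope_a != slope_b:
            p = (r1[b] - r1[a]) / (slope_a - slope_b)
            if 0.0 <= p <= 1.0:
                candidates.add(p)
    best_p = max(candidates, key=worst_case)
    return best_p, worst_case(best_p)


def minimax_q(step_fn, n_states, n_attacks, steps=5000,
              gamma=0.5, alpha=0.5, explore=0.3, seed=0):
    """Tabular Minimax-Q for a two-action defender (illustrative only)."""
    rng = random.Random(seed)
    # Q[s][d][a]: defender's value of action d against attack a in state s.
    Q = [[[0.0] * n_attacks for _ in range(2)] for _ in range(n_states)]
    V = [0.0] * n_states
    pi = [0.5] * n_states  # probability of defender action 0 per state
    s = 0
    for _ in range(steps):
        if rng.random() < explore:
            d = rng.randrange(2)  # exploratory defense action
        else:
            d = 0 if rng.random() < pi[s] else 1
        a = rng.randrange(n_attacks)  # assumed: attacker acts uniformly at random
        r, s2 = step_fn(s, d, a)
        Q[s][d][a] = (1 - alpha) * Q[s][d][a] + alpha * (r + gamma * V[s2])
        pi[s], V[s] = solve_maximin(Q[s])  # re-solve the stage game at s
        s = s2
    return pi, V, Q


def toy_step(state, d, a):
    """Hypothetical game: state 0 = intact, 1 = compromised.

    Defender: 0 = patch, 1 = monitor. Attacker: 0 = exploit, 1 = scan.
    Returns (defender reward, next state); all numbers are made up.
    """
    if state == 0:
        if d == 1 and a == 0:
            return -1.0, 1  # unpatched host is exploited
        rewards = {(0, 0): 1.0, (0, 1): -0.2, (1, 1): 0.5}
        return rewards[(d, a)], 0
    if d == 0:
        return -0.5, 0  # patching restores the host at some cost
    return -1.0, 1


if __name__ == "__main__":
    pi, V, _ = minimax_q(toy_step, n_states=2, n_attacks=2)
    for s in range(2):
        print(f"state {s}: P(patch) = {pi[s]:.3f}, value = {V[s]:.3f}")
```

The fixed learning rate is a simplification that suits this deterministic toy environment; a decaying rate is the usual choice when rewards or transitions are noisy. A defender with more than two actions would need a general linear-programming solver for the maximin step.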