Netinfo Security ›› 2023, Vol. 23 ›› Issue (3): 73-83. doi: 10.3969/j.issn.1671-1122.2023.03.008

A Multi-Dimensional Root Cause Localization Algorithm for Microservices

SHI Yuan1,2, LI Yang1,2, ZHAN Mengqi1,2

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-11-13 Online: 2023-03-10 Published: 2023-03-14
  • Contact: LI Yang E-mail: liyang@iie.ac.cn

Abstract:

As container virtualization technologies such as Docker have matured, their scalability and flexibility have proved a natural fit for the microservice architecture. Industry has therefore increasingly deployed microservice applications in container-based cloud environments and used container orchestration tools such as Kubernetes to manage the full application life cycle. In such complex microservice architectures, using artificial intelligence techniques to detect anomalies efficiently and locate their root causes has become a top priority. First, this article summarizes the main challenges and existing key technologies for anomaly detection and root cause localization in microservice systems. Then, to address the incomplete coverage of existing anomaly detection, we propose a multi-dimensional anomaly detection method based on unsupervised learning. Building on call chain (Trace) data, it also analyzes service and machine resource utilization data, so that it detects service response time anomalies and also identifies service resource utilization anomalies and environmental anomalies. Finally, for the case where anomalies are already known, we propose a lightweight method based on an anomaly propagation subgraph that aims to reduce root cause localization time, widen the localization scope, and refine the localization granularity. It unifies data from two dimensions, service interfaces and machine nodes, into the anomaly propagation subgraph for root cause localization. Experimental results show that, compared with existing methods, the proposed method has a shorter localization time, covers more root cause localization scenarios, and significantly improves accuracy.
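
The idea of placing service interfaces and machine nodes in one anomaly propagation subgraph can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not detail. The function names, the edge list, the node labels, and the rule "an anomalous node with no anomalous dependency is a root cause candidate" are all assumptions made for illustration.

```python
from collections import defaultdict


def propagation_subgraph(edges, anomalous):
    """Keep only dependency edges whose two endpoints are both anomalous.

    `edges` holds (dependent, dependency) pairs. A pair may link two service
    interfaces (a call) or an interface and the machine node hosting it.
    """
    sub = defaultdict(set)
    for dependent, dependency in edges:
        if dependent in anomalous and dependency in anomalous:
            sub[dependent].add(dependency)
    return sub


def root_cause_candidates(edges, anomalous):
    """Return anomalous nodes that have no anomalous dependency.

    Assumed heuristic: such a node cannot have inherited its anomaly from
    something it depends on, so it is a plausible origin.
    """
    sub = propagation_subgraph(edges, anomalous)
    return sorted(node for node in anomalous if not sub.get(node))


# Hypothetical topology: interfaces are "service:/path", machines are "node-N".
edges = [
    ("frontend:/checkout", "cart:/get"),
    ("frontend:/checkout", "payment:/charge"),
    ("payment:/charge", "node-2"),  # payment interface runs on node-2
    ("cart:/get", "node-1"),        # cart interface runs on node-1
]
# Hypothetical output of an upstream anomaly detector.
anomalous = {"frontend:/checkout", "payment:/charge", "node-2"}

candidates = root_cause_candidates(edges, anomalous)
print(candidates)  # ['node-2']
```

Because interfaces and machines share one graph in this sketch, the same traversal can return either a service interface or a machine node as the candidate, which is the kind of widened localization scope the abstract refers to.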

Key words: container, microservices, Kubernetes, anomaly detection, root cause localization

CLC Number: