Netinfo Security ›› 2024, Vol. 24 ›› Issue (7): 1062-1075. doi: 10.3969/j.issn.1671-1122.2024.07.008

Context-Based Abnormal Root Cause Algorithm

ZHOU Shucheng1,2,3, LI Yang1,2,3, LI Chuanrong1,3, GUO Lulu1,3, JIA Xinhong1,3, YANG Xinghua1

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  3. Key Laboratory of Cyberspace Security Defense, Beijing 100085, China
  • Received: 2024-03-26 Online: 2024-07-10 Published: 2024-08-02

Abstract:

In the current era of large-scale industrial digital transformation, the integration of cloud-native architecture with microservices technology has become a core competitive advantage. This development model improves the integrity and flexibility of software development, deployment, and testing. However, the complexity of Trace data and the timing issues inherent in a microservices architecture have led to lower accuracy in anomaly detection and slower root cause localization. In response to these challenges, this paper first proposes a time-based, multi-dimensional metric anomaly detection algorithm. The algorithm combines multi-dimensional metrics with time series anomaly detection to improve detection accuracy. By improving the Service Trace Metric Vector, it addresses the low detection accuracy that arises when physical resources are sufficient, and it uses time series detection to overcome the limitations of traditional anomaly detection methods. The paper then proposes a root cause localization algorithm based on a “link-operation” graph combined with context. This algorithm improves the accuracy of root cause localization by analyzing the dependency relationships between services in historical Trace data. It also merges structurally similar Trace graphs, which saves a considerable amount of graph construction time and improves both the efficiency and the precision of root cause localization. Experimental results indicate that the proposed methods identify and localize the root causes of anomalies more quickly and accurately than traditional methods.

Key words: cloud-native, microservices, Kubernetes, anomaly detection, root cause localization

CLC Number: