Netinfo Security ›› 2024, Vol. 24 ›› Issue (7): 1062-1075. doi: 10.3969/j.issn.1671-1122.2024.07.008

• Theoretical Research •

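Context-Based Abnormal Root Cause Algorithm

ZHOU Shucheng(1,2,3), LI Yang(1,2,3), LI Chuanrong(1,3), GUO Lulu(1,3), JIA Xinhong(1,3), YANG Xinghua(1)

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100085, China
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  3. Key Laboratory of Cyberspace Security Defense, Beijing 100085, China
  • Received: 2024-03-26 Online: 2024-07-10 Published: 2024-08-02
  • Corresponding author: LI Yang, liyang@iie.ac.cn
  • About the authors:
    ZHOU Shucheng (born 1999), male, from Jilin, master's student. Main research interest: intelligent operations and maintenance.
    LI Yang (born 1980), female, from Beijing, associate researcher, Ph.D. Main research interests: big data security, network security, and intelligent operations and maintenance.
    LI Chuanrong (born 1989), male, from Hubei, assistant researcher, master's degree. Main research interests: data security and data management platforms.
    GUO Lulu (born 1996), female, from Shanxi, engineer, master's degree. Main research interests: data security and risk monitoring.
    JIA Xinhong (born 1993), male, from Beijing, engineer, master's degree. Main research interest: data security.
    YANG Xinghua (born 1985), male, from Shandong, engineer, master's degree. Main research interests: mobile communication security and secure intelligent operations and maintenance.
  • Funding:
    National Natural Science Foundation of China (62372450)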

Abstract:

In the current era of large-scale industrial digital transformation, the combination of cloud-native architecture and microservices technology has become a core competitive advantage. This development model improves the integrity and flexibility of software development, deployment, and testing. However, as the Internet has developed, the complexity and temporal characteristics of Trace data in microservice architectures have led to low anomaly detection accuracy and slow root cause localization. To address these challenges, this paper first proposes a time-series-based, multi-dimensional metric anomaly detection algorithm, which combines multi-dimensional metrics with time-series anomaly detection to significantly improve detection accuracy. By improving the service Trace metric vector, the algorithm addresses the low detection accuracy that arises when physical resources are sufficient, and its time-series detection further overcomes the limitations of traditional anomaly detection methods. This paper also proposes a root cause localization algorithm that combines a "link-operation" graph with context. By analyzing in depth the dependencies between services in historical Trace data, this algorithm effectively improves the accuracy of root cause localization. It also merges structurally similar Trace graphs, which saves considerable graph construction time and improves both the efficiency and the precision of root cause localization. Experimental results show that, compared with traditional methods, the proposed methods identify and localize the root causes of anomalies faster and more accurately.

Key words: cloud-native, microservices, Kubernetes, anomaly detection, root cause localization

CLC number: