Netinfo Security (信息网络安全), 2023, Vol. 23, Issue (3): 73-83. doi: 10.3969/j.issn.1671-1122.2023.03.008

• Technical Research •

一种面向微服务的多维度根因定位算法

施园1,2, 李杨1,2, 詹孟奇1,2

  1. 中国科学院信息工程研究所,北京 100093
  2. 中国科学院大学网络安全学院,北京 100049
  • 收稿日期:2022-11-13 出版日期:2023-03-10 发布日期:2023-03-14
  • 通讯作者: 李杨 E-mail:liyang@iie.ac.cn
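  • About the authors: SHI Yuan (born 1996), male, from Sichuan, master's student; main research interest: intelligent operations and maintenance | LI Yang (born 1980), female, from Beijing, associate research fellow, Ph.D.; main research interests: 5G security, big data processing and analysis, and intelligent operations and maintenance | ZHAN Mengqi (born 1996), male, from Sichuan, doctoral student; main research interest: network protocol anomaly detection
  • Supported by:
    National Key Research and Development Program of China (2019YFB1005200); National Key Research and Development Program of China (2019YFB1005201)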

A Multi-Dimensional Root Cause Localization Algorithm for Microservices

SHI Yuan1,2, LI Yang1,2, ZHAN Mengqi1,2

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2022-11-13 Online:2023-03-10 Published:2023-03-14
  • Contact: LI Yang E-mail:liyang@iie.ac.cn

摘要:

伴随着Docker等虚拟化容器技术的逐渐成熟,因其可扩展性、灵活性等特点与微服务架构完美契合,工业界逐渐将微服务架构应用部署在基于容器的云环境下,并用Kubernetes等容器编排工具来管理应用的全生命周期。在这样复杂的微服务架构下,如何使用人工智能技术高效发现异常并且定位根因成为重中之重。首先,文章总结了在微服务系统环境下进行异常检测和根因定位所面临的主要挑战和现有的关键技术;然后,针对现有技术异常检测覆盖范围不全面的问题,文章提出了一种基于无监督学习的多维度的异常检测方法,在调用链Trace数据的基础上结合服务和机器资源利用数据进行综合分析,确保能够检测出服务响应时间异常的同时,也能够识别服务资源利用异常和环境异常;最后,在异常已知的情况下,为了减少根因定位时间,拓展定位范围和缩小粒度,文章提出了一种轻量的基于异常传播子图的方法,将服务接口和机器节点两种维度的数据统一到异常传播子图中进行根因定位。实验表明,文章所提方法与已有方法相比,定位时间更短,不仅拓宽了根因定位场景,而且准确率也有明显提升。

关键词: 容器, 微服务, Kubernetes, 异常检测, 根因定位

Abstract:

As container virtualization technologies such as Docker have matured, their scalability and flexibility have proven a good fit for the microservice architecture, and industry has increasingly deployed microservice applications in container-based cloud environments, using container orchestration tools such as Kubernetes to manage the full application life cycle. In such a complex architecture, using artificial intelligence techniques to detect anomalies efficiently and locate their root causes becomes a top priority. First, this article summarizes the main challenges and the existing key technologies for anomaly detection and root cause localization in microservice systems. Then, to address the incomplete coverage of existing anomaly detection techniques, it proposes a multi-dimensional anomaly detection method based on unsupervised learning. The method combines call chain Trace data with service and machine resource utilization data, so that it detects service response time anomalies and also identifies service resource utilization anomalies and environmental anomalies. Finally, for the case where the anomaly is already known, and in order to shorten root cause localization time, widen the localization scope, and refine the localization granularity, it proposes a lightweight method based on an anomaly propagation subgraph, which unifies data from two dimensions, service interfaces and machine nodes, into a single anomaly propagation subgraph for root cause localization. Experiments show that, compared with existing methods, the proposed method has a shorter localization time, covers more root cause localization scenarios, and achieves clearly higher accuracy.

Key words: container, microservices, Kubernetes, anomaly detection, root cause localization
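
The abstract describes unifying service interfaces and machine nodes in one anomaly propagation subgraph but does not give the algorithm itself. The following is only a minimal illustrative sketch of that general idea, not the authors' method: it assumes anomalies have already been detected, that edges point from a caller (or a hosted service) to what it depends on, and that a plausible root cause is an anomalous node with no anomalous dependency of its own. All node names, the `EDGES` list, and both functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency edges (caller -> callee, or service interface -> host machine).
EDGES = [
    ("frontend:/order", "order:/create"),
    ("order:/create", "payment:/pay"),
    ("order:/create", "inventory:/reserve"),
    ("payment:/pay", "node-2"),        # assumed deployment edge to a machine node
    ("inventory:/reserve", "node-1"),  # assumed deployment edge to a machine node
]


def anomaly_subgraph(edges, anomalous):
    """Keep only the edges whose two endpoints are both flagged anomalous."""
    sub = defaultdict(set)
    for src, dst in edges:
        if src in anomalous and dst in anomalous:
            sub[src].add(dst)
    return sub


def root_cause_candidates(edges, anomalous):
    """Return anomalous nodes that have no anomalous dependency (assumed heuristic)."""
    sub = anomaly_subgraph(edges, anomalous)
    # A node absent from `sub`, or with an empty dependency set, propagates
    # the anomaly upward without inheriting it from anything below.
    return sorted(node for node in anomalous if not sub.get(node))


if __name__ == "__main__":
    flagged = {"frontend:/order", "order:/create", "payment:/pay", "node-2"}
    print(root_cause_candidates(EDGES, flagged))
```

Because service interfaces and machine nodes live in the same graph, the same traversal can point at either kind of node; a real implementation would also need to rank candidates, which this sketch leaves out.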

CLC number: