Netinfo Security, 2024, Vol. 24, Issue 7: 1062-1075. doi: 10.3969/j.issn.1671-1122.2024.07.008
ZHOU Shucheng1,2,3, LI Yang1,2,3, LI Chuanrong1,3, GUO Lulu1,3, JIA Xinhong1,3, YANG Xinghua1
Received: 2024-03-26
Online: 2024-07-10
Published: 2024-08-02
ZHOU Shucheng, LI Yang, LI Chuanrong, GUO Lulu, JIA Xinhong, YANG Xinghua. Context-Based Abnormal Root Cause Algorithm[J]. Netinfo Security, 2024, 24(7): 1062-1075.
URL: http://netinfo-security.org/EN/10.3969/j.issn.1671-1122.2024.07.008
Vector state | STMV ID | Response time a (ms) | Response time b (ms) | CPU usage a (cores) | CPU usage b (cores) | Memory usage a | Memory usage b | Disk usage a | Disk usage b | Database metric a | Database metric b | Cluster summary a | Cluster summary b |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Normal | 1 | 210 | 198 | 0.382 | 0.266 | 22.616% | 21.933% | 29.399% | 25.539% | 2.4 | 2 | 1.854 | 10.542 |
Normal | 2 | 201 | 221 | 0.324 | 0.253 | 25.933% | 26.433% | 26.416% | 32.109% | 1.7 | 1.9 | 1.598 | 10.896 |
Abnormal | 3 | 1400 | 1286 | 0.314 | 0.321 | 23.321% | 24.215% | 30.214% | 27.256% | 2.1 | 2.4 | 1.578 | 10.317 |
Abnormal | 4 | 184 | 192 | 0.782 | 0.348 | 89.324% | 26.354% | 84.563% | 27.343% | 2.3 | 1.8 | 95.325 | 10.325 |
Abnormal | 5 | 198 | 231 | 0.332 | 0.278 | 24.954% | 22.436% | 33.532% | 35.513% | 45.9 | 1.9 | 16.235 | 10.235 |
Abnormal | 6 | 220 | 293 | 0.287 | 0.342 | 26.931% | 19.235% | 91.356% | 89.432% | 2.8 | 2.3 | 56.74 | 56.74 |
Abnormal | 7 | 1926 | 2030 | 0.815 | 0.362 | 75.432% | 25.677% | 32.346% | 91.349% | 2.1 | 2.5 | 258.3 | 258.3 |
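
To make the state-vector table above easier to read, the following minimal sketch flags which metrics of an abnormal vector stand out against the two normal vectors. It is a plain ratio-to-baseline check written for illustration, not the paper's detection algorithm. The metric names, the function `deviating_metrics`, and the threshold `ratio=2.0` are assumptions introduced here; only the numbers are taken from the table (first four metric columns).

```python
from statistics import mean

# Hypothetical short names for the first four metric columns of the table.
METRICS = ["resp_a_ms", "resp_b_ms", "cpu_a_cores", "cpu_b_cores"]

# STMV 1 and 2 (labelled normal in the table) serve as the baseline.
NORMAL = [
    [210, 198, 0.382, 0.266],
    [201, 221, 0.324, 0.253],
]

# STMV 3 and 4 (labelled abnormal in the table).
STMV3 = [1400, 1286, 0.314, 0.321]
STMV4 = [184, 192, 0.782, 0.348]


def deviating_metrics(vector, baseline, ratio=2.0):
    """Return metric names whose value exceeds `ratio` times the baseline mean.

    `ratio` is an illustrative threshold, not a value from the paper.
    """
    baseline_means = [mean(column) for column in zip(*baseline)]
    return [
        name
        for name, value, base in zip(METRICS, vector, baseline_means)
        if value > ratio * base
    ]


print(deviating_metrics(STMV3, NORMAL))
print(deviating_metrics(STMV4, NORMAL))
```

Under these assumptions, STMV 3 deviates only in the two response-time columns, while STMV 4 deviates in CPU usage a, which matches the visual impression given by the table.
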
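
Algorithm | Response-time anomaly detection: accuracy | Response-time anomaly detection: recall | Microservice load anomaly detection: accuracy | Microservice load anomaly detection: recall | Anomaly detection with sufficient physical resources: accuracy | Anomaly detection with sufficient physical resources: recall | Time-series anomaly detection: accuracy | Time-series anomaly detection: recall |
---|---|---|---|---|---|---|---|---|
DCW-MSA-AMC | 53% | 75% | N/A | N/A | N/A | N/A | 75% | 79% |
DAEMON | 52% | 61% | 41% | 53% | 59% | 82% | 82% | 86% |
OmniAnomaly | 45% | 49% | 58% | 84% | 61% | 88% | 62% | 73% |
TraceAnomaly | 98% | 97% | N/A | N/A | N/A | N/A | N/A | N/A |
Anomaly detection based on multi-dimensional metrics | 97% | 98% | 89% | 92% | 91% | 95% | N/A | N/A |
Proposed algorithm | 98% | 96% | 92% | 95% | 93% | 97% | 91% | 93% |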

Formula | Definition | Formula | Definition |
---|---|---|---|
Ochiai | | Dstar2 | |
Goodman | | Sorensen | |
Jaccard | | RussellRao | |
M2 | | Dice | |
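
The table above names the coefficients without their definitions. As a hedged illustration, the sketch below implements four of them (Ochiai, Jaccard, RussellRao, Dstar2) using the definitions commonly given in the spectrum-based fault localization literature; the definitions used in the paper may differ. The counter names are assumptions introduced here: `ef` and `ep` count abnormal and normal executions that involve a component, and `nf` and `np_` count abnormal and normal executions that do not.

```python
import math


def _safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ochiai(ef, ep, nf, np_):
    # ef / sqrt((ef + nf) * (ef + ep))
    return _safe_div(ef, math.sqrt((ef + nf) * (ef + ep)))


def jaccard(ef, ep, nf, np_):
    # ef / (ef + ep + nf)
    return _safe_div(ef, ef + ep + nf)


def russell_rao(ef, ep, nf, np_):
    # ef / (ef + ep + nf + np)
    return _safe_div(ef, ef + ep + nf + np_)


def dstar2(ef, ep, nf, np_):
    # ef^2 / (ep + nf); treated as infinitely suspicious when the
    # denominator is zero but the component appears in abnormal runs.
    denominator = ep + nf
    if denominator == 0:
        return math.inf if ef > 0 else 0.0
    return ef ** 2 / denominator


# Illustrative counts only (not data from the paper).
counts = dict(ef=3, ep=1, nf=1, np_=5)
for score in (ochiai, jaccard, russell_rao, dstar2):
    print(score.__name__, score(**counts))
```

A higher score marks a component as more suspicious; ranking all components by one of these scores yields the candidate root-cause list.
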

Metric | Microscope | MicroRCA | MicroRank | Fault-propagation-subgraph-based algorithm | Proposed algorithm |
---|---|---|---|---|---|
Root cause localization for microservice resource anomalies | | | | | |
PR@1 | 39% | 71% | 79% | 85% | 90% |
PR@3 | 61% | 66% | 84% | 88% | 91% |
MAP | 57% | 67% | 86% | 88% | 91% |
Root cause localization when cluster host physical resources are sufficient | | | | | |
PR@1 | 46% | 72% | 71% | 89% | 91% |
PR@3 | 51% | 78% | 84% | 91% | 94% |
MAP | 55% | 76% | 86% | 91% | 93% |
Root cause localization for time-series anomalies | | | | | |
PR@1 | 62% | 52% | 71% | 75% | 89% |
PR@3 | 67% | 57% | 74% | 78% | 93% |
MAP | 71% | 58% | 73% | 81% | 95% |
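
The PR@k and MAP rows above can be read with the help of the following sketch. It assumes the definitions commonly used in microservice root-cause localization evaluations: PR@k is the share of true root causes found in the top k of a ranked list, normalized by min(k, number of true root causes), and MAP averages PR@k over every k up to the list length. The paper's exact definitions may differ, and the function names and sample cases are hypothetical.

```python
def pr_at_k(ranked, truth, k):
    """Precision of the top-k ranked candidates for one anomaly case."""
    if not truth:
        raise ValueError("each case needs at least one true root cause")
    hits = sum(1 for candidate in ranked[:k] if candidate in truth)
    return hits / min(k, len(truth))


def mean_pr_at_k(cases, k):
    """Average PR@k over all (ranked_list, truth_set) cases."""
    return sum(pr_at_k(ranked, truth, k) for ranked, truth in cases) / len(cases)


def mean_average_precision(cases):
    """Average, over all cases, of PR@k averaged across k = 1..N."""
    total = 0.0
    for ranked, truth in cases:
        n = len(ranked)
        total += sum(pr_at_k(ranked, truth, k) for k in range(1, n + 1)) / n
    return total / len(cases)


# Two made-up cases: the true root cause "a" is ranked first, then second.
cases = [
    (["a", "b", "c"], {"a"}),
    (["b", "a", "c"], {"a"}),
]
print(mean_pr_at_k(cases, 1))
print(mean_pr_at_k(cases, 3))
print(mean_average_precision(cases))
```
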
[1] | JAMSHIDI P, PAHL C, MENDONCA N C, et al. Microservices: The Journey So Far and Challenges Ahead[J]. IEEE Software, 2018, 35(3): 24-35. |
[2] | RANNEY M. What I Wish I Had Known Before Scaling Uber To 1000 Services[EB/OL]. (2016-03-16)[2024-03-12]. http://gotocon.com/chicago-2016/presentation/What%20I%20Wish%20I%20Had%20Known%20Before%20Scaling%20Uber%20to%201000%20Services. |
[3] | FAN C F, JINDAL A, GERNDT M. Microservices vs Serverless: A Performance Comparison on a Cloud-Native Web Application[C]// IEEE. 10th International Conference on Cloud Computing and Services Science. New York: IEEE, 2020: 522-533. |
[4] | NEWMAN S. Building Microservices[M]. Sebastopol: O’Reilly Media, 2015. |
[5] | SONG Zhihua, ZHANG Han, ZHAO Yongmei, et al. An Intelligent Mission Planning Model for the Air Strike Operations against Islands Based on Neural Network and Simulation[EB/OL]. (2023-07-05)[2024-03-12]. https://onlinelibrary.wiley.com/doi/epdf/10.1155/2022/8172907. |
[6] | HAMILTON J D. Time Series Analysis[M]. Princeton: Princeton University Press, 2020. |
[7] | HOCHENBAUM J, VALLIS O S, KEJARIWAL A. Automatic Anomaly Detection in the Cloud via Statistical Learning[EB/OL]. (2017-04-24)[2024-03-12]. https://arxiv.org/abs/1704.07706. |
[8] | SÖYLEMEZ M, TEKINERDOGAN B, KOLUKISA T A. Challenges and Solution Directions of Microservice Architectures: A Systematic Literature Review[EB/OL]. (2022-05-29)[2024-03-12]. https://doi.org/10.3390/app12115507. |
[9] | NANDI A, MANDAL A, ATREJA S, et al. Anomaly Detection Using Program Control Flow Graph Mining from Execution Logs[C]// ACM. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 215-224. |
[10] | GÜNTHER C W, VAN DER AALST W M P. Fuzzy Mining - Adaptive Process Simplification Based on Multi-Perspective Metrics[C]// Springer. International Conference on Business Process Management. Heidelberg: Springer, 2007: 328-343. |
[11] | LOU J G, FU Q, YANG S, et al. Mining Program Workflow from Interleaved Traces[C]// ACM. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 613-622. |
[12] | LIU Ping, XU Haowen, OUYANG Qianyu, et al. Unsupervised Detection of Microservice Trace Anomalies through Service-Level Deep Bayesian Networks[C]// IEEE. 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). New York: IEEE, 2020: 48-58. |
[13] | NEDELKOSKI S, CARDOSO J, KAO O. Anomaly Detection and Classification Using Distributed Tracing and Deep Learning[C]// IEEE. 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). New York: IEEE, 2019: 241-250. |
[14] | WANG Tao, ZHANG Wenbo, XU Jiwei, et al. Workflow-Aware Automatic Fault Diagnosis for Microservice-Based Applications with Statistics[J]. IEEE Transactions on Network and Service Management, 2020, 17(4): 2350-2363. |
[15] | GULENKO A, SCHMIDT F, ACKER A, et al. Detecting Anomalous Behavior of Black-Box Services Modeled with Distance-Based Online Clustering[C]// IEEE. 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). New York: IEEE, 2018: 912-915. |
[16] | MARIANI L, PEZZÈ M, RIGANELLI O, et al. Predicting Failures in Multi-Tier Distributed Systems[EB/OL]. (2019-11-18)[2024-03-12]. https://doi.org/10.1016/j.jss.2019.110464. |
[17] | LI Wenze, PENG Xiaosheng, CHENG Kai, et al. A Short-Term Regional Wind Power Prediction Method Based on XGBoost and Multi-Stage Features Selection[C]// IEEE. 2020 IEEE 3rd Student Conference on Electrical Machines and Systems (SCEMS). New York: IEEE, 2020: 614-618. |
[18] | ZHOU Xiang, PENG Xin, XIE Tao, et al. Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System and Empirical Study[J]. IEEE Transactions on Software Engineering, 2018, 47(2): 243-260. |
[19] | MI Haibo, WANG Huaimin, ZHOU Yangfan, et al. Toward Fine-Grained, Unsupervised, Scalable Performance Diagnosis for Production Cloud Computing Systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2013, 24(6): 1245-1255. |
[20] | LIU Dewei, HE Chuan, PENG Xin, et al. MicroHECL: High-Efficient Root Cause Localization in Large-Scale Microservice Systems[C]// IEEE. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). New York: IEEE, 2021: 338-347. |
[21] | XIN Ruyue, CHEN Peng, ZHAO Zhiming. CausalRCA: Causal Inference Based Precise Fine-Grained Root Cause Localization for Microservice Applications[EB/OL]. (2023-05-06)[2024-03-12]. https://doi.org/10.1016/j.jss.2023.111724. |
[22] | SHI Yuan, LI Yang, ZHAN Mengqi. A Multi-Dimensional Root Cause Localization Algorithm for Microservices[J]. Netinfo Security, 2023, 23(3): 73-83. |
[23] | SHAN Chengang, WU Chuge, XIA Yuanqing, et al. Adaptive Resource Allocation for Workflow Containerization on Kubernetes[J]. Journal of Systems Engineering and Electronics, 2023, 34(3): 723-743. doi: 10.23919/JSEE.2023.000073 |
[24] | CHODOROW K, DIROLF M. MongoDB - The Definitive Guide: Powerful and Scalable Data Storage[M]. Sebastopol: O’Reilly Media, 2010. |
[25] | SINGH V, PEDDOJU S K. Container-Based Microservice Architecture for Cloud Applications[C]// IEEE. 2017 International Conference on Computing, Communication and Automation (ICCCA). New York: IEEE, 2017: 847-852. |
[26] | SHLENS J. A Tutorial on Principal Component Analysis[EB/OL]. (2014-04-03)[2024-03-12]. https://doi.org/10.48550/arXiv.1404.1100. |
[27] | DOKUMENTOV A, HYNDMAN R J. STR: A Seasonal-Trend Decomposition Procedure Based on Regression[EB/OL]. (2021-07-02)[2024-03-12]. https://www.xueshufan.com/publication/3121710282. |
[28] | WEN Qingsong, GAO Jingkun, SONG Xiaomin, et al. RobustSTL: A Robust Seasonal-Trend Decomposition Algorithm for Long Time Series[EB/OL]. (2019-07-17)[2024-03-12]. https://doi.org/10.1609/aaai.v33i01.33015409. |
[29] | BOX G E, JENKINS G M, REINSEL G C, et al. Time Series Analysis: Forecasting and Control[M]. New York: John Wiley & Sons, 2015. |
[30] | CHEN Xuanhao, DENG Liwei, HUANG Feiteng, et al. DAEMON: Unsupervised Anomaly Detection and Interpretation for Multivariate Time Series[C]// IEEE. 2021 IEEE 37th International Conference on Data Engineering (ICDE). New York: IEEE, 2021: 2225-2230. |
[31] | ZHOU Xiang, PENG Xin, XIE Tao, et al. Benchmarking Microservice Systems for Software Engineering Research[EB/OL]. (2018-05-27)[2024-03-12]. https://doi.org/10.1145/3183440.3194991. |
[32] | QI Sibo, CHEN Juan, CHEN Peng, et al. An Effective Dynamic Cost-Sensitive Weighting Based Anomaly Multi-Classification Model for Imbalanced Multivariate Time Series[C]// Springer. International Conference on Web Information Systems Engineering. Heidelberg: Springer, 2023: 781-790. |
[33] | SU Ya, ZHAO Youjian, NIU Chenhao, et al. Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network[C]// ACM. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD’19). New York: ACM, 2019: 2828-2837. |
[34] | YU Guangba, CHEN Pengfei, CHEN Hongyang, et al. MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments[EB/OL]. (2021-06-03)[2024-03-12]. https://doi.org/10.1145/3442381.3449905. |
[35] | LI M L, RAMACHANDRAN P, SAHOO S K, et al. Understanding the Propagation of Hard Errors to Software and Implications for Resilient System Design[J]. ACM Sigplan Notices, 2008, 43(3): 265-276. |
[36] | YANG Tianyi, SHEN Jiacheng, SU Yuxin, et al. AID: Efficient Prediction of Aggregated Intensity of Dependency in Large-Scale Cloud Systems[C]// IEEE. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). New York: IEEE, 2021: 653-665. |
[37] | PEARSON S, CAMPOS J, JUST R, et al. Evaluating and Improving Fault Localization[C]// IEEE. 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). New York: IEEE, 2017: 609-620. |
[38] | LIN Jinjin, CHEN Pengfei, ZHENG Zibin. Microscope: Pinpoint Performance Issues with Causal Graphs in Micro-Service Environments[EB/OL]. (2018-11-07)[2024-03-12]. https://doi.org/10.1007/978-3-030-03596-9_1. |
[39] | WU L, TORDSSON J, ELMROTH E, et al. MicroRCA: Root Cause Localization of Performance Issues in Microservices[EB/OL]. [2024-03-12]. https://www.xueshufan.com/publication/2999561215. |