Netinfo Security, 2023, Vol. 23, Issue (3): 73-83. doi: 10.3969/j.issn.1671-1122.2023.03.008
SHI Yuan, LI Yang, ZHAN Mengqi
Received: 2022-11-13
Online: 2023-03-10
Published: 2023-03-14
Contact: LI Yang
E-mail: liyang@iie.ac.cn
CLC Number:
SHI Yuan, LI Yang, ZHAN Mengqi. A Multi-Dimensional Root Cause Localization Algorithm for Microservices[J]. Netinfo Security, 2023, 23(3): 73-83.
URL: http://netinfo-security.org/EN/10.3969/j.issn.1671-1122.2023.03.008
Vector state | STMV ID | Response time of a (ms) | Response time of b (ms) | CPU usage of a (cores) | CPU usage of b (cores) | Memory usage of a (bytes) | Memory usage of b (bytes) | Digest of a | Digest of b |
---|---|---|---|---|---|---|---|---|---|
Normal | 1 | 222 | 209 | 0.207 | 0.219 | 2765402112 | 3866624 | 0.8279 | 1.7778 |
Normal | 2 | 198 | 203 | 0.244 | 0.239 | 2768973824 | 3903488 | 0.8465 | 1.8079 |
Normal | 3 | 212 | 204 | 0.232 | 0.219 | 2780045312 | 4788224 | 0.8562 | 1.7898 |
Abnormal | 4 | 1302 | 1138 | 0.209 | 0.282 | 2680045312 | 4687824 | 0.8892 | 1.7989 |
Abnormal | 5 | 1102 | 1305 | 0.392 | 0.246 | 2700045312 | 4695824 | 0.8578 | 1.7789 |
Abnormal | 6 | 232 | 198 | 2.827 | 0.267 | 2710045435 | 4995912 | 1.012 | 1.8789 |
Abnormal | 7 | 221 | 178 | 0.401 | 0.298 | 2980045345 | 3803483 | 1000.2 | 28.2 |
Algorithm | Response-time anomaly (precision) | Response-time anomaly (recall) | Call-path anomaly (precision) | Call-path anomaly (recall) | Microservice load anomaly (precision) | Microservice load anomaly (recall) | Microservice system environment anomaly (precision) | Microservice system environment anomaly (recall) |
---|---|---|---|---|---|---|---|---|
Hard-coded Rule | 0.89 | 0.79 | N/A | N/A | N/A | N/A | N/A | N/A |
Multimodal LSTM | 0.62 | 0.95 | N/A | 0.93 | 0.53 | 0.67 | 0.69 | 0.89 |
Auto-Encoding Variational Bayes | 0.16 | 0.52 | 0.17 | 0.98 | 0.64 | 0.57 | 0.71 | 0.84 |
OmniAnomaly | 0.45 | 0.49 | 0.46 | 0.94 | 0.6 | 0.94 | 0.65 | 0.91 |
TraceAnomaly | 0.98 | 0.97 | N/A | 0.89 | N/A | N/A | N/A | N/A |
Proposed algorithm | 0.97 | 0.98 | N/A | 0.91 | 0.91 | 0.98 | 0.93 | 0.99 |
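
The table above reports two scores per anomaly type, precision and recall. As a minimal sketch of how such scores are conventionally computed from binary detection results, the snippet below uses the standard definitions; the function name `precision_recall` and the sample labels are hypothetical and do not come from the paper.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 marks an anomaly.

    A score is returned as None when its denominator is zero (undefined).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal samples flagged as anomalies
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies that were missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall


# Hypothetical ground truth and detector output (illustrative data only).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

A detector that flags almost everything tends to show high recall with low precision, so the two columns are best read together.
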
Metric | RS | MonitorRank | Microscope | MicroRCA | Proposed algorithm |
---|---|---|---|---|---|
Root cause: service interface response timeout | |||||
PR@1 | 0.18 | 0.24 | 0.71 | 0.87 | 0.89 |
PR@3 | 0.43 | 0.57 | 0.89 | 0.93 | 0.95 |
MAP | 0.43 | 0.57 | 0.89 | 0.93 | 0.95 |
Root cause: abnormal service resource utilization | |||||
PR@1 | 0.21 | 0.17 | 0.45 | 0.76 | 0.89 |
PR@3 | 0.46 | 0.33 | 0.67 | 0.71 | 0.91 |
MAP | 0.42 | 0.45 | 0.65 | 0.7 | 0.92 |
Root cause: abnormal cloud host resource utilization | |||||
PR@1 | 0.1 | 0.6 | 0.52 | 0.78 | 0.91 |
PR@3 | 0.26 | 0.65 | 0.56 | 0.83 | 0.93 |
MAP | 0.41 | 0.61 | 0.6 | 0.81 | 0.94 |
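
This page does not define PR@k or MAP. The sketch below assumes definitions commonly used in root-cause localization evaluations: PR@k is the fraction of true root causes found in the top k of a ranked list, normalized by min(k, number of true root causes) and averaged over fault cases, and MAP is the mean of PR@k for k = 1..N. All function names and the sample rankings are hypothetical, and the paper's exact formulas may differ.

```python
def pr_at_k(ranked, root_causes, k):
    """PR@k for one fault case: hits in the top k over min(k, number of true causes)."""
    if not root_causes:
        raise ValueError("each fault case needs at least one true root cause")
    hits = sum(1 for service in ranked[:k] if service in root_causes)
    return hits / min(k, len(root_causes))


def mean_pr_at_k(cases, k):
    """Average PR@k over all fault cases; each case is (ranked_list, true_cause_set)."""
    return sum(pr_at_k(ranked, causes, k) for ranked, causes in cases) / len(cases)


def mean_average_precision(cases, n):
    """MAP under the assumed definition: mean of PR@k for k = 1..n."""
    return sum(mean_pr_at_k(cases, k) for k in range(1, n + 1)) / n


# Two hypothetical fault cases whose true root cause is service "a".
cases = [
    (["a", "b", "c"], {"a"}),  # root cause ranked first
    (["b", "a", "c"], {"a"}),  # root cause ranked second
]
pr1 = mean_pr_at_k(cases, 1)
pr3 = mean_pr_at_k(cases, 3)
map_score = mean_average_precision(cases, 3)
print(pr1, pr3, map_score)
```

Under this definition PR@k cannot decrease as k grows when each case has a single root cause, which is a useful sanity check when reading ranking tables.
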
[1] | LI Zhenhao. Development and Impact Analysis of Microservice Architecture[J]. China CIO News, 2017, 1: 154-155. |
李贞昊. 微服务架构的发展与影响分析[J]. 信息系统工程, 2017, 1: 154-155. | |
[2] | BOETTIGER C. An Introduction to Docker for Reproducible Research, with Examples from the R Environment[J]. ACM SIGOPS Operating Systems Review, 2015, 49(1): 71-79. doi: 10.1145/2723872.2723882 |
[3] | BERNSTEIN D. Containers and Cloud: From LXC to Docker to Kubernetes[J]. IEEE Cloud Computing, 2014, 1(3): 81-84. doi: 10.1109/MCC.2014.51 |
[4] | ZHANG Xuejian, ZHANG Yu, CHUAN Tao, et al. Construction Method of IT Operation and Maintenance Data Management System Based on Big Data Technology[J]. Electronic Science and Technology, 2018, 31(4): 84-86. |
张雪坚, 张榆, 钏涛, 等. 基于大数据技术的IT运维数据管理系统构建方法[J]. 电子科技, 2018, 31(4): 84-86. | |
[5] | WANG Wei, SHEN Xudong. Research on Anomaly Detection Algorithm of Migration Time Series Based on Instance[J]. Netinfo Security, 2019, 19(3): 11-18. |
王伟, 沈旭东. 基于实例的迁移时间序列异常检测算法研究[J]. 信息网络安全, 2019, 19(3): 11-18. | |
[6] | LAN Qing. Application Analysis of Intelligent Operation and Maintenance in Enterprise Management[J]. Electronic World, 2020(5): 89-90. |
兰清. 智能运维在企业IT管理中的应用分析[J]. 电子世界, 2020(5): 89-90. | |
[7] | GUO Hongcheng, LIN Xingyu, YANG Jian, et al. TransLog: A Unified Transformer-Based Framework for Log Anomaly Detection[EB/OL]. (2022-01-17)[2022-11-10]. https://doi.org/10.48550/arXiv.2201.00016. |
[8] | HE Shilin, ZHU Jieming, HE Pinjia, et al. Loghub: A Large Collection of System Log Datasets Towards Automated Log Analytics[EB/OL]. (2020-09-14)[2022-11-10]. https://doi.org/10.48550/arXiv.2008.06448. |
[9] | DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding[EB/OL]. (2019-05-24)[2022-11-10]. https://doi.org/10.48550/arXiv.1810.04805. |
[10] | HALL S. Encoding/Decoding[M]. New York: Culture, Media, Language, 1980. |
[11] | TIMCENKO V, GALIN S. Ensemble Classifiers for Supervised Anomaly Based Network Intrusion Detection[C]// IEEE. 2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP). New York: IEEE, 2017: 13-19. |
[12] | WANG W, KANNEG D. An Integrated Classifier for Gear System Monitoring[J]. Mechanical Systems and Signal Processing, 2009, 23(4): 1298-1312. doi: 10.1016/j.ymssp.2008.10.006 |
[13] | XU Yong, ZHU Yaokang, QIAO Bo, et al. Tracelingo: Trace Representation and Learning for Performance Issue Diagnosis in Cloud Services[C]// IEEE. 2021 IEEE/ACM International Workshop on Cloud Intelligence (Cloudintelligence). New York: IEEE, 2021: 37-40. |
[14] | ZHANG Shenglin, LIN Xiaofei, SUN Yongqian, et al. Research on Unsupervised KPI Anomaly Detection Based on Deep Learning[J]. Frontiers of Data and Computing, 2020, 2(3): 87-100. |
[15] | LIU Ping, XU Haowen, OUYANG Qianyu, et al. Unsupervised Detection of Microservice Trace Anomalies Through Service-Level Deep Bayesian Networks[C]// IEEE. 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). New York: IEEE, 2020: 48-58. |
[16] | CHANDOLA V, BANERJEE A, KUMAR V. Anomaly Detection: A Survey[J]. ACM Computing Surveys, 2009, 41(3): 1-58. |
[17] | AHMED T. Online Anomaly Detection Using KDE[C]// IEEE. Proceedings of the 28th IEEE Conference Global Telecommunications. New York: IEEE, 2009: 1009-1016. |
[18] | AHMED F, ERMAN J, GE Zihui, et al. Detecting and Localizing End-to-End Performance Degradation for Cellular Data Services Based on TCP Loss Ratio and Round Trip Time[J]. IEEE/ACM Transactions on Networking, 2017, 25(6): 3709-3722. doi: 10.1109/TNET.2017.2761758 |
[19] | LIN F, MUZUMDAR K, LAPTEV N P, et al. Fast Dimensional Analysis for Root Cause Investigation in a Large-Scale Service Environment[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2020, 4(2): 1-23. |
[20] | SUN Yongqian, ZHAO Youjian, SU Ya, et al. Hotspot: Anomaly Localization for Additive KPIs with Multi-Dimensional Attributes[J]. IEEE Access, 2018, 6: 10909-10923. doi: 10.1109/ACCESS.2018.2804764 |
[21] | GU Jiazhen, LUO Chuan, QIN Si, et al. Efficient Incident Identification from Multi-Dimensional Issue Reports via Meta-Heuristic Search[C]// ACM. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2020: 292-303. |
[22] | XING Wenpu, GHORBANI A. Weighted PageRank Algorithm[C]// IEEE. Proceedings of the Second Annual Conference on Communication Networks and Services Research. New York: IEEE, 2004: 305-314. |
[23] | FOUSS F, PIROTTE A, RENDERS J M, et al. Random-Walk Computation of Similarities Between Nodes of a Graph with Application to Collaborative Recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3): 355-369. doi: 10.1109/TKDE.2007.46 |
[24] | KIM M, SUMBALY R. Root Cause Detection in a Service-Oriented Architecture[J]. ACM SIGMETRICS Performance Evaluation Review, 2013, 41(1): 93-104. doi: 10.1145/2494232.2465753 |
[25] | BRIN S, PAGE L. The Anatomy of a Large-Scale Hypertextual Web Search Engine[J]. Computer Networks and ISDN Systems, 1998, 30(1): 107-117. doi: 10.1016/S0169-7552(98)00110-X |
[26] | TAI Liyuan, TIAN Chunqi, WANG Wei. Anomaly Detection of Large Scale Microservice Architecture Software System Based on Log Parsing[J]. Computer Science and Application, 2019, 9(12): 2266-2276. doi: 10.12677/CSA.2019.912252 |
[27] | JIA Tong, YANG Lin, CHEN Pengfei, et al. LogSed: Anomaly Diagnosis Through Mining Time-Weighted Control Flow Graph in Logs[C]// IEEE. IEEE International Conference on Cloud Computing. New York: IEEE, 2017: 447-455. |
[28] | GULENKO A, SCHMIDT F, ACKER A, et al. Detecting Anomalous Behavior of Black-Box Services Modeled with Distance-Based Online Clustering[C]// IEEE. 2018 IEEE 11th International Conference on Cloud Computing. New York: IEEE, 2018: 912-915. |
[29] | SAMIR A, PAHL C. DLA: Detecting and Localizing Anomalies in Containerized Microservice Architectures Using Markov Models[C]// IEEE. 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud). New York: IEEE, 2019: 205-213. |
[30] | NEDELKOSKI S, CARDOSO J, KAO O. Anomaly Detection and Classification Using Distributed Tracing and Deep Learning[C]// IEEE. 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). New York: IEEE, 2019: 241-250. |
[31] | MA Meng, XU Jingmin, WANG Yuan, et al. AutoMAP: Diagnose Your Microservice-Based Web Applications Automatically[C]// ACM. Proceedings of the Web Conference 2020. New York: ACM, 2020: 246-258. |
[32] | AN J, CHO S. Variational Autoencoder Based Anomaly Detection Using Reconstruction Probability[J]. Special Lecture on IE, 2015, 2(1): 1-18. |
[33] | SHLENS J. A Tutorial on Principal Component Analysis[EB/OL]. (2014-04-03)[2022-11-10]. https://doi.org/10.48550/arXiv.1404.1100. |
[34] | KINGMA D P, WELLING M. Auto-Encoding Variational Bayes[EB/OL]. (2014-05-01)[2022-11-10]. https://max.book118.com/html/2021/0321/5314343041003201.shtm. |
[35] | WU Li, TORDSSON J, ELMROTH E, et al. MicroRCA: Root Cause Localization of Performance Issues in Microservices[C]// IEEE. NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium. New York: IEEE, 2020: 1-9. |
[36] | JEH G, WIDOM J. Scaling Personalized Web Search[C]// ACM. Proceedings of the 12th International Conference on World Wide Web. New York: ACM, 2003: 271-279. |