Root Cause Analysis in Microservice Using Neural Granger Causal Discovery

Authors: Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, Wen-Chih Peng

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on the synthetic and real-world microservice-based datasets demonstrate that RUN noticeably outperforms the state-of-the-art root cause analysis methods.
Researcher Affiliation | Academia | Cheng-Ming Lin, Ching Chang, Wei-Yao Wang, Kuang-Da Wang, Wen-Chih Peng; National Yang Ming Chiao Tung University, Hsinchu, Taiwan (zmlin1998.cs10@nycu.edu.tw, blacksnail789521@gmail.com, sf1638.cs05@nctu.edu.tw, gdwang.cs10@nycu.edu.tw, wcpeng@cs.nycu.edu.tw)
Pseudocode | No | The paper describes its method in Section 4, "Methodology", but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/zmlin1998/RUN.
Open Datasets | No | As no publicly available real-world dataset for root cause analysis is accessible due to data confidentiality, we test on a synthetic dataset and a test bed utilizing an actual microservice-based application. Sock-shop (Daniel Holbach 2022): The framework of sock-shop encompasses a total of 13 microservices, each developed using distinct technologies.
Dataset Splits | No | The paper mentions characteristics of the datasets but does not provide specific details on training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | We implement our method on a machine with an AMD EPYC 7302 16-core CPU and NVIDIA RTX A5000 graphics cards.
Software Dependencies | No | The paper mentions using the Adam optimizer and setting learning rates and batch sizes, but it does not name specific software libraries with version numbers (e.g., "PyTorch 1.9"), which are typically considered software dependencies.
Experiment Setup | Yes | In the time series forecasting stage, the window size w is set to 32. We use the Adam (Kingma and Ba 2015) optimizer and set the learning rate as 0.001 and the batch size as 128. In the causal graph discovery stage, the threshold H is set to 0.5. The training epochs of the pre-training and fine-tuning stages are set to 50. In the diagnosis stage, we set the value of the personalization vector Pd as 1 and Pn as 0.5, similar to (Wang et al. 2021). The k in HR@k is set to 1, 3, and 5.
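For anyone attempting to reproduce the setup, the hyperparameters quoted above can be collected into a single configuration. This is a minimal sketch only: the key names below are my own and do not come from the RUN repository.

```python
# Hedged sketch of the experiment setup reported in the paper.
# Key names are assumptions for illustration, not identifiers from
# https://github.com/zmlin1998/RUN.
config = {
    "window_size": 32,      # forecasting window w
    "optimizer": "Adam",    # Kingma and Ba 2015
    "learning_rate": 0.001,
    "batch_size": 128,
    "threshold_H": 0.5,     # edge threshold in the causal graph discovery stage
    "epochs": 50,           # both pre-training and fine-tuning stages
    "P_d": 1.0,             # personalization vector value for anomalous nodes
    "P_n": 0.5,             # personalization vector value for normal nodes
    "hr_at_k": [1, 3, 5],   # k values for the HR@k metric
}
```

Recording the setup this way makes it easy to diff a local run against the paper's reported values before comparing results.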