Root Cause Analysis of Failures in Microservices through Causal Discovery
Authors: Azam Ikram, Sarthak Chakraborty, Subrata Mitra, Shiv Saini, Saurabh Bagchi, Murat Kocaoglu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive evaluation of our proposed solution to answer the following questions: (1) How effective is RCD for finding the interventional target? and (2) How quickly can RCD find the interventional target? We report more detailed experiments in the appendix. |
| Researcher Affiliation | Collaboration | ¹Purdue University, USA; ²Adobe Research, India. {mikram,sbagchi,mkocaoglu}@purdue.edu, {sarchakr,sumitra,shsaini}@adobe.com |
| Pseudocode | Yes | Complete pseudo-code of the algorithm is given in Algorithm 1. Algorithm 1: Root Cause Discovery Algorithm (RCD) |
| Open Source Code | Yes | Our source code is available online at github.com/azamikram/rcd. |
| Open Datasets | No | The paper mentions generating synthetic data using pyAgrum, collecting data from a Sock-shop application test bed, and using real-world data collected from Grafana. However, it does not provide specific access information (e.g., link, DOI, or formal citation with authors and year) for any publicly available or open dataset used for training or general experimentation. |
| Dataset Splits | No | The paper describes data collection methods and sizes, such as 'we draw 10K samples for the normal and anomalous states' or 'we gathered 50 datasets', but it does not specify explicit train/validation/test splits (e.g., percentages or counts) or reference predefined splits for reproducibility. |
| Hardware Specification | No | The paper mentions experiments were run on a 'production-based microservice system hosted on AWS cloud-native system' for real data, but it does not specify any concrete hardware details such as GPU models, CPU types, or specific cloud instance specifications used for running the experiments. |
| Software Dependencies | Yes | We implemented Ψ-PC in Python using the causal-learn package (github.com/cmu-phil/causal-learn). To generate synthetic data, we used pyAgrum (pyagrum.readthedocs.io/en/1.0.0/) with a randomly generated DAG to draw samples for the normal and anomalous dataset. |
| Experiment Setup | Yes | We set γ to 5 for RCD in all our experiments unless specified otherwise. Finally, to get statistically significant results, we ran all the experiments 100 times and plotted the average. The max in-degree of all the DAGs is set to 3 and the total number of states for every node is 6. Furthermore, for every experiment, we draw 10K samples for the normal and anomalous states of the system. |
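
The synthetic setup quoted in the last row (random DAG, max in-degree 3, 6 states per node, 10K samples each for the normal and anomalous states) can be illustrated with a small sketch. This is not the authors' generator: the paper uses pyAgrum, whereas the code below is a standard-library stand-in. The node count, the seed, the helper names (`random_dag`, `DiscreteNetwork`, `make_datasets`), and the choice to model the failure as a redrawn conditional distribution at a single target node are all assumptions made for illustration.

```python
import random

MAX_IN_DEGREE = 3      # from the quoted setup
NUM_STATES = 6         # states per node, from the quoted setup
NUM_SAMPLES = 10_000   # samples per dataset, from the quoted setup


def random_dag(num_nodes, rng):
    """Return {node: [parents]}; parents are always lower-numbered nodes,
    so the graph is acyclic by construction."""
    parents = {}
    for node in range(num_nodes):
        k = rng.randint(0, min(MAX_IN_DEGREE, node))
        parents[node] = sorted(rng.sample(range(node), k))
    return parents


def random_distribution(rng):
    """Return a random probability vector over NUM_STATES states."""
    weights = [rng.random() + 1e-3 for _ in range(NUM_STATES)]
    total = sum(weights)
    return [w / total for w in weights]


class DiscreteNetwork:
    """Discrete Bayesian network whose CPT rows are drawn lazily."""

    def __init__(self, parents, rng):
        self.parents = parents
        self.rng = rng
        self.cpts = {node: {} for node in parents}

    def intervene(self, target):
        """Hypothetical failure model: redraw only the target's CPT.
        All other CPT tables are shared with the original network."""
        other = DiscreteNetwork(self.parents, self.rng)
        other.cpts = dict(self.cpts)   # share tables of untouched nodes
        other.cpts[target] = {}        # fresh table for the target
        return other

    def sample(self, num_samples):
        rows = []
        for _ in range(num_samples):
            row = []
            for node in sorted(self.parents):
                key = tuple(row[p] for p in self.parents[node])
                table = self.cpts[node]
                if key not in table:
                    table[key] = random_distribution(self.rng)
                row.append(self.rng.choices(range(NUM_STATES), weights=table[key])[0])
            rows.append(row)
        return rows


def make_datasets(num_nodes=10, num_samples=NUM_SAMPLES, seed=0):
    """Return (normal_rows, anomalous_rows, target_node, dag)."""
    rng = random.Random(seed)
    dag = random_dag(num_nodes, rng)
    normal_net = DiscreteNetwork(dag, rng)
    target = rng.randrange(num_nodes)
    anomalous_net = normal_net.intervene(target)
    return normal_net.sample(num_samples), anomalous_net.sample(num_samples), target, dag
```

Calling `make_datasets()` yields one normal/anomalous pair with a known target node. Under the quoted protocol such a run would be repeated 100 times and the results averaged; that loop, and the RCD algorithm itself, are left out of this sketch.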