Root Cause Analysis of Failures in Microservices through Causal Discovery

Authors: Azam Ikram, Sarthak Chakraborty, Subrata Mitra, Shiv Saini, Saurabh Bagchi, Murat Kocaoglu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive evaluation of our proposed solution to answer the following questions: (1) How effective is RCD for finding the interventional target? and (2) How quickly can RCD find the interventional target? We report more detailed experiments in the appendix.
Researcher Affiliation | Collaboration | (1) Purdue University, USA; (2) Adobe Research, India. {mikram,sbagchi,mkocaoglu}@purdue.edu, {sarchakr,sumitra,shsaini}@adobe.com
Pseudocode | Yes | Complete pseudo-code of the algorithm is given in Algorithm 1. Algorithm 1: Root Cause Discovery Algorithm (RCD)
Open Source Code | Yes | Our source code is available online at github.com/azamikram/rcd.
Open Datasets | No | The paper mentions generating synthetic data using pyAgrum, collecting data from a Sock-shop application test bed, and using real-world data collected from Grafana. However, it does not provide specific access information (e.g., link, DOI, or formal citation with authors and year) for any publicly available or open dataset used for training or general experimentation.
Dataset Splits | No | The paper describes data collection methods and sizes, such as 'we draw 10K samples for the normal and anomalous states' or 'we gathered 50 datasets', but it does not specify explicit train/validation/test splits (e.g., percentages or counts) or reference predefined splits for reproducibility.
Hardware Specification | No | The paper mentions experiments were run on a 'production-based microservice system hosted on AWS cloud-native system' for real data, but it does not specify any concrete hardware details such as GPU models, CPU types, or specific cloud instance specifications used for running the experiments.
Software Dependencies | Yes | We implemented Ψ-PC in Python using the causal-learn package (github.com/cmu-phil/causal-learn). To generate synthetic data, we used pyAgrum (pyagrum.readthedocs.io/en/1.0.0/) with a randomly generated DAG to draw samples for the normal and anomalous dataset.
Experiment Setup | Yes | For all experiments, we set γ to 5 for RCD in all our experiments unless specified otherwise. Finally, to get statistically significant results, we ran all the experiments 100 times and plotted the average. The max in-degree of all the DAGs are set to 3 and the total number of states for every node is 6. Furthermore, for every experiment, we draw 10K samples for the normal and anomalous states of the system.
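
The synthetic setup quoted under Experiment Setup (random DAG, max in-degree 3, 6 states per node, 10K samples each for the normal and anomalous states) can be illustrated with a short sketch. This is not the authors' generator: it uses plain NumPy rather than pyAgrum, the function names and the default of 10 nodes are hypothetical, and modelling the anomaly as a changed conditional distribution at one randomly chosen node is an assumption made for illustration.

```python
import numpy as np

N_STATES = 6        # states per node, as quoted in the setup
MAX_IN_DEGREE = 3   # max in-degree, as quoted in the setup
N_SAMPLES = 10_000  # samples per dataset, as quoted in the setup


def random_dag(n_nodes, rng, max_in_degree=MAX_IN_DEGREE):
    """Return parents[i]: sorted parent indices of node i, all smaller than i."""
    parents = []
    for node in range(n_nodes):
        k = int(rng.integers(0, min(node, max_in_degree) + 1))
        chosen = rng.choice(node, size=k, replace=False) if k else []
        parents.append(sorted(int(p) for p in chosen))
    return parents


def random_cpt(n_parents, rng, n_states=N_STATES):
    """One categorical distribution per parent configuration (last axis sums to 1)."""
    shape = (n_states,) * n_parents
    return rng.dirichlet(np.ones(n_states), size=shape if shape else None)


def sample(parents, cpts, n_samples, rng):
    """Ancestral sampling: nodes are already in topological order."""
    data = np.empty((n_samples, len(parents)), dtype=np.int64)
    for node, pars in enumerate(parents):
        cpt = cpts[node]
        n_states = cpt.shape[-1]
        if pars:
            # Pick each row's distribution from its parents' sampled states.
            probs = cpt[tuple(data[:, p] for p in pars)]
        else:
            probs = np.broadcast_to(cpt, (n_samples, n_states))
        # Inverse-CDF sampling, vectorised over rows; clip guards rounding error.
        u = rng.random((n_samples, 1))
        states = (u > np.cumsum(probs, axis=1)).sum(axis=1)
        data[:, node] = np.minimum(states, n_states - 1)
    return data


def make_datasets(n_nodes=10, seed=0):
    """Draw a normal dataset and an anomalous one with a single altered node."""
    rng = np.random.default_rng(seed)
    parents = random_dag(n_nodes, rng)
    cpts = [random_cpt(len(p), rng) for p in parents]
    normal = sample(parents, cpts, N_SAMPLES, rng)
    # Assumption: the failure changes one node's conditional distribution.
    root_cause = int(rng.integers(n_nodes))
    faulty = list(cpts)
    faulty[root_cause] = random_cpt(len(parents[root_cause]), rng)
    anomalous = sample(parents, faulty, N_SAMPLES, rng)
    return parents, root_cause, normal, anomalous
```

Calling `make_datasets()` returns the parent lists, the index of the altered node, and two integer arrays of shape (10000, 10). A root-cause method would receive only the two arrays and be scored on whether it recovers the altered node.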