Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions

Authors: Ignavier Ng, Yujia Zheng, Jiji Zhang, Kun Zhang

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments validate the efficacy of the proposed procedure and demonstrate that it scales up to hundreds of nodes with high accuracy. "In this work, we introduce several strategies to improve the scalability of exact search in the linear Gaussian setting, giving rise to a more reliable causal discovery procedure. Our main contributions can be summarized as follows: ... We demonstrate the efficacy of our super-structure estimation method and local search strategy by conducting extensive experiments, and show that it scales up to hundreds of nodes with a high accuracy." |
| Researcher Affiliation | Academia | Ignavier Ng (1), Yujia Zheng (1), Jiji Zhang (2), Kun Zhang (1); (1) Carnegie Mellon University, (2) Hong Kong Baptist University. {ignavierng, yujiazh}@cmu.edu, zhangjiji@hkbu.edu.hk, kunz1@cmu.edu |
| Pseudocode | Yes | Algorithm 1: Local A* |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. It mentions using scikit-learn [26] and the bnlearn R package [37], which are third-party libraries, but not the authors' own implementation code. |
| Open Datasets | No | The paper describes generating synthetic data using the Erdős-Rényi model and simulating samples from a linear Gaussian model, but it does not provide concrete access information (link, DOI, specific citation with authors/year, or repository) for the specific datasets generated for the experiments. It only describes the simulation process. |
| Dataset Splits | No | The paper describes simulating samples (e.g., "n = {300, 10000} samples"), but it does not specify explicit train/validation/test splits, refer to predefined splits with citations, or provide split percentages or per-partition sample counts needed for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software such as scikit-learn [26] and the bnlearn R package [37], but it does not specify version numbers for these or any other software components, which is necessary for reproducibility. |
| Experiment Setup | Yes | "In our experiments, the ground truth DAGs are simulated using the Erdös Rényi model [6] with different degrees and number of variables. We construct the weighted adjacency matrix of each DAG using edge weights sampled uniformly from [−0.8, −0.2] ∪ [0.2, 0.8]. Based on the weighted matrix constructed, we simulate n ∈ {300, 10000} samples using the linear Gaussian model with exogenous noise variances sampled uniformly from [1, 2]. We report the structural Hamming distance (SHD) over the complete partial DAGs (CPDAGs). We also compute the F1 score of the undirected and directed edges in the estimated CPDAGs." |
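The data-generating protocol quoted in the Experiment Setup row can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function names and the choice of edge probability `degree / (d - 1)` for the Erdős-Rényi model are assumptions; only the weight range [−0.8, −0.2] ∪ [0.2, 0.8], the noise-variance range [1, 2], and the linear Gaussian sampling come from the paper's description.

```python
import numpy as np

def simulate_er_dag(d, degree, rng):
    """Sample an Erdos-Renyi DAG over d nodes (binary adjacency matrix).

    Assumption: edge probability degree / (d - 1) so that the expected
    node degree matches `degree`.
    """
    p = degree / (d - 1)
    # A strictly lower-triangular mask guarantees acyclicity; a random
    # permutation of the nodes then randomizes the topological order.
    A = np.tril(rng.random((d, d)) < p, k=-1).astype(float)
    perm = rng.permutation(d)
    return A[np.ix_(perm, perm)]

def simulate_linear_gaussian(d=20, degree=2, n=300, seed=0):
    """Simulate n samples from a linear Gaussian SEM on a random DAG."""
    rng = np.random.default_rng(seed)
    B = simulate_er_dag(d, degree, rng)
    # Edge weights sampled uniformly from [-0.8, -0.2] U [0.2, 0.8]
    # (magnitude in [0.2, 0.8] with a random sign), as in the paper.
    W = B * rng.uniform(0.2, 0.8, (d, d)) * rng.choice([-1.0, 1.0], (d, d))
    # Exogenous noise variances sampled uniformly from [1, 2].
    sigma2 = rng.uniform(1.0, 2.0, d)
    N = rng.normal(scale=np.sqrt(sigma2), size=(n, d))
    # Linear SEM: X = X W + N  =>  X = N (I - W)^{-1}.
    X = N @ np.linalg.inv(np.eye(d) - W)
    return W, X
```

Evaluation against the ground truth (SHD over CPDAGs, F1 of directed/undirected edges) is not shown here; it requires a CPDAG conversion routine such as the one in the bnlearn R package mentioned by the paper.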