Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions
Authors: Ignavier Ng, Yujia Zheng, Jiji Zhang, Kun Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments validate the efficacy of the proposed procedure, and demonstrate that it scales up to hundreds of nodes with a high accuracy. In this work, we introduce several strategies to improve the scalability of exact search in the linear Gaussian setting, giving rise to a more reliable causal discovery procedure. Our main contributions can be summarized as follows: ... We demonstrate the efficacy of our super-structure estimation method and local search strategy by conducting extensive experiments, and show that it scales up to hundreds of nodes with a high accuracy. |
| Researcher Affiliation | Academia | Ignavier Ng¹, Yujia Zheng¹, Jiji Zhang², Kun Zhang¹; ¹Carnegie Mellon University, ²Hong Kong Baptist University; {ignavierng, yujiazh}@cmu.edu, zhangjiji@hkbu.edu.hk, kunz1@cmu.edu |
| Pseudocode | Yes | Algorithm 1 Local A* |
| Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the methodology described. It mentions using scikit-learn [26] and the bnlearn R package [37], which are third-party libraries, but it does not point to the authors' own implementation. |
| Open Datasets | No | The paper describes generating synthetic data by sampling ground truth DAGs from the Erdős-Rényi model and simulating samples from a linear Gaussian model, but it does not provide concrete access information (link, DOI, specific citation with authors/year, or repository) for the datasets generated for the experiments. It only describes the simulation process. |
| Dataset Splits | No | The paper describes simulating samples (e.g., "n ∈ {300, 10000} samples"), but it does not specify explicit train/validation/test splits, refer to predefined splits with citations, or provide the split percentages or per-partition sample counts needed for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software like "scikit-learn" [26] and "bnlearn R package" [37] as tools used, but it does not specify any version numbers for these or any other software components, which is necessary for reproducibility. |
| Experiment Setup | Yes | In our experiments, the ground truth DAGs are simulated using the Erdös Rényi model [6] with different degrees and number of variables. We construct the weighted adjacency matrix of each DAG using edge weights sampled uniformly from [−0.8, −0.2] ∪ [0.2, 0.8]. Based on the weighted matrix constructed, we simulate n ∈ {300, 10000} samples using the linear Gaussian model with exogenous noise variances sampled uniformly from [1, 2]. We report the structural Hamming distance (SHD) over the complete partial DAGs (CPDAGs). We also compute the F1 score of the undirected and directed edges in the estimated CPDAGs. |
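
Because the authors' code is not linked, the simulation described in the Experiment Setup row can only be approximated. The following is a minimal sketch in Python with NumPy, not the authors' implementation. The function names, the interpretation of "degree" as the expected total (in plus out) degree per node, the random node relabeling, the equal-probability choice of edge-weight sign, and the convention that `W[i, j]` is the weight of edge i → j are all assumptions made for illustration.

```python
import numpy as np


def simulate_er_dag(d, degree, rng):
    """Sample a binary DAG adjacency matrix from an Erdos-Renyi model.

    Assumption: `degree` is the expected total (in + out) degree per node.
    """
    p = min(1.0, degree / (d - 1))  # per-pair edge probability
    # A strictly upper-triangular matrix is acyclic by construction.
    upper = np.triu(rng.random((d, d)) < p, k=1)
    # Assumption: relabel nodes so the causal order is not the index order.
    perm = rng.permutation(d)
    return upper[np.ix_(perm, perm)].astype(int)


def sample_weights(B, rng, low=0.2, high=0.8):
    """Draw edge weights uniformly from [-high, -low] U [low, high]."""
    magnitude = rng.uniform(low, high, size=B.shape)
    # Assumption: negative and positive signs are equally likely.
    sign = rng.choice([-1.0, 1.0], size=B.shape)
    return B * magnitude * sign


def simulate_linear_gaussian(W, n, rng):
    """Simulate n samples from X = X W + N with Gaussian noise N.

    Noise variances are drawn uniformly from [1, 2], one per variable.
    """
    d = W.shape[0]
    noise_var = rng.uniform(1.0, 2.0, size=d)
    noise = rng.normal(scale=np.sqrt(noise_var), size=(n, d))
    # Solving X (I - W) = N gives X = N (I - W)^{-1}.
    return noise @ np.linalg.inv(np.eye(d) - W)


rng = np.random.default_rng(0)
B = simulate_er_dag(d=10, degree=2, rng=rng)
W = sample_weights(B, rng)
X = simulate_linear_gaussian(W, n=300, rng=rng)
```

The sketch covers data generation only. Computing SHD and F1 over CPDAGs additionally requires converting both the true and the estimated DAG to their CPDAGs, which is not shown here.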