Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

Authors: Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang

JMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on both synthetic and three real-world data sets show the effectiveness of the proposed approach. Keywords: Causal Discovery, Latent Variable, Latent Hierarchical Structure, Latent Causal Graph, Non-Gaussianity. ... In this section, we show the simulation results on synthetic data to demonstrate the correctness of our proposed method. ... In this section, we apply our algorithm to three real-world data sets to show the efficacy of the proposed method.
Researcher Affiliation Academia Feng Xie, Department of Applied Statistics, Beijing Technology and Business University, Beijing, 102488, China; Biwei Huang, Halicioglu Data Science Institute (HDSI), University of California San Diego, La Jolla, San Diego, California, 92093, USA; Zhengming Chen, School of Computer Science, Guangdong University of Technology, Guangzhou, 510006, China, and Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; Ruichu Cai, School of Computer Science, Guangdong University of Technology, Guangzhou, 510006, China; Clark Glymour, Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Zhi Geng, Department of Applied Statistics, Beijing Technology and Business University, Beijing, 102488, China; Kun Zhang, Department of Philosophy, Carnegie Mellon University, Pittsburgh, PA 15213, USA, and Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Pseudocode Yes The entire process is summarized in Algorithm 1. ... Algorithm 1 Latent Hierarchical Causal Structure Learning (LaHiCaSl) ... Algorithm 2 Locate Latent Variables ... Algorithm 3 Identify Global Causal Clusters ... Algorithm 4 Determine Latent Variables ... Algorithm 5 Update Active Data ... Algorithm 6 Locally Infer Causal Structure ... Algorithm 7 Test GIN ... Algorithm 8 Identify Global Causal Clusters+ ... Algorithm 9 Locally Infer Causal Structure+
Open Source Code No The paper mentions a CC-BY 4.0 license for the paper itself and references the TETRAD package for baseline algorithms, but does not provide specific access information or a direct statement for the open-sourcing of the authors' own implementation code.
Open Datasets Yes We demonstrate the efficacy of our algorithm on both synthetic and real-world datasets across three different domains. ... Barbara Byrne conducted a study to investigate the impact of organizational (role ambiguity, role conflict, classroom climate, and superior support, etc.) and personality (self-esteem, external locus of control) on three facets of burnout in full-time elementary teachers (Byrne, 2016). ... We applied our LaHiCaSl algorithm to a multitasking behavior model, represented by a hierarchical SEM (Himi et al., 2019). ... We finally apply our LaHiCaSl algorithm to a classic dataset, i.e., Holzinger & Swineford 1939 dataset. This data set consists of mental ability test scores from 301 American 7th- and 8th-grade students. We focus on 9 out of the original 26 tests as done in Jöreskog et al. (2016).
Dataset Splits No The paper mentions varying sample sizes (N = 3k, 5k, 10k) for synthetic data and total sample counts for real-world datasets, but does not explicitly provide training, validation, or test splits. For example, for the Teacher's Burnout Study, it states "The data set consists of 32 observed variables with 599 samples in total."
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies No The paper mentions using the "TETRAD package" for the BPC and FOFC algorithms (baselines), but does not specify a version number for TETRAD or any other software dependencies, libraries, or programming languages used in their own implementation.
Experiment Setup Yes The causal strengths b_ij were generated uniformly from [−2, −0.5] ∪ [0.5, 2], and the non-Gaussian noise terms were generated from the square of exponential distributions. ... The kernel width in the HSIC test was set to 0.05. ... The significance levels of LaHiCaSl, BPC, and FOFC algorithms were all set to 0.00001. ... The significance levels of LaHiCaSl, BPC, and FOFC were set to 0.001, 0.0001, and 0.000001 respectively.
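
The synthetic setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. It assumes causal strengths drawn uniformly from [−2, −0.5] ∪ [0.5, 2] and noise formed by squaring exponential draws. The function names, the exponential scale of 1.0, the single-latent toy graph, and the use of NumPy are all assumptions made for illustration; none of them come from the paper, and this is not the authors' implementation.

```python
import numpy as np


def sample_causal_strengths(rng, size):
    """Draw coefficients uniformly from [-2, -0.5] U [0.5, 2]."""
    magnitude = rng.uniform(0.5, 2.0, size=size)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude


def sample_noise(rng, size, scale=1.0):
    """Non-Gaussian noise as the square of exponential draws.

    The scale parameter is an assumption; the source does not state it.
    """
    return rng.exponential(scale, size=size) ** 2


def generate_toy_data(n_samples=3000, n_indicators=3, seed=0):
    """Hypothetical toy graph: one latent L with indicators X_k = b_k * L + e_k."""
    rng = np.random.default_rng(seed)
    latent = sample_noise(rng, n_samples)
    b = sample_causal_strengths(rng, n_indicators)
    noise = sample_noise(rng, (n_samples, n_indicators))
    observed = latent[:, None] * b[None, :] + noise
    return observed, b


X, b = generate_toy_data()
```

Here `n_samples=3000` mirrors the smallest sample size mentioned in the Dataset Splits row (N = 3k); the hierarchical graphs used in the paper would need more latent layers than this single-latent toy.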