Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Cluster Causal Diagrams: An Information-Theoretic Approach
Authors: Xueyan Niu, Xiaoyun Li, Ping Li
IJCAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real data support the effectiveness of the proposed method. |
| Researcher Affiliation | Industry | Xueyan Niu, Xiaoyun Li, Ping Li Cognitive Computing Lab Baidu Research 10900 NE 8th St. Bellevue, WA 98004, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 Learning C-DAG representation; Algorithm 2 Greedy Search |
| Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for their methodology is made publicly available. |
| Open Datasets | Yes | As an example of real-world application, we apply our method to the protein signaling dataset [Sachs et al., 2005], which contains the expression levels of n = 11 proteins and phospholipids in human immune system cells, with N = 7466 observations. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test splits for the datasets used in the experiments. It mentions N = 1000 data points for synthetic and N = 7466 observations for real data, but no specific splitting methodology. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions the 'pgmpy package in Python' and 'the traditional kNN-based non-parametric estimator [Kraskov et al., 2004]' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | In each scenario, we simulate N = 1000 data points and run Algorithm 1 with Z = 500 iterations. We set the parameter of the Erdős-Rényi graph ρ = 0.5 for random sampling and the channel noise parameters p1 = p2 = 0.1. We run our algorithms with ρ = 0.5 and Z = 500. The resulting C-DAG, shown in Figure 4b, complies with definition (2). In particular, the algorithm successfully discovered the two groups of closely related molecules, {Plcg, PIP3, PIP2} and {PKC, PKA, Jnk, Raf, P38, Mek, Erk, Akt}, in the biological process, as expected from the ground truth. |
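
The Experiment Setup row quotes an Erdős-Rényi parameter ρ = 0.5, N = 1000 simulated points, and Z = 500 iterations. Since no source code is released, the following is only a minimal sketch of how a random DAG with edge probability ρ might be sampled. It is not the authors' implementation. The function names `sample_er_dag` and `is_acyclic`, the node count of 8, and the choice to orient edges along a random node ordering are all assumptions made for illustration.

```python
import random

# Values quoted in the Experiment Setup row.
N_SAMPLES = 1000     # simulated data points per scenario
Z_ITERATIONS = 500   # iterations of Algorithm 1
RHO = 0.5            # Erdős-Rényi edge probability

def sample_er_dag(n_nodes, rho, seed=None):
    """Sample a DAG (hypothetical helper, not from the paper).

    Each unordered node pair receives an edge with probability rho.
    Edges are oriented along a random node ordering, which rules out cycles.
    """
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    edges = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < rho:
                edges.append((order[i], order[j]))
    return order, edges

def is_acyclic(n_nodes, edges):
    """Check acyclicity with Kahn's algorithm (repeatedly remove sources)."""
    indegree = [0] * n_nodes
    children = [[] for _ in range(n_nodes)]
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    frontier = [v for v in range(n_nodes) if indegree[v] == 0]
    removed = 0
    while frontier:
        u = frontier.pop()
        removed += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                frontier.append(v)
    return removed == n_nodes

if __name__ == "__main__":
    # The node count 8 is an arbitrary illustrative choice.
    order, edges = sample_er_dag(8, RHO, seed=0)
    print(len(edges), "edges; acyclic:", is_acyclic(8, edges))
```

Fixing the seed makes the sampled graph repeatable across runs. The quoted setup does not report a seed, so any seed used here is likewise an assumption.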