Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Big Gaussian Bayesian Networks: Partition, Estimation and Fusion

Authors: Jiaying Gu, Qing Zhou

JMLR 2020

Reproducibility variables and results (each followed by the supporting LLM response):
Research Type: Experimental
  Extensive numerical experiments demonstrate the competitive performance of our PEF method, in terms of both speed and accuracy, compared to existing methods. Our method can improve the accuracy of structure learning by 20% or more, while reducing running time by up to two orders of magnitude.
Researcher Affiliation: Academia
  Jiaying Gu and Qing Zhou, Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
Pseudocode: Yes
  An outline of our clustering algorithm is shown in Algorithm 1. The complete algorithm for finding the candidate edge set A is summarized in Algorithm 2. The full fusion step is shown in Algorithm 3, which cycles through A iteratively until the structure of G no longer changes, and then returns G.
Open Source Code: No
  The paper contains no explicit statement about releasing the source code for the authors' own PEF method, and no direct link is provided. It only refers to third-party R packages used in the E-step (e.g., 'CCDr algorithm (Aragam and Zhou, 2015) in the R package sparsebn (Aragam et al., 2019)').
Open Datasets: Yes
  All network structures were downloaded from the repository of the R package bnlearn (Scutari, 2010, 2017). The networks used in this work are: PATHFINDER, ANDES, DIABETES, PIGS, LINK, and MUNIN...
Dataset Splits: No
  The paper mentions generating multiple datasets and using training and test data (e.g., 'To this end, we generated 50 test datasets for each network, and calculated test data likelihood under an estimated DAG. [...] from training data.'), but it does not specify the methodology, percentages, or sample counts for splitting a single dataset into training, validation, or test sets.
Hardware Specification: Yes
  When k was increased to 10, the authors' test machine, a MacBook Pro with a 3.1 GHz Intel Core i7 processor, ran out of memory for the CCDr algorithm.
Software Dependencies: No
  The paper mentions several R packages (sparsebn, pcalg, bnlearn, igraph, RcppArmadillo) but does not provide version numbers for these software dependencies.
Experiment Setup: Yes
  For all the experiments, we ran CCDr provided in the R package sparsebn. The CCDr algorithm outputs a solution path with an increasing number of edges. In order to enforce sparsity, we simply chose the DAG along the solution path with around 1.5p edges, and stopped running CCDr when the number of estimated edges on the path became greater than 2p by setting edge.threshold = 2p. [...] In our experiments, we set α = 10^-4 so that the PC algorithm can produce quite accurate PDAGs within a reasonable amount of time. [...] we limited this value to 3. [...] we set alpha = 0.001 and max.sx = 3. [...] the significance level for all tests, including the α in Algorithm 2, is set to 0.001 in our implementation. [...] λ = 2 log p. We use this score when the number of nodes is large with p > n. When p ≤ n, we switch back to the regular BIC score, i.e., λ = log n.
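The penalty switch quoted in the experiment setup (λ = 2 log p when p > n, λ = log n otherwise) can be sketched as follows. This is a minimal illustrative sketch: the helper names `bic_penalty` and `bic_score`, and the exact score form, are assumptions for illustration of a BIC-type criterion, not the authors' implementation.

```python
import math

def bic_penalty(p: int, n: int) -> float:
    """Penalty multiplier lambda for a BIC-type score, following the
    setup quoted above: lambda = 2*log(p) in the high-dimensional
    regime (p > n), and the regular BIC penalty lambda = log(n) otherwise."""
    return 2.0 * math.log(p) if p > n else math.log(n)

def bic_score(log_likelihood: float, num_edges: int, p: int, n: int) -> float:
    # Higher is better: log-likelihood penalized by lambda times the
    # number of edges (a simple proxy for model complexity).
    return log_likelihood - bic_penalty(p, n) * num_edges
```

For example, with p = 1000 nodes and n = 100 samples the penalty is 2 log 1000, while with p = 100 and n = 1000 it falls back to log 1000.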