Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ProDAG: Projected Variational Inference for Directed Acyclic Graphs

Authors: Ryan Thompson, Edwin V. Bonilla, Robert Kohn

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on synthetic and real data demonstrate that our approach often delivers more accurate inference than existing methods for DAG learning. Our toolkit for ProDAG is available on GitHub. In summary, our core contributions are: [...] 3. An extensive suite of empirical evaluations that demonstrate state-of-the-art results across a wide variety of synthetic datasets, and validated on a real biological dataset.
Researcher Affiliation	Academia	Ryan Thompson (University of Technology Sydney); Edwin V. Bonilla (CSIRO's Data61); Robert Kohn (University of New South Wales)
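Pseudocode	Yes	Algorithm 1 ProDAG. Input: initialization $\theta$, data $X \in \mathbb{R}^{n \times p}$, learning rate $\eta > 0$, no. of DAG samples $L \in \mathbb{N}$. While not converged do: (1) sample $\tilde{W}^{(l)} \sim q_\theta(\tilde{W})$ and set $W^{(l)} = \operatorname{pro}_\lambda(\tilde{W}^{(l)})$ for $l = 1, \ldots, L$; (2) compute $\widehat{\mathrm{ELBO}}(\theta) = \frac{1}{L} \sum_{l=1}^{L} \log p(X \mid W^{(l)}) - \mathrm{KL}[q_\theta(\tilde{W}) \,\|\, p(\tilde{W})]$; (3) update $\theta \leftarrow \theta + \eta \nabla_\theta \widehat{\mathrm{ELBO}}(\theta)$. End while. Output: optimized parameters $\theta$.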
Open Source Code	Yes	Our toolkit for ProDAG is available on GitHub. [...] 4. A user-friendly, well-documented, and open-source Julia implementation, made publicly available to promote adoption and reproducibility among researchers and practitioners. [...] Code, data, and instructions for reproducing the experiments is available.
Open Datasets	Yes	The flow cytometry data of Sachs et al. (2005) is a biological dataset designed to aid the discovery of protein signaling networks. [...] We generate semi-synthetic datasets using the MUNIN (subnetwork #1) DAG, a large medical diagnostic network from the Bayesian Network Repository (https://www.bnlearn.com/bnrepository).
Dataset Splits	Yes	We choose the specific value of λ using a separate validation set of size 0.1n. [...] Table 1: Performance on the flow cytometry dataset of Sachs et al. (2005). The averages and standard errors are measured over 10 splits of the data.
Hardware Specification	Yes	The experiments are run on a Linux workstation with an AMD Ryzen Threadripper PRO 5995WX CPU, 256GB RAM, and 2 NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies	Yes	Our implementation of ProDAG uses the machine learning library Flux (Innes et al., 2018) and performs projections using parallel GPU (batched CUDA) implementations of the projection algorithms. For the benchmark methods, we use the respective authors' open-source Python implementations: DAGMA: https://github.com/kevinsbello/dagma, v1.1.0, Apache License 2.0; DiBS and DiBS+: https://github.com/larslorch/dibs, v1.3.3, MIT License; BayesDAG: https://github.com/microsoft/Project-BayesDAG, v0.1.0, MIT License; and BOSS: https://github.com/cmu-phil/tetrad, v7.6.6, GNU General Public License v2.0.
Experiment Setup	Yes	We use the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.1 to minimize the variational objective function. The posterior variances are held positive during optimization using a softplus transform. At each Adam iteration, 100 samples of $\tilde{W}$ are drawn from the posterior to estimate the objective function. To project these sampled $\tilde{W}$ and obtain $W$, we solve the projection using gradient descent with learning rates of 1/p and 0.25/p for the linear and nonlinear settings, respectively.
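
Illustrative sketch (not taken from the paper's Julia toolkit): the Monte Carlo objective quoted in the Pseudocode and Experiment Setup rows can be approximated in a few lines of NumPy. Everything model-specific here is an assumption made for illustration: a mean-field Gaussian variational posterior, a standard normal prior, a linear Gaussian likelihood with unit noise, and a hypothetical `project` callable standing in for the projection step. The identity function used in the usage example is only a placeholder and does not produce DAGs. The gradient step (Adam in the paper) is omitted because it requires automatic differentiation.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps standard deviations positive.
    return np.logaddexp(0.0, x)

def gaussian_kl(mu, sigma, prior_sigma=1.0):
    # KL[N(mu, sigma^2) || N(0, prior_sigma^2)], summed over all entries.
    # Assumption: mean-field Gaussian posterior and zero-mean Gaussian prior.
    return np.sum(
        np.log(prior_sigma / sigma)
        + (sigma**2 + mu**2) / (2.0 * prior_sigma**2)
        - 0.5
    )

def linear_gaussian_loglik(X, W, noise_sigma=1.0):
    # log p(X | W) under the assumed model X = X W + Gaussian noise.
    n, p = X.shape
    resid = X - X @ W
    return (
        -0.5 * np.sum(resid**2) / noise_sigma**2
        - n * p * np.log(noise_sigma * np.sqrt(2.0 * np.pi))
    )

def elbo_estimate(X, mu, rho, project, n_samples=100, rng=None):
    # Monte Carlo ELBO estimate: average log-likelihood over projected
    # samples minus the KL term. `project` is a hypothetical stand-in
    # for the projection step; it is supplied by the caller.
    rng = np.random.default_rng() if rng is None else rng
    sigma = softplus(rho)  # unconstrained rho -> positive std deviation
    total = 0.0
    for _ in range(n_samples):
        w_tilde = mu + sigma * rng.standard_normal(mu.shape)
        total += linear_gaussian_loglik(X, project(w_tilde))
    return total / n_samples - gaussian_kl(mu, sigma)

# Usage with placeholder inputs (identity projection, random data).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
mu = np.zeros((3, 3))
rho = np.full((3, 3), -5.0)
value = elbo_estimate(X, mu, rho, project=lambda w: w,
                      n_samples=100, rng=np.random.default_rng(1))
```

Parameterizing the standard deviation as `softplus(rho)` lets an optimizer update `rho` freely while the implied variance stays positive, which matches the role the softplus transform plays in the quoted setup.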