Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Coupling Generative Modeling and an Autoencoder with the Causal Bridge

Authors: Ruolin Meng, Ming-Yu Chung, Dhanajit Brahma, Ricardo Henao, Lawrence Carin

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on synthetic and real-world data demonstrate the effectiveness of the proposed approach relative to state-of-the-art methodology for causal inference with proxy measurements. We validate our framework through experiments on synthetic and real-world datasets (Section 7), including comparison with a randomized control trial (RCT).
Researcher Affiliation	Academia	Duke University EMAIL
Pseudocode	No	The paper describes methods textually and mathematically, including equations and figures, but does not contain any structured pseudocode or algorithm blocks. There are no sections explicitly labeled 'Algorithm' or 'Pseudocode'.
Open Source Code	Yes	The source code used here can be found at https://github.com/ruolinmeng/CausalBridgeAutoEncoder.
Open Datasets	Yes	The second experiment considers the dSprite data introduced by [29]. ... URL https://github.com/deepmind/dsprites-dataset, page 27, 2020. ... Framingham is an observational longitudinal study... These data are in the public domain, and therefore these experiments can be replicated.
Dataset Splits	Yes	The data used here are the Offspring cohort, which consists of 3435 subjects split into 2404, 516, and 515 training, validation, and test samples, respectively. ... In both synthetic data experiments, we use k-fold cross-validation (k = 5) to select the learning rate from the range [10^-3, 10^-4, 10^-5]. ... Once the hyperparameters are selected, the model is trained with a train/validation split of 0.8/0.2.
Hardware Specification	Yes	All models were developed using PyTorch, and each experiment can be executed in a few minutes on a Tesla V100 PCIe 16 GB GPU.
Software Dependencies	No	The paper mentions software like PyTorch and PyMC, but no specific version numbers are provided for these or any other libraries, which is required for a reproducible description of software dependencies.
Experiment Setup	Yes	In both synthetic data experiments, we use k-fold cross-validation (k = 5) to select the learning rate from the range [10^-3, 10^-4, 10^-5]. ... For the demand experiment, we use the AdamW optimizer with a weight decay of 10^-5 and a learning rate of 10^-4 for all models. The autoencoder loss LθX is weighted with wx = 1, and LθZ is weighted with wz = 1. ... For the Framingham dataset, ... we use the AdamW optimizer with a weight decay of 10^-3 and a learning rate of 10^-1. The autoencoder loss LθX is weighted with wx = 0.5, and LθZ is weighted with wz = 0.1. The model architectures are detailed in Tables 2, 3, and 4.
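
The selection procedure quoted in the Dataset Splits and Experiment Setup rows (5-fold cross-validation over three candidate learning rates, followed by a 0.8/0.2 train/validation split) can be illustrated with a minimal sketch. This is not the authors' code: the paper reports PyTorch models trained with AdamW, whereas the toy one-parameter regressor, the plain gradient-descent loop, the synthetic data, and every function name below are assumptions introduced only to show the control flow.

```python
import random

# Split sizes quoted above for the Framingham Offspring cohort.
OFFSPRING_SPLIT = (2404, 516, 515)  # train, validation, test
assert sum(OFFSPRING_SPLIT) == 3435

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and cut it into k near-equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_and_score(xs, ys, train, val, lr, epochs=200):
    """Hypothetical stand-in for model training: fit y ~ w * x by
    full-batch gradient descent, then return the validation MSE."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in train) / len(train)
        w -= lr * grad
    return sum((w * xs[i] - ys[i]) ** 2 for i in val) / len(val)

def select_learning_rate(xs, ys, candidates, k=5):
    """Return the candidate with the lowest mean validation error over k folds."""
    folds = k_fold_indices(len(xs), k)
    mean_err = {}
    for lr in candidates:
        errs = []
        for j in range(k):
            val = folds[j]
            train = [i for f in folds[:j] + folds[j + 1:] for i in f]
            errs.append(fit_and_score(xs, ys, train, val, lr))
        mean_err[lr] = sum(errs) / k
    return min(mean_err, key=mean_err.get)

def train_val_split(n, train_frac=0.8, seed=0):
    """Random split of range(n) into train and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]

# Synthetic toy data (an assumption, unrelated to the paper's datasets).
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [3.0 * x for x in xs]

best_lr = select_learning_rate(xs, ys, [1e-3, 1e-4, 1e-5], k=5)
train_idx, val_idx = train_val_split(len(xs), train_frac=0.8)
```

On this toy problem the fixed epoch budget favors the largest candidate, so the result says nothing about which learning rate the paper's models would select. The sketch only shows that selection uses held-out folds and that the final 0.8/0.2 split is drawn after the hyperparameter is fixed.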