Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Adaptive Data-Borrowing for Improving Treatment Effect Estimation using External Controls

Authors: Qinwei Yang, Jingyi Li, Peng Wu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive simulations and real-world applications demonstrate that the proposed approach significantly enhances treatment effect estimation efficiency in RCTs, outperforming existing approaches. ... 8 Experiments To demonstrate the effectiveness of the proposed approaches, we conduct experiments on three datasets, including two synthetic datasets and a real-world dataset.
Researcher Affiliation	Academia	Qinwei Yang^1, Jingyi Li^2, and Peng Wu^1; ^1 Beijing Technology and Business University, ^2 National University of Singapore
Pseudocode	No	The paper describes the proposed adaptive data-borrowing method through two main steps outlined in Section 5, but these steps are presented as descriptive text rather than structured pseudocode or an algorithm block.
Open Source Code	Yes	We provide the supplemental material for datasets and codes in a zip file to ensure easy reproduction of all reported results.
Open Datasets	Yes	In addition to synthetic datasets, we also utilize two real-world datasets from the National Supported Work (NSW) program (RCTs) [61] and the Population Survey of Income Dynamics (PSID) (external controls) [62] (footnote 2). ... Footnote 2: This data is available at https://users.nber.org/~rdehejia/nswdata2.html
Dataset Splits	Yes	The synthetic datasets consist of $N_E = 400$ samples with $N_t = 300$ in the treated group and $N_c = 100$ in the control group, while the external control dataset includes $N_O = 800$ samples. ... The NSW dataset consists of $N_E = 345$ samples with $N_t = 185$ in the treated group and 260 in the control group, where we randomly selected $N_c = 80$ samples as the control group, while the PSID dataset includes $N_O = 123$ samples.
Hardware Specification	No	Justification: All experimental results can be easily reproduced on a personal computer.
Software Dependencies	No	For real-world data, we train the outcome model of the RCT data or external control through Multi-Layer Perceptron (MLP). In the estimation phase, the outcome regression $\hat{\mu}_0$, $\hat{\mu}_1$, and $\hat{m}_0$, $\hat{\mu}_{0,O}$ are modeled using a Multi-Layer Perceptron (MLP), while both the propensity score and selection score are estimated via logistic regression. ... Optimizer Adam.
Experiment Setup	Yes	Table A1: Implementation Details. Hyperparameters for $\hat{\mu}_0$, $\hat{\mu}_1$, $\hat{m}_0$, $\hat{\mu}_{0,O}$: learning rate 0.0005; batch size 32; architecture: 1 hidden layer [16] for $\hat{\mu}_0$ and 2 hidden layers [16, 8] for each of $\hat{\mu}_1$, $\hat{m}_0$, $\hat{\mu}_{0,O}$; optimizer Adam; early stopping patience 20; activation function (all layers) ReLU.
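
The hyperparameters quoted from Table A1 can be collected into a small configuration object, which may help when trying to reproduce the setup. The following is a minimal sketch, not the authors' code. The class name `MLPConfig`, the dictionary keys, the helper `layer_shapes`, and the single-unit output layer are all hypothetical. The pairing of architectures to models assumes the flattened table lists them in column order, and that the learning rate, batch size, optimizer, patience, and activation are shared by all four models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MLPConfig:
    """Settings for one outcome model, as read from Table A1 (assumed shared defaults)."""
    hidden_layers: Tuple[int, ...]
    learning_rate: float = 0.0005
    batch_size: int = 32
    optimizer: str = "Adam"
    early_stopping_patience: int = 20
    activation: str = "ReLU"


# Hypothetical labels for the four outcome models named in Table A1.
# The architecture-to-model pairing is an assumption based on column order.
CONFIGS = {
    "mu0_hat": MLPConfig(hidden_layers=(16,)),
    "mu1_hat": MLPConfig(hidden_layers=(16, 8)),
    "m0_hat": MLPConfig(hidden_layers=(16, 8)),
    "mu0_O_hat": MLPConfig(hidden_layers=(16, 8)),
}


def layer_shapes(config: MLPConfig, input_dim: int, output_dim: int = 1) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer of the MLP.

    A scalar output (output_dim=1) is an assumption for outcome regression.
    """
    sizes = [input_dim, *config.hidden_layers, output_dim]
    return list(zip(sizes[:-1], sizes[1:]))
```

For example, with a hypothetical covariate dimension of 10, `layer_shapes(CONFIGS["mu1_hat"], input_dim=10)` yields the three linear layers of a [16, 8] network. The covariate dimension is not stated in the quoted text and would need to be taken from the datasets.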