Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Data Fusion for Partial Identification of Causal Effects

Authors: Quinn Lanners, Cynthia Rudin, Alexander Volfovsky, Harsh Parikh

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance. Our analysis reveals that the Project STAR results are robust to simultaneous violations of key assumptions, both on average and across various subgroups of interest. Section 6 is titled "Experimental Results" and includes a "Simulation Study" and an application to "Project STAR" data.
Researcher Affiliation	Academia	Quinn Lanners Duke University EMAIL, Cynthia Rudin Duke University EMAIL, Alexander Volfovsky Duke University EMAIL, Harsh Parikh Yale University EMAIL. All listed affiliations are universities with .edu email domains.
Pseudocode	Yes	In this section, we include algorithms for the various components of our paper. We begin with the bias-corrected estimator from Section 5, followed by procedures for constructing breakdown frontier plots, like those in Section 6, and for estimating the (in)compatibility of parameter pairs, described in Appendix D. For each algorithm, we also discuss hyperparameter choice considerations and note relevant computational considerations. Algorithm 1 outlines the cross-fitting procedure used to implement the bias-corrected estimators defined in Section 5. Algorithm 2 outlines the procedure for constructing a breakdown frontier plot like those in Section 6. Finally, we include Algorithm 3 to outline the steps of the (in)compatibility test.
Open Source Code	Yes	Relevant source code to implement our algorithm and replicate these results can be found in the accompanying GitHub repository.1 [Footnote 1: https://github.com/harsh-parikh/Partial-Identification-Data-Fusion]
Open Datasets	Yes	We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance... The Project STAR study and dataset are properly cited (Mosteller, 1995; Achilles et al., 2008), and no license is needed to use them.
Dataset Splits	Yes	Algorithm 1 outlines the cross-fitting procedure used to implement the bias-corrected estimators defined in Section 5. ... Split data into k folds $\{E_j\}_{j=1}^{k}$, where $E_j$ is the set of indices for the samples in the hold-out set for fold j, and $T_j = \{1, \dots, n\} \setminus E_j$ is the corresponding set of training indices. ... For all breakdown frontier plots based on simulated data (Section 6.1), we use k = 2 and R = 100. For plots based on the Project STAR data (Section 6.2), we use k = 5 and R = 1000.
Hardware Specification	Yes	All experiments were run on a Slurm-managed cluster using VMware virtual machines, each equipped with an Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz. No GPU or specialized hardware was used.
Software Dependencies	No	Models used to estimate the nuisance functions: $\hat{g}^{(j)}_s(x)$: sklearn.linear_model.LogisticRegressionCV(n_jobs=1); $\hat{e}^{(j)}_t(x, s)$: sklearn.linear_model.LogisticRegressionCV(n_jobs=1); $\hat{\mu}^{(j)}(x, s, t)$: sklearn.linear_model.RidgeCV(). The paper lists software components but does not provide specific version numbers for them.
Experiment Setup	Yes	Grid of (ρ, γ): We construct a grid over the (ρ, γ) parameter space by taking all pairwise combinations of values sampled uniformly from the interval [0, 0.2]. Specifically, we define: γ ∈ linspace(0, 0.2, 50), ρ ∈ linspace(0, 0.2, 50)... Confidence level c: 0.95... Boltzmann smoothing parameter α: 10... For all breakdown frontier plots based on simulated data (Section 6.1), we use k = 2 and R = 100. For plots based on the Project STAR data (Section 6.2), we use k = 5 and R = 1000.
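
The Dataset Splits and Experiment Setup rows quote two mechanical steps: splitting sample indices into k hold-out folds with complementary training sets, and building a 50 x 50 grid of (ρ, γ) pairs over [0, 0.2]. The following is a minimal Python sketch of those two steps only, not the authors' implementation. The names linspace, make_grid, and k_fold_indices are hypothetical; the helper linspace is a pure-Python stand-in for numpy.linspace; indices are 0-based here although the quoted text uses {1, ..., n}; and the seeded shuffle before splitting is an assumption, since the quoted text does not say how samples are assigned to folds.

```python
import itertools
import random


def linspace(start, stop, num):
    """Pure-Python stand-in for numpy.linspace with the endpoint included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def make_grid(lo=0.0, hi=0.2, num=50):
    """Return all pairwise (rho, gamma) combinations over [lo, hi]."""
    rhos = linspace(lo, hi, num)
    gammas = linspace(lo, hi, num)
    return list(itertools.product(rhos, gammas))


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k folds.

    Returns a list of (train, holdout) pairs, where holdout plays the role
    of E_j and train the role of T_j = all indices minus E_j.
    The seeded shuffle is an assumption made for this sketch.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    holdouts = [sorted(idx[j::k]) for j in range(k)]
    everything = set(range(n))
    return [(sorted(everything - set(e)), e) for e in holdouts]


# Settings quoted for the simulated-data plots: k = 2; 50 values per axis.
grid = make_grid()
splits = k_fold_indices(n=10, k=2)
```

With the quoted settings, grid holds 50 x 50 = 2500 parameter pairs, and each entry of splits pairs a training index list with its disjoint hold-out list. For the Project STAR plots the quoted value would be k = 5.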