Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Data Fusion for Partial Identification of Causal Effects
Authors: Quinn Lanners, Cynthia Rudin, Alexander Volfovsky, Harsh Parikh
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance. Our analysis reveals that the Project STAR results are robust to simultaneous violations of key assumptions, both on average and across various subgroups of interest. Section 6 is titled "Experimental Results" and includes a "Simulation Study" and an application to "Project STAR" data. |
| Researcher Affiliation | Academia | Quinn Lanners Duke University EMAIL, Cynthia Rudin Duke University EMAIL, Alexander Volfovsky Duke University EMAIL, Harsh Parikh Yale University EMAIL. All listed affiliations are universities with .edu email domains. |
| Pseudocode | Yes | In this section, we include algorithms for the various components of our paper. We begin with the bias-corrected estimator from Section 5, followed by procedures for constructing breakdown frontier plots, like those in Section 6, and for estimating the (in)compatibility of parameter pairs, described in Appendix D. For each algorithm, we also discuss hyperparameter choice considerations and note relevant computational considerations. Algorithm 1 outlines the cross-fitting procedure used to implement the bias-corrected estimators defined in Section 5. Algorithm 2 outlines the procedure for constructing a breakdown frontier plot like those in Section 6. Finally, we include Algorithm 3 to outline the steps of the (in)compatibility test. |
| Open Source Code | Yes | Relevant source code to implement our algorithm and replicate these results can be found in the accompanying GitHub repository. [Footnote 1: https://github.com/harsh-parikh/Partial-Identification-Data-Fusion] |
| Open Datasets | Yes | We apply our framework to the Project STAR study, which investigates the effect of classroom size on students' third-grade standardized test performance... The Project STAR study and dataset are properly cited (Mosteller, 1995; Achilles et al., 2008) and no license is needed to use them. |
| Dataset Splits | Yes | Algorithm 1 outlines the cross-fitting procedure used to implement the bias-corrected estimators defined in Section 5. ... Split data into $k$ folds $\{E_j\}_{j=1}^{k}$, where $E_j$ is the set of indices for the samples in the hold-out set for fold $j$, and $T_j = \{1, \dots, n\} \setminus E_j$ is the corresponding set of training indices. ... For all breakdown frontier plots based on simulated data (Section 6.1), we use $k = 2$ and $R = 100$. For plots based on the Project STAR data (Section 6.2), we use $k = 5$ and $R = 1000$. |
| Hardware Specification | Yes | All experiments were run on a Slurm-managed cluster using VMware virtual machines, each equipped with an Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz. No GPU or specialized hardware was used. |
| Software Dependencies | No | Models used to estimate the nuisance functions: $\hat{g}_s^{(j)}(x)$: `sklearn.linear_model.LogisticRegressionCV(n_jobs=1)`; $\hat{e}_t^{(j)}(x, s)$: `sklearn.linear_model.LogisticRegressionCV(n_jobs=1)`; $\hat{\mu}^{(j)}(x, s, t)$: `sklearn.linear_model.RidgeCV()`. The paper lists software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Grid of (ρ, γ): We construct a grid over the (ρ, γ) parameter space by taking all pairwise combinations of values sampled uniformly from the intervals [0, 0.2]. Specifically, we define: γ ∈ linspace(0, 0.2, 50), ρ ∈ linspace(0, 0.2, 50)... Confidence level c: 0.95... Boltzmann smoothing parameter α: 10... For all breakdown frontier plots based on simulated data (Section 6.1), we use k = 2 and R = 100. For plots based on the Project STAR data (Section 6.2), we use k = 5 and R = 1000. |
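
The Dataset Splits and Experiment Setup rows describe two mechanical steps: building the (ρ, γ) grid from two 50-point `linspace` ranges over [0, 0.2], and splitting sample indices into k hold-out folds with complementary training sets. The following is a minimal sketch of those two steps only, not the authors' implementation. The function names `make_grid` and `make_folds`, the shuffling, the fixed seed, the pair ordering, and the 0-based indices are all assumptions made for illustration.

```python
import numpy as np


def make_grid(low=0.0, high=0.2, num=50):
    """Return all (rho, gamma) pairs from two evenly spaced 1-D grids."""
    rhos = np.linspace(low, high, num)
    gammas = np.linspace(low, high, num)
    # indexing="ij" makes rho the slower-varying coordinate (an arbitrary choice).
    rho_mesh, gamma_mesh = np.meshgrid(rhos, gammas, indexing="ij")
    return np.column_stack([rho_mesh.ravel(), gamma_mesh.ravel()])


def make_folds(n, k, seed=0):
    """Return k (train_idx, holdout_idx) pairs; each T_j is the complement of E_j."""
    # Assumption: indices are shuffled with a fixed seed before splitting.
    rng = np.random.default_rng(seed)
    holdouts = np.array_split(rng.permutation(n), k)
    all_idx = np.arange(n)  # 0-based, whereas the quoted text indexes from 1
    return [(np.setdiff1d(all_idx, e_j), np.sort(e_j)) for e_j in holdouts]


grid = make_grid()              # 50 x 50 = 2500 parameter pairs
folds = make_folds(n=10, k=5)   # toy sample size, k as used for Project STAR
```

With the quoted settings, the simulated-data plots would correspond to `k=2` and the Project STAR plots to `k=5`; the number of repetitions R is not modeled here.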