Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Efficient Adaptive Experimentation with Noncompliance

Authors: Miruna Oprescu, Brian Cho, Nathan Kallus

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the practical effectiveness of our approach in both synthetic and semi-synthetic studies.
Researcher Affiliation	Academia	Miruna Oprescu, Cornell University, EMAIL; Brian M. Cho, Cornell University, EMAIL; Nathan Kallus, Cornell University & Netflix, EMAIL
Pseudocode	Yes	Algorithm 1 AMRIV: Adaptive Multiply Robust IV Estimation
Open Source Code	Yes	The replication code is available at https://github.com/CausalML/Adaptive-IV
Open Datasets	Yes	We also evaluate AMRIV on a semi-synthetic dataset based on the TripAdvisor customer simulator from Syrgkanis et al. [52]
Dataset Splits	Yes	We set T = 2000, T_0 = 200, and run 1000 trajectories.
Hardware Specification	Yes	All experiments were run on a Perlmutter compute node with 256 CPU cores at the National Energy Research Scientific Computing Center (NERSC) [37]
Software Dependencies	No	Random Forests were implemented using scikit-learn [44]
Experiment Setup	Yes	Each estimator was evaluated on 1000 independent synthetic trials. Simulations were run over T = 2000 rounds with a T_0 = 200 burn-in period, and nuisance estimators were updated in mini-batches of 200. For all adaptive methods, we applied the truncated optimal allocation policy from Eq. (7), with a truncation schedule k_t = 2/0.999^t. Oracle methods used ground-truth nuisance functions, while misspecified estimators were constructed by replacing μ_Y(1, X) with a constant regressor fit to the average oracle value. Unless otherwise stated, outcome and residual variance functions were modeled via Random Forests with 100 trees, maximum depth 5, and minimum leaf size 5. The compliance model μ_A(1, X) was learned with a shallower forest (depth 3, minimum leaf size 30), and μ_A(0, X) was zero by construction due to one-sided noncompliance. For the A2IPW estimator, we followed Kato et al. [27] and estimated outcome means and second moments using random forests (depth 5, leaf size 100) and used a Neyman-style allocation based on observed outcomes.
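The setup quotes a truncation schedule k_t = 2/0.999^t and a run of T = 2000 rounds with a T_0 = 200 burn-in and refits in mini-batches of 200. The sketch below shows one plausible reading of those numbers. It assumes (this is not confirmed by the entry above) that truncation clips the allocation probability from Eq. (7) to [1/k_t, 1 - 1/k_t] and that nuisances are refit at the end of burn-in and every 200 rounds thereafter. The helper names truncation_level, truncate_allocation, and refit_rounds are hypothetical and are not taken from the replication code.

```python
import numpy as np

# Hypothetical helpers: the names, the clipping rule, and the refit schedule are assumptions.
T, T0, BATCH = 2000, 200, 200  # values quoted in the experiment setup


def truncation_level(t: int, base: float = 2.0, rate: float = 0.999) -> float:
    """k_t = base / rate**t; with the defaults this is k_t = 2/0.999^t."""
    return base / rate**t


def truncate_allocation(pi: np.ndarray, t: int) -> np.ndarray:
    """Clip treatment-assignment probabilities to [1/k_t, 1 - 1/k_t] (assumed rule)."""
    lo = 1.0 / truncation_level(t)
    return np.clip(pi, lo, 1.0 - lo)


def refit_rounds(T: int = T, T0: int = T0, batch: int = BATCH) -> list[int]:
    """Assumed refit rounds: end of burn-in, then every `batch` rounds before T."""
    return list(range(T0, T, batch))


if __name__ == "__main__":
    # Show how the assumed clipping bounds widen as t grows.
    for t in (0, 200, 1000, 2000):
        k = truncation_level(t)
        print(f"t={t:4d}  k_t={k:6.2f}  bounds=[{1 / k:.3f}, {1 - 1 / k:.3f}]")
    print(refit_rounds())
```

Under this reading, k_0 = 2 forces a 50/50 assignment at the start, and the bounds loosen to roughly [0.07, 0.93] by t = 2000, so the schedule itself keeps early allocations close to uniform while nuisance estimates are still noisy.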
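The forest hyperparameters in the setup map directly onto scikit-learn estimators. The following is a minimal sketch, not the authors' code. The depths, leaf sizes, and the 100-tree count for the outcome forest come from the entry above. Three points are assumptions: the tree count for the compliance forest (the entry gives none, so the scikit-learn default of 100 is used), modeling μ_A(1, X) as a classifier's class-1 probability, and fitting it only on units with instrument Z = 1. All function names are hypothetical. No library versions are pinned, consistent with the "No" under Software Dependencies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def make_outcome_forest(seed: int = 0) -> RandomForestRegressor:
    """Outcome / residual-variance forest: 100 trees, max depth 5, min leaf size 5."""
    return RandomForestRegressor(
        n_estimators=100, max_depth=5, min_samples_leaf=5, random_state=seed
    )


def make_compliance_forest(seed: int = 0) -> RandomForestClassifier:
    """Shallower forest for mu_A(1, X): depth 3, min leaf size 30 (tree count assumed)."""
    return RandomForestClassifier(
        n_estimators=100, max_depth=3, min_samples_leaf=30, random_state=seed
    )


def fit_mu_A1(X: np.ndarray, A: np.ndarray, Z: np.ndarray, seed: int = 0):
    """Estimate P(A = 1 | Z = 1, X) from the units that received the instrument."""
    model = make_compliance_forest(seed)
    mask = Z == 1
    model.fit(X[mask], A[mask])
    col = list(model.classes_).index(1)  # column of predict_proba for class 1
    return lambda x: model.predict_proba(x)[:, col]


def mu_A0(X: np.ndarray) -> np.ndarray:
    """One-sided noncompliance: no treatment uptake without the instrument, so mu_A(0, X) = 0."""
    return np.zeros(len(X))
```

Using a classifier keeps the compliance estimate inside [0, 1] without extra clipping; a regressor on the binary A would also work but can only be trusted to stay in range because forest leaf means of 0/1 labels are averages.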