Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Efficient Adaptive Experimentation with Noncompliance

Authors: Miruna Oprescu, Brian Cho, Nathan Kallus

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the practical effectiveness of our approach in both synthetic and semi-synthetic studies.
Researcher Affiliation | Academia | Miruna Oprescu, Cornell University; Brian M. Cho, Cornell University; Nathan Kallus, Cornell University & Netflix
Pseudocode | Yes | Algorithm 1 AMRIV: Adaptive Multiply Robust IV Estimation
Open Source Code | Yes | The replication code is available at https://github.com/CausalML/Adaptive-IV
Open Datasets | Yes | We also evaluate AMRIV on a semi-synthetic dataset based on the TripAdvisor customer simulator from Syrgkanis et al. [52]
Dataset Splits | Yes | We set T = 2000, T0 = 200, and run 1000 trajectories.
Hardware Specification | Yes | All experiments were run on a Perlmutter compute node with 256 CPU cores at the National Energy Research Scientific Computing Center (NERSC) [37]
Software Dependencies | No | Random Forests were implemented using scikit-learn [44]
Experiment Setup | Yes | Each estimator was evaluated on 1000 independent synthetic trials. Simulations were run over T = 2000 rounds with a T0 = 200 burn-in period, and nuisance estimators were updated in mini-batches of 200. For all adaptive methods, we applied the truncated optimal allocation policy from Eq. (7), with a truncation schedule k_t = 2/0.999^t. Oracle methods used ground-truth nuisance functions, while misspecified estimators were constructed by replacing µ_Y(1, X) with a constant regressor fit to the average oracle value. Unless otherwise stated, outcome and residual variance functions were modeled via Random Forests with 100 trees, maximum depth 5, and minimum leaf size 5. The compliance model µ_A(1, X) was learned with a shallower forest (depth 3, minimum leaf size 30), and µ_A(0, X) was zero by construction due to one-sided noncompliance. For the A2IPW estimator, we followed Kato et al. [27] and estimated outcome means and second moments using random forests (depth 5, leaf size 100) and used a Neyman-style allocation based on observed outcomes.
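The round structure reported in the experiment setup (T = 2000 rounds, a T0 = 200 burn-in, nuisance updates in mini-batches of 200, 1000 trajectories) can be summarized as a small schedule object. This is a minimal sketch, not the authors' replication code: the class name `Schedule` is hypothetical, and it assumes that the first nuisance fit happens at t = T0 on the burn-in data, with refits every 200 rounds thereafter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """Hypothetical container for the reported simulation schedule."""
    T: int = 2000                # total rounds per trajectory
    T0: int = 200                # burn-in rounds before adaptive allocation
    batch: int = 200             # nuisance models refit every `batch` rounds
    n_trajectories: int = 1000   # independent simulated trials

    def in_burn_in(self, t: int) -> bool:
        # Rounds 0 .. T0-1 form the burn-in period.
        return t < self.T0

    def refit_rounds(self) -> list[int]:
        # ASSUMPTION: first fit at t = T0 on burn-in data, then refit every
        # `batch` rounds on all data collected so far (rounds are 0-indexed).
        return list(range(self.T0, self.T, self.batch))


s = Schedule()
print(s.refit_rounds())
```

Under these assumptions the nuisance models are refit 9 times per trajectory, at rounds 200, 400, ..., 1800.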
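The truncation schedule k_t = 2/0.999^t starts at k_0 = 2 and grows with t, so 1/k_t = 0.5 · 0.999^t shrinks toward zero. The exact truncated policy is given by Eq. (7) of the paper and is not reproduced here; the sketch below assumes the common form in which the allocation probability is clipped to [1/k_t, 1 - 1/k_t]. Both function names are hypothetical.

```python
def truncation_level(t: int) -> float:
    """k_t = 2 / 0.999**t, the truncation schedule reported for adaptive methods."""
    return 2.0 / (0.999 ** t)


def truncate_allocation(p: float, t: int) -> float:
    """Clip an allocation probability p into [1/k_t, 1 - 1/k_t].

    ASSUMPTION: symmetric clipping stands in for the paper's Eq. (7),
    which is not reproduced here.
    """
    eps = 1.0 / truncation_level(t)  # equals 0.5 * 0.999**t
    return min(max(p, eps), 1.0 - eps)


for t in (0, 200, 2000):
    print(t, truncate_allocation(0.01, t))
```

At t = 0 the bounds coincide at 0.5, forcing a balanced allocation; by the end of burn-in (t = 200) the lower bound is roughly 0.41, and by t = 2000 it is roughly 0.068, so the policy is increasingly free to follow the estimated optimal allocation.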
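The nuisance-model settings in the experiment setup map onto scikit-learn's `RandomForestRegressor` roughly as follows. This is a sketch under stated assumptions, not the released code: "minimum leaf size" is read as `min_samples_leaf`, the tree counts for the compliance and A2IPW forests are not reported (scikit-learn's default is kept), the compliance model is assumed to be a regression forest on the binary treatment indicator, and the toy data-generating process is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def outcome_forest(seed=0):
    # Outcome and residual-variance models: 100 trees, max depth 5, min leaf size 5.
    return RandomForestRegressor(n_estimators=100, max_depth=5,
                                 min_samples_leaf=5, random_state=seed)


def compliance_forest(seed=0):
    # mu_A(1, X): shallower forest (depth 3, min leaf size 30).
    # Tree count not reported; scikit-learn's default is kept (assumption).
    return RandomForestRegressor(max_depth=3, min_samples_leaf=30,
                                 random_state=seed)


def a2ipw_forest(seed=0):
    # A2IPW outcome mean / second-moment models: depth 5, "leaf size" 100,
    # read here as min_samples_leaf (assumption).
    return RandomForestRegressor(max_depth=5, min_samples_leaf=100,
                                 random_state=seed)


def mu_A0(X):
    # One-sided noncompliance: units with Z = 0 never take treatment,
    # so mu_A(0, X) is identically zero.
    return np.zeros(len(X))


# Hypothetical toy data: all rows are encouraged units (Z = 1), and A records
# whether each unit actually took the treatment.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
A = (rng.random(400) < 1 / (1 + np.exp(-X[:, 0]))).astype(float)
p_hat = compliance_forest().fit(X, A).predict(X)  # estimates P(A = 1 | Z = 1, X)
```

Regressing a 0/1 indicator with a random forest averages leaf means of 0s and 1s, so the fitted compliance probabilities stay within [0, 1] without a separate classifier.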