Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Post-Contextual-Bandit Inference
Authors: Aurelien Bibaut, Maria Dimakopoulou, Nathan Kallus, Antoine Chambaz, Mark van der Laan
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage. |
| Researcher Affiliation | Collaboration | Aurélien Bibaut Netflix EMAIL Maria Dimakopoulou Netflix EMAIL Nathan Kallus Cornell University and Netflix EMAIL Antoine Chambaz Université de Paris EMAIL Mark van der Laan University of California, Berkeley EMAIL |
| Pseudocode | Yes | Algorithm 1 The CADR Estimator and Confidence Interval |
| Open Source Code | Yes | The code can be found at https://github.com/mdimakopoulou/post-contextual-bandit-inference. |
| Open Datasets | Yes | We use the public OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18; BSD 3-Clause license) [Bischl et al., 2017] |
| Dataset Splits | Yes | 8 different training procedures (sequential cross-fitting vs. cross-time cross-fitting in Figures 3 and 3; misspecified vs. well-specified outcome model family in Figures 4 and 5; weighted vs. unweighted outcome model fitting in Figures 6 and 7; large data vs. small data in Figures 3 and 8). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'linear regression or decision-tree regression, both using default sklearn parameters' but does not specify version numbers for `sklearn` or any other software dependencies. |
| Experiment Setup | Yes | To generate our data, we set $T = 10000$ and use the following $\epsilon$-greedy procedure. We pull arms uniformly at random until each arm has been pulled at least once. Then at each subsequent round $t$, we fit $\hat{Q}_{t-1}$ using the data up to that time in the same fashion as used for the DM estimator above using decision-tree regressions. We set $A_x(t) = \arg\max_{a=1,\dots,K} \hat{Q}_{t-1}(a, X(t))$ and $\epsilon_t = 0.01 \cdot t^{-1/3}$. We then let $g_t(a \mid x) = \epsilon_t/K$ for $a \neq A_x(t)$ and $g_t(A_x(t) \mid x) = 1 - \epsilon_t + \epsilon_t/K$. |
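
The ε-greedy logging policy quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative reading of the quoted text, not the authors' code (which lives in the repository linked above). It assumes the exploration rate decays as 0.01 · t^(-1/3) and that the ε_t/K mass goes to every non-greedy arm, which is the reading under which the propensities sum to one. The function names `epsilon_t`, `propensities`, and `draw_arm` are hypothetical, and the estimated outcomes `q_hat` are passed in directly rather than fitted with a decision-tree regression.

```python
import random
from typing import List, Sequence, Tuple


def epsilon_t(t: int, scale: float = 0.01) -> float:
    """Exploration rate for round t >= 1 (assumed form: scale * t^(-1/3))."""
    return scale * t ** (-1.0 / 3.0)


def propensities(q_hat: Sequence[float], t: int) -> List[float]:
    """Return g_t(a | x) for every arm a, given estimated outcomes q_hat[a].

    q_hat stands in for the fitted outcome model evaluated at the current
    context; how it is fitted is outside the scope of this sketch.
    """
    k = len(q_hat)
    eps = epsilon_t(t)
    # Greedy arm: highest estimated outcome (the first such arm wins ties).
    greedy = max(range(k), key=lambda a: q_hat[a])
    # Every arm gets eps/K; the greedy arm additionally gets 1 - eps.
    probs = [eps / k] * k
    probs[greedy] = 1.0 - eps + eps / k
    return probs


def draw_arm(
    q_hat: Sequence[float], t: int, rng: random.Random
) -> Tuple[int, float]:
    """Sample an arm from g_t and return it with its logged propensity."""
    probs = propensities(q_hat, t)
    arm = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return arm, probs[arm]


if __name__ == "__main__":
    rng = random.Random(0)
    arm, prob = draw_arm([0.2, 0.9, 0.5], t=8, rng=rng)
    print(arm, prob)
```

The sketch omits the initial phase in which arms are pulled uniformly at random until each has been pulled at least once. Returning the propensity alongside the sampled arm mirrors what post-hoc inference needs: the logged probability of the action actually taken.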