Post-Contextual-Bandit Inference

Authors: Aurélien Bibaut, Maria Dimakopoulou, Nathan Kallus, Antoine Chambaz, Mark van der Laan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.
Researcher Affiliation | Collaboration | Aurélien Bibaut (Netflix, abibaut@netflix.com); Maria Dimakopoulou (Netflix, mdimakopoulou@netflix.com); Nathan Kallus (Cornell University and Netflix, kallus@cornell.edu); Antoine Chambaz (Université de Paris, antoine.chambaz@u-paris.fr); Mark van der Laan (University of California, Berkeley, laan@stat.berkeley.edu)
Pseudocode | Yes | Algorithm 1: The CADR Estimator and Confidence Interval (an illustrative sketch of such an estimate and interval appears after this table)
Open Source Code | Yes | The code can be found at https://github.com/mdimakopoulou/post-contextual-bandit-inference.
Open Datasets | Yes | We use the public OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18; BSD 3-Clause license) [Bischl et al., 2017]. (A loading sketch appears after this table.)
Dataset Splits | Yes | 8 different training procedures (sequential cross-fitting vs. cross-time cross-fitting in Figures 3 and 3; misspecified vs. well-specified outcome model family in Figures 4 and 5; weighted vs. unweighted outcome model fitting in Figures 6 and 7; large data vs. small data in Figures 3 and 8).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'linear regression or decision-tree regression, both using default sklearn parameters' but does not specify version numbers for `sklearn` or any other software dependencies.
Experiment Setup | Yes | To generate our data, we set $T = 10000$ and use the following $\epsilon$-greedy procedure. We pull arms uniformly at random until each arm has been pulled at least once. Then at each subsequent round $t$, we fit $\hat{Q}_{t-1}$ using the data up to that time in the same fashion as used for the DM estimator above, using decision-tree regressions. We set $A_x(t) = \arg\max_{a=1,\dots,K} \hat{Q}_{t-1}(a, X(t))$ and $\epsilon_t = 0.01\, t^{-1/3}$. We then let $g_t(a \mid x) = \epsilon_t / K$ for $a \neq A_x(t)$ and $g_t(A_x(t) \mid x) = 1 - \epsilon_t + \epsilon_t / K$.
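
Since the experiments draw on OpenML-CC18 classification datasets, the following is a minimal sketch of loading one such dataset with scikit-learn's `fetch_openml` and converting it into a K-armed contextual bandit via the standard reward-equals-correct-label construction. The particular dataset ID and the conversion details are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import LabelEncoder

# Illustrative choice: "phoneme" (OpenML data_id 1489). The paper uses 57
# OpenML-CC18 datasets; this need not be one of them.
X, y = fetch_openml(data_id=1489, as_frame=False, return_X_y=True)
labels = LabelEncoder().fit_transform(y)   # integer class index per example
K = int(labels.max()) + 1                  # number of arms = number of classes

def reward(arm, i):
    """Classification-to-bandit conversion (assumed): reward 1 iff the pulled
    arm equals example i's true class label."""
    return float(arm == labels[i])
```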
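
A rough sketch, under stated assumptions, of the $\epsilon$-greedy logging procedure quoted in the Experiment Setup row: contexts are rows of a feature matrix, and $\hat{Q}_{t-1}$ is a decision tree fit on one-hot-encoded arm plus context features. The feature encoding, the refit-every-round schedule, and the function names are assumptions; the released code is authoritative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def run_epsilon_greedy(X_contexts, reward_fn, K, T=10000, seed=0):
    """Sketch of the described logging policy: uniform pulls until every arm has
    been tried at least once, then epsilon-greedy with epsilon_t = 0.01 * t**(-1/3),
    refitting a decision-tree outcome model on all past rounds each time."""
    rng = np.random.default_rng(seed)
    logs = []  # tuples (context, action, reward, logging probability g_t(a | x))
    for t in range(T):
        x = X_contexts[t]
        if len({a for _, a, _, _ in logs}) < K:
            a, g = int(rng.integers(K)), 1.0 / K              # uniform exploration phase
        else:
            # Fit Q_{t-1} on (one-hot arm, context) -> reward using past rounds only.
            feats = np.array([np.concatenate((np.eye(K)[ai], xi)) for xi, ai, _, _ in logs])
            q = DecisionTreeRegressor().fit(feats, np.array([ri for _, _, ri, _ in logs]))
            preds = q.predict(np.array([np.concatenate((np.eye(K)[ai], x)) for ai in range(K)]))
            a_greedy = int(np.argmax(preds))
            eps = 0.01 * t ** (-1 / 3)                        # decaying exploration rate
            probs = np.full(K, eps / K)
            probs[a_greedy] += 1.0 - eps
            a = int(rng.choice(K, p=probs))
            g = probs[a]
        logs.append((x, a, reward_fn(a, t), g))
    return logs
```

Combined with the loading sketch above, `run_epsilon_greedy(X, reward, K)` would yield the logged (context, action, reward, propensity) tuples that a post-bandit estimator consumes; note that refitting the tree at every round, as written here, is simple but slow for T = 10000.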
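
Algorithm 1 itself is not reproduced in this report. As a rough illustration of the kind of object it names, here is a sketch of an adaptively standardized doubly robust (AIPW-style) policy-value estimate with a normal-approximation confidence interval. The helper callables (`q_hat`, `sigma_hat`, `propensity`, `target_action`) and the exact weighting are assumptions; the paper's Algorithm 1 and released code define the actual CADR construction.

```python
import numpy as np
from scipy.stats import norm

def adaptive_dr_ci(rewards, actions, contexts, target_action,
                   q_hat, sigma_hat, propensity, alpha=0.05):
    """Adaptively standardized doubly robust estimate of a policy value with a
    normal-approximation CI. Assumed helper signatures:
      q_hat(t, a, x)      outcome model fit on rounds before t
      sigma_hat(t, x)     scale estimate of the DR score, also fit on rounds before t
      propensity(t, a, x) logging probability g_t(a | x)
      target_action(x)    action chosen by the policy being evaluated
    """
    T = len(rewards)
    dr_scores = np.empty(T)
    weights = np.empty(T)
    for t in range(T):
        x, a, y = contexts[t], actions[t], rewards[t]
        a_star = target_action(x)
        # Doubly robust (AIPW) score for round t
        dr_scores[t] = (q_hat(t, a_star, x)
                        + (a == a_star) / propensity(t, a, x) * (y - q_hat(t, a, x)))
        weights[t] = 1.0 / sigma_hat(t, x)                    # variance-stabilizing weight
    est = float(np.sum(weights * dr_scores) / np.sum(weights))
    # Under the stabilization heuristic, the weighted centered sum is approx N(0, T),
    # giving a half-width of z * sqrt(T) / sum(weights).
    half_width = norm.ppf(1 - alpha / 2) * np.sqrt(T) / np.sum(weights)
    return est, (est - half_width, est + half_width)
```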