Post-Contextual-Bandit Inference
Authors: Aurélien Bibaut, Maria Dimakopoulou, Nathan Kallus, Antoine Chambaz, Mark van der Laan
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage. |
| Researcher Affiliation | Collaboration | Aurélien Bibaut (Netflix, abibaut@netflix.com); Maria Dimakopoulou (Netflix, mdimakopoulou@netflix.com); Nathan Kallus (Cornell University and Netflix, kallus@cornell.edu); Antoine Chambaz (Université de Paris, antoine.chambaz@u-paris.fr); Mark van der Laan (University of California, Berkeley, laan@stat.berkeley.edu) |
| Pseudocode | Yes | Algorithm 1: The CADR Estimator and Confidence Interval (a hedged illustrative sketch of such an estimator appears after this table). |
| Open Source Code | Yes | The code can be found at https://github.com/mdimakopoulou/post-contextual-bandit-inference. |
| Open Datasets | Yes | We use the public OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18; BSD 3-Clause license) [Bischl et al., 2017] |
| Dataset Splits | Yes | 8 different training procedures (sequential cross-fitting vs. cross-time cross-fitting in Figures 3 and 3; misspecified vs. well-specified outcome model family in Figures 4 and 5; weighted vs. unweighted outcome model fitting in Figures 6 and 7; large data vs. small data in Figures 3 and 8). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'linear regression or decision-tree regression, both using default sklearn parameters' but does not specify version numbers for `sklearn` or any other software dependencies. |
| Experiment Setup | Yes | To generate our data, we set T = 10000 and use the following ϵ-greedy procedure. We pull arms uniformly at random until each arm has been pulled at least once. Then at each subsequent round t, we fit Q̂_{t−1} using the data up to that time in the same fashion as used for the DM estimator above using decision-tree regressions. We set A_x(t) = argmax_{a=1,…,K} Q̂_{t−1}(a, X(t)) and ϵ_t = 0.01 ∨ t^(−1/3). We then let g_t(a \| x) = ϵ_t/K for a ≠ A_x(t) and g_t(A_x(t) \| x) = 1 − ϵ_t + ϵ_t/K. (An illustrative sketch of this logging procedure appears after this table.) |
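
To make the quoted setup concrete, the sketch below replays an ϵ-greedy logging policy of the kind described: uniform arm pulls until every arm has been tried once, a per-arm decision-tree outcome model refit on the data collected so far, a greedy arm with exploration rate ϵ_t = 0.01 ∨ t^(−1/3), and the stated propensities. The synthetic reward function, problem sizes, and helper names are illustrative assumptions of this sketch, not the authors' released code (linked in the "Open Source Code" row).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
T, K, d = 2_000, 3, 5            # smaller than the paper's T = 10000, for a quick run

# Illustrative environment (not from the paper): linear rewards with Gaussian noise.
theta = rng.normal(size=(K, d))
def reward(a, x):
    return float(x @ theta[a] + 0.1 * rng.normal())

contexts, arms, rewards, props = [], [], [], []
for t in range(1, T + 1):
    x = rng.normal(size=d)
    pulled = np.bincount(np.asarray(arms, dtype=int), minlength=K)
    if (pulled == 0).any():
        # Pull arms uniformly at random until each arm has been pulled at least once.
        g = np.full(K, 1.0 / K)
    else:
        # Fit Q-hat_{t-1} on the data gathered so far: one decision-tree regression per arm.
        Xs, As, Ys = np.array(contexts), np.array(arms), np.array(rewards)
        q = np.array([
            DecisionTreeRegressor().fit(Xs[As == a], Ys[As == a]).predict(x[None, :])[0]
            for a in range(K)
        ])
        eps = max(0.01, t ** (-1 / 3))              # assumed schedule: eps_t = 0.01 v t^(-1/3)
        g = np.full(K, eps / K)                     # g_t(a | x) = eps_t / K for non-greedy arms
        g[int(np.argmax(q))] = 1 - eps + eps / K    # g_t(A_x(t) | x) = 1 - eps_t + eps_t / K
    a = int(rng.choice(K, p=g))
    contexts.append(x); arms.append(a); rewards.append(reward(a, x)); props.append(g[a])
```

The lists `contexts`, `arms`, `rewards`, and `props` are the ingredients consumed by the estimator sketch below.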
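
For the "Pseudocode" row above: the paper's Algorithm 1 specifies the CADR estimator and its confidence interval. The function below is only a hedged sketch in the spirit of that algorithm, assuming a doubly robust score built from a cross-time-fitted outcome model Q̂_{t−1} and the known logging propensities g_t, inverse-conditional-standard-deviation weights, and a martingale-CLT Wald interval; the name `cadr_style_ci`, the callable `q_hat`, and the crude running standard-deviation estimate are inventions of this sketch, so consult Algorithm 1 and the linked repository for the authors' exact construction.

```python
import numpy as np
from scipy.stats import norm

def cadr_style_ci(contexts, arms, rewards, props, q_hat, target_policy, alpha=0.05):
    """Hedged sketch: estimate a deterministic target policy's value with a
    variance-stabilized doubly robust score and report a Wald-style interval.

    props[t] : logging probability g_t(arms[t] | contexts[t]), known by design.
    q_hat    : q_hat(t, a, x) -> prediction of Q-hat_{t-1}(a, x), fit on data before round t.
    """
    T = len(rewards)
    scores = np.empty(T)
    for t in range(T):
        x, a, y, g = contexts[t], arms[t], rewards[t], props[t]
        pi_a = target_policy(x)
        # Doubly robust score: direct-model term plus importance-weighted residual.
        scores[t] = q_hat(t, pi_a, x) + (a == pi_a) / g * (y - q_hat(t, a, x))

    # Crude running estimate of the score's conditional std. dev. using past rounds only
    # (an assumption of this sketch; Algorithm 1 specifies its own estimate).
    sd_prev = np.ones(T)
    for t in range(10, T):                      # short burn-in with unit weights
        sd_prev[t] = max(np.std(scores[:t]), 1e-3)
    w = 1.0 / sd_prev

    psi_hat = float(np.sum(w * scores) / np.sum(w))
    # Martingale-CLT heuristic: sum_t w_t * (D_t - psi) is approximately N(0, T),
    # so psi_hat +/- z * sqrt(T) / sum(w) gives a (1 - alpha) interval.
    half = norm.ppf(1 - alpha / 2) * np.sqrt(T) / float(np.sum(w))
    return psi_hat, (psi_hat - half, psi_hat + half)
```

With the logged data from the previous sketch, `q_hat` could be implemented by replaying the per-round tree fits (or by caching each round's predictions while logging); coverage guarantees for data collected by an adaptive, possibly vanishing-exploration policy are what the paper's theory establishes for the actual CADR construction.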