Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Authors: Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand Negahban
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approach is both feasible, and helpful in practice. |
| Researcher Affiliation | Collaboration | Microsoft Research; University of Maryland; Yale University. |
| Pseudocode | Yes | Algorithm 1: Adaptive Reweighting for Robustly Warmstarting Contextual Bandits (ARROW-CB) |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | We compare these approaches on 524 binary and multiclass classification datasets from Bietti et al. (2018), which in turn are from openml.org. |
| Dataset Splits | Yes | Partition S to E+1 equally sized sets S^tr, S^val_1, …, S^val_E. ... where a separate validation set is used to pick λ. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments were provided. |
| Software Dependencies | No | No specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment were provided. |
| Experiment Setup | Yes | All the algorithms (other than SUP-ONLY and MAJORITY, which do not explore) use ϵ-greedy exploration, with most of the results presented using ϵ = 0.0125. We additionally present the results for ϵ = 0.1 and ϵ = 0.0625 in Appendix J. ... We vary the number of warm-start examples n_s in {0.005n, 0.01n, 0.02n, 0.04n}, and the number of CB examples n_b in {0.92n, 0.46n, 0.23n, 0.115n}. |
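The Pseudocode row points to Algorithm 1 (ARROW-CB), which combines supervised warm-start examples with bandit feedback. The sketch below only illustrates the general idea of mixing a fully supervised loss with an inverse-propensity-scored (IPS) bandit loss through a weight λ; the function names, the squared-error cost form, and the data layout are assumptions for illustration, not the authors' implementation of ARROW-CB.

```python
import numpy as np

def mixed_objective(predict_cost, warm_data, bandit_data, lam):
    """Return lam * supervised loss + (1 - lam) * IPS bandit loss.

    `predict_cost(x, a)` is any cost regressor; the squared-error form and
    the data tuples below are illustrative assumptions, not the paper's
    exact objective.
    """
    # Warm-start (fully supervised) part: the cost of every action is known.
    sup = np.mean([
        (predict_cost(x, a) - c) ** 2
        for x, costs in warm_data
        for a, c in enumerate(costs)
    ])
    # Bandit part: only the chosen action's cost is observed; reweight by the
    # inverse of the probability with which that action was chosen (IPS).
    ban = np.mean([
        (predict_cost(x, a) - c) ** 2 / p
        for x, a, c, p in bandit_data
    ])
    return lam * sup + (1.0 - lam) * ban
```

In this framing, λ = 1 recovers learning from the warm-start data only and λ = 0 recovers learning from bandit feedback only, with intermediate values trading off the two sources.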
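The Dataset Splits row quotes the step that partitions the sample S into E+1 equally sized sets, with separate validation sets used to pick the weight λ. A minimal sketch of such a partition is below; the helper name and the use of numpy are assumptions, not the authors' code.

```python
import numpy as np

def partition_equal(num_examples, num_val_sets, rng=None):
    """Randomly partition example indices into 1 training set and
    `num_val_sets` validation sets of (roughly) equal size, following the
    "partition S to E+1 equally sized sets" step quoted above.
    """
    rng = np.random.default_rng() if rng is None else rng
    shuffled = rng.permutation(num_examples)
    parts = np.array_split(shuffled, num_val_sets + 1)
    return parts[0], parts[1:]  # S^tr, [S^val_1, ..., S^val_E]

# Example: 10,000 examples, E = 4 validation sets used to pick the weight λ.
train_idx, val_idx_sets = partition_equal(10_000, num_val_sets=4)
```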
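The Experiment Setup row lists the ϵ-greedy exploration rates and the grid of warm-start (n_s) and bandit (n_b) sample sizes. The sketch below shows what sweeping that grid with ϵ-greedy action selection could look like; the variable names, the dummy scores, and the total sample size n are hypothetical and do not come from the paper's code.

```python
import itertools
import numpy as np

# Grid mirroring the Experiment Setup row: the reported epsilon values and the
# fractions of the n total examples used as warm-start (n_s) and bandit (n_b)
# examples. Names are illustrative assumptions, not the authors' code.
EPSILONS = [0.0125, 0.0625, 0.1]
WARM_START_FRACTIONS = [0.005, 0.01, 0.02, 0.04]
BANDIT_FRACTIONS = [0.92, 0.46, 0.23, 0.115]

def epsilon_greedy_action(scores, epsilon, rng):
    """Choose the highest-scoring action with probability 1 - epsilon,
    otherwise a uniformly random action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100_000  # hypothetical total number of examples in one dataset
    for eps, f_s, f_b in itertools.product(
            EPSILONS, WARM_START_FRACTIONS, BANDIT_FRACTIONS):
        n_s, n_b = int(f_s * n), int(f_b * n)
        # One illustrative exploration step with dummy scores for 4 actions;
        # a real run would train the warm-started learner on n_s + n_b examples.
        action = epsilon_greedy_action(rng.normal(size=4), eps, rng)
```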