Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Authors: Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that it performs well. ... We conduct two experiments. In Section 6.1, we evaluate Giro on multi-armed bandit problems. In Section 6.2, we evaluate Giro in the contextual bandit setting. |
| Researcher Affiliation | Collaboration | Google Research; DeepMind; University of Alberta; Mila, University of Montreal; Adobe Research; Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 General randomized exploration. ... Algorithm 2 Giro with [0, 1] rewards. ... Algorithm 3 Contextual Giro with [0, 1] rewards. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions using 'public optimization libraries' like scikit-learn and Keras, but this refers to third-party tools, not their own implementation of Giro. |
| Open Datasets | Yes | We use three datasets from Riquelme et al. (2018): Adult (d = 94, K = 14), Statlog (d = 9, K = 7), and Cov Type (d = 54, K = 7). |
| Dataset Splits | No | The paper describes online learning settings and horizons (e.g., 'The horizon is n = 50k rounds'), but it does not specify explicit train/validation/test dataset splits for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models, or memory specifications. |
| Software Dependencies | No | For linear and logistic models, we use scikit-learn (Pedregosa et al., 2011) with stochastic optimization and its default settings. For neural networks, we use Keras (Chollet et al., 2015) with a ReLU hidden layer and a sigmoid output layer, along with SGD and its default settings. While libraries are mentioned, specific version numbers are not provided. |
| Experiment Setup | Yes | We run Giro with three different values of a: 1, 1/3, and 1/10. ... The number of arms is K = 10 and their means are chosen uniformly at random from [0.25, 0.75]. The horizon is n = 10k rounds. ... The horizon is n = 50k rounds... In Giro, a = 1 in all experiments. ... The best schedule across all datasets was ϵ_t = b/t, where b is set to attain 1% exploration in n rounds. ... In linear and logistic models, we optimize until the error drops below 10⁻³. In neural networks, we make one pass over the whole history. |
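The pseudocode and experiment-setup rows above describe Giro's core loop: each arm's reward history is augmented with pseudo-rewards (per real observation, a extra rewards of 0 and a extra rewards of 1), and at every round each arm's augmented history is bootstrapped, with the arm of highest bootstrap mean pulled. A minimal sketch of one such round, assuming [0, 1] rewards, integer a (the paper also evaluates fractional a = 1/3 and 1/10, which this sketch does not handle), and a hypothetical list-of-lists representation of the per-arm histories:

```python
import random

def giro_step(histories, a=1):
    """One Giro round: bootstrap each arm's pseudo-reward-augmented history
    and pull the arm with the highest bootstrap mean.

    `histories` is a list with one reward list per arm (an illustrative
    representation, not the paper's own data structure)."""
    best_arm, best_mean = None, -float("inf")
    for arm, history in enumerate(histories):
        if not history:
            # An arm with no observations is pulled first.
            return arm
        # Augment: per real observation, add `a` pseudo-rewards of 0
        # and `a` pseudo-rewards of 1 ("garbage in").
        augmented = list(history) \
            + [0.0] * (a * len(history)) \
            + [1.0] * (a * len(history))
        # Bootstrap: resample the augmented history with replacement.
        sample = random.choices(augmented, k=len(augmented))
        mean = sum(sample) / len(sample)
        if mean > best_mean:
            best_arm, best_mean = arm, mean
    return best_arm
```

The pseudo-rewards guarantee that every bootstrap mean has nonzero variance, which is what drives exploration; without them a deterministic history would always produce the same bootstrap mean and the algorithm could get stuck on a suboptimal arm.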