Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

Authors: Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that it performs well. ... We conduct two experiments. In Section 6.1, we evaluate Giro on multi-armed bandit problems. In Section 6.2, we evaluate Giro in the contextual bandit setting."
Researcher Affiliation | Collaboration | "1 Google Research, 2 DeepMind, 3 University of Alberta, 4 Mila, University of Montreal, 5 Adobe Research, 6 Facebook AI Research."
Pseudocode | Yes | "Algorithm 1: General randomized exploration. ... Algorithm 2: Giro with [0, 1] rewards. ... Algorithm 3: Contextual Giro with [0, 1] rewards."
Open Source Code | No | The paper neither states that source code for Giro is released nor links to a code repository. It mentions using "public optimization libraries" such as scikit-learn and Keras, but these are third-party tools, not the authors' own implementation.
Open Datasets | Yes | "We use three datasets from Riquelme et al. (2018): Adult (d = 94, K = 14), Statlog (d = 9, K = 7), and CovType (d = 54, K = 7)."
Dataset Splits | No | The paper describes online learning settings and horizons (e.g., "The horizon is n = 50k rounds"), but it does not specify explicit train/validation/test splits for reproduction.
Hardware Specification | No | The paper gives no details about the hardware used for the experiments, such as CPU or GPU models or memory specifications.
Software Dependencies | No | "For linear and logistic models, we use scikit-learn (Pedregosa et al., 2011) with stochastic optimization and its default settings. For neural networks, we use Keras (Chollet et al., 2015) with a ReLU hidden layer and a sigmoid output layer, along with SGD and its default settings." The libraries are named, but no version numbers are given.
Experiment Setup | Yes | "We run Giro with three different values of a: 1, 1/3, and 1/10. ... The number of arms is K = 10 and their means are chosen uniformly at random from [0.25, 0.75]. The horizon is n = 10k rounds. ... The horizon is n = 50k rounds. ... In Giro, a = 1 in all experiments. ... The best schedule across all datasets was ϵ_t = b/t, where b is set to attain 1% exploration in n rounds. ... In linear and logistic models, we optimize until the error drops below 10^-3. In neural networks, we make one pass over the whole history."
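The review above quotes Algorithm 2 (Giro with [0, 1] rewards) and its pseudo-reward parameter a. The core idea, for each arm, bootstrap a mean estimate from its reward history augmented with a pseudo-rewards of 0 and a of 1 per observation, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `giro_value` and `giro_bandit` are hypothetical, and the sketch assumes a is a positive integer (the paper additionally evaluates fractional values such as 1/3 and 1/10, which this sketch does not handle).

```python
import random


def giro_value(history, a, rng):
    """Bootstrap index of one arm from its augmented history.

    Giro augments the s observed rewards with a*s pseudo-rewards of 0
    and a*s pseudo-rewards of 1, then resamples the augmented history
    with replacement and returns the sample mean. The pseudo-rewards
    guarantee variance in the bootstrap sample, which drives exploration.
    """
    if not history:
        return float("inf")  # pull each arm at least once
    s = len(history)
    augmented = list(history) + [0.0, 1.0] * (a * s)
    sample = rng.choices(augmented, k=len(augmented))
    return sum(sample) / len(sample)


def giro_bandit(arm_means, n, a=1, seed=0):
    """Run Giro on a Bernoulli bandit for n rounds; return total reward."""
    rng = random.Random(seed)
    histories = [[] for _ in arm_means]
    total = 0.0
    for _ in range(n):
        # Pull the arm with the highest bootstrap value this round.
        values = [giro_value(h, a, rng) for h in histories]
        arm = max(range(len(arm_means)), key=values.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        histories[arm].append(reward)
        total += reward
    return total
```

The pseudo-rewards are what make the "garbage in" work: without them, an arm whose history is all zeros could never produce an optimistic bootstrap sample and would stop being explored.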