Bounded Regret for Finite-Armed Structured Bandits

Authors: Tor Lattimore, Rémi Munos

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested Algorithm 1 on a selection of structured bandits depicted in Figure 2 and compared to UCB [6, 8]. Rewards were sampled from normal distributions with unit variances. For UCB we chose α = 2, while we used the theoretically justified α = 4 for Algorithm 1. All code is available in the supplementary material. Each data-point is the average of 500 independent samples with the blue crosses and red squares indicating the regret of UCB-S and UCB respectively.
Researcher Affiliation | Collaboration | Tor Lattimore, Department of Computing Science, University of Alberta, Canada, tlattimo@ualberta.ca; Rémi Munos, INRIA Lille, France (current affiliation: Google DeepMind), remi.munos@inria.fr
Pseudocode | Yes | Algorithm 1 UCB-S
Open Source Code | Yes | All code is available in the supplementary material.
Open Datasets | No | The paper generates synthetic data for its experiments ('Rewards were sampled from normal distributions with unit variances') rather than using a publicly available or open dataset with concrete access information.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology). The experiments are conducted in a multi-armed bandit setting where rewards are sampled sequentially, and the concept of static train/validation/test splits does not apply.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | For UCB we chose α = 2, while we used the theoretically justified α = 4 for Algorithm 1. Each data-point is the average of 500 independent samples with the blue crosses and red squares indicating the regret of UCB-S and UCB respectively.
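
The experiment setup quoted above (normally distributed rewards with unit variance, α = 2 for UCB, averages over 500 independent runs) can be sketched for the UCB baseline roughly as follows. This is a minimal illustration, not the authors' supplementary code: the confidence bonus sqrt(α σ² log t / T_i), the function names `ucb_regret` and `average_regret`, the arm means, and the horizon are all assumptions made for the example. UCB-S is not reproduced here, since it depends on the structured model class defined in the paper.

```python
import math
import random


def ucb_regret(means, horizon, alpha=2.0, sigma=1.0, rng=None):
    """Run one UCB trial on Gaussian arms; return cumulative pseudo-regret.

    Assumed index (not taken from the summary above):
        empirical_mean_i + sqrt(alpha * sigma^2 * log(t) / T_i)
    """
    if rng is None:
        rng = random.Random()
    k = len(means)
    counts = [0] * k      # T_i: number of pulls of arm i so far
    sums = [0.0] * k      # running sum of rewards observed from arm i
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull every arm once
        else:
            def index(i):
                bonus = math.sqrt(alpha * sigma ** 2 * math.log(t) / counts[i])
                return sums[i] / counts[i] + bonus
            arm = max(range(k), key=index)

        reward = rng.gauss(means[arm], sigma)  # unit variance when sigma = 1
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]            # pseudo-regret uses true means

    return regret


def average_regret(means, horizon, runs=500, alpha=2.0, seed=0):
    """Average the regret of `runs` independent trials (500 in the paper)."""
    rng = random.Random(seed)
    total = sum(ucb_regret(means, horizon, alpha, rng=rng) for _ in range(runs))
    return total / runs
```

For instance, `average_regret([0.0, 1.0], horizon=1000)` would estimate the expected regret of this UCB variant on a hypothetical two-armed problem; the arm means here are illustrative and do not correspond to any of the structures in Figure 2.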