Bounded Regret for Finite-Armed Structured Bandits
Authors: Tor Lattimore, Rémi Munos
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested Algorithm 1 on a selection of structured bandits depicted in Figure 2 and compared to UCB [6, 8]. Rewards were sampled from normal distributions with unit variances. For UCB we chose α = 2, while we used the theoretically justified α = 4 for Algorithm 1. All code is available in the supplementary material. Each data-point is the average of 500 independent samples with the blue crosses and red squares indicating the regret of UCB-S and UCB respectively. |
| Researcher Affiliation | Collaboration | Tor Lattimore, Department of Computing Science, University of Alberta, Canada (tlattimo@ualberta.ca); Rémi Munos, INRIA Lille, France (remi.munos@inria.fr; current affiliation: Google DeepMind). |
| Pseudocode | Yes | Algorithm 1 UCB-S |
| Open Source Code | Yes | All code is available in the supplementary material. |
| Open Datasets | No | The paper generates synthetic data for its experiments ('Rewards were sampled from normal distributions with unit variances') rather than using a publicly available or open dataset with concrete access information. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology). The experiments are conducted in a multi-armed bandit setting where rewards are sampled sequentially, and the concept of static train/validation/test splits does not apply. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | For UCB we chose α = 2, while we used the theoretically justified α = 4 for Algorithm 1. Each data-point is the average of 500 independent samples with the blue crosses and red squares indicating the regret of UCB-S and UCB respectively. |
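
The experiment setup quoted above (unit-variance normal rewards, α = 2 for UCB, averages over 500 independent runs) can be illustrated with a minimal sketch of the UCB baseline only. This is not the authors' supplementary code and it does not implement UCB-S. The index form `mean + sqrt(alpha * log(t) / n)`, the function names `run_ucb` and `average_regret`, and the example arm means are all assumptions made for illustration.

```python
import math
import random


def run_ucb(means, horizon, alpha=2.0, seed=0):
    """Simulate one run of a plain UCB baseline and return cumulative pseudo-regret.

    Assumption: the index is empirical_mean + sqrt(alpha * log(t) / n);
    the paper's exact constants may differ.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialisation: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(alpha * math.log(t) / counts[i]),
            )
        # Rewards are normal with unit variance, as stated in the setup.
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return regret


def average_regret(means, horizon, runs=500, alpha=2.0):
    """Average the regret over independent runs (500 in the quoted setup)."""
    total = sum(run_ucb(means, horizon, alpha, seed=s) for s in range(runs))
    return total / runs


if __name__ == "__main__":
    # Hypothetical two-armed instance; not one of the paper's structured bandits.
    print(average_regret([0.5, 0.0], horizon=1000, runs=20))
```

Seeding each run separately keeps the runs independent and reproducible. Reproducing the reported comparison would additionally require the structured-bandit definitions from Figure 2 and Algorithm 1 itself.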