Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Authors: Richard Combes, Alexandre Proutiere

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The analytical results are supported by numerical experiments showing that OSUB performs significantly better than state-of-the-art algorithms. "We conduct numerical experiments and show that OSUB significantly outperforms LSE and other classical bandit algorithms when the number of arms is much smaller than the time horizon."
Researcher Affiliation | Academia | Richard Combes (RCOMBES@KTH.SE), KTH Royal Institute of Technology, Stockholm, Sweden; Alexandre Proutiere (ALEPRO@KTH.SE), KTH Royal Institute of Technology, Stockholm, Sweden
Pseudocode | Yes | Algorithm OSUB. Input: graph G = (V, E). For n ≥ 1, select the arm k(n) = L(n) if (l_{L(n)}(n) − 1)/(γ + 1) ∈ ℕ, and k(n) = arg max_{k ∈ N(L(n))} b_k(n) otherwise, where L(n) is the current leader, l_{L(n)}(n) the number of rounds it has led, γ the maximum degree of G, and N(k) the neighborhood of arm k in G.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper generates synthetic rewards for its experiments (e.g., "K = 17 arms with Bernoulli rewards") and neither uses a publicly available dataset nor provides concrete access information for one.
Dataset Splits | No | The paper concerns multi-armed bandits, an online learning setting, and does not discuss dataset splits (e.g., train/validation/test percentages or sample counts) of the kind found in batch learning setups.
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | In the first experiment, there are K = 17 arms with Bernoulli rewards of respective averages µ = (0.1, 0.2, ..., 0.9, 0.8, ..., 0.1). The parameters of the LSE algorithm are chosen as suggested in Proposition 4.5 of (Yu & Mannor, 2011). In the second experiment, there are K = 10 arms whose expected rewards form a moving triangle: for k = 1, ..., K, µ_k(n) = (K − 1)/K − |w(n) − k|/K, where w(n) = 1 + (K − 1)(1 + sin(nσ))/2. The window sizes are set as follows for the various algorithms: τ = σ^(−4/5) for SW-UCB and SW-KL-UCB (the rationale for this choice is explained in (Combes & Proutiere, 2014)), and τ = σ^(−3/4) log(1/σ)/8 for SW-UCB-U and OSUB.
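The OSUB selection rule quoted in the pseudocode row can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes Bernoulli rewards, takes the neighborhood structure N(k) (which includes k itself) and the maximum degree γ as inputs, and uses a simple bisection for the KL-UCB index b_k(n); the function names are ours.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0/1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, n):
    """KL-UCB index: the largest q >= mean with pulls * KL(mean, q) <= log(n)."""
    if pulls == 0:
        return 1.0  # unexplored arms get the maximal index
    target = math.log(max(n, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(40):  # bisection on the monotone constraint in q
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def osub_select(n, means, pulls, leader_counts, neighbors, gamma):
    """One OSUB decision at round n; updates the leader counts l_k(n) in place."""
    # Leader L(n): arm with the highest empirical mean.
    leader = max(range(len(means)), key=lambda k: means[k])
    leader_counts[leader] += 1
    # Play the leader itself every (gamma+1)-th round it leads; otherwise play
    # the arm with the best KL-UCB index among its neighbors (leader included).
    if (leader_counts[leader] - 1) % (gamma + 1) == 0:
        return leader
    return max(neighbors[leader],
               key=lambda k: klucb_index(means[k], pulls[k], n))
```

For the unimodal line-graph setting of the paper's experiments, `neighbors[k]` would be `{k-1, k, k+1}` (truncated at the ends) and `gamma = 2`, so only the leader's immediate neighbors are ever compared.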
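The nonstationary "moving triangle" rewards from the second experiment are easy to reproduce from the formulas above. A small sketch, with the caveat that the negative exponents in the window sizes are our reconstruction (σ is small, so the sliding windows must be large) and `triangle_means`/`window_sizes` are our own names:

```python
import math

def triangle_means(n, K=10, sigma=1e-4):
    """Expected rewards mu_k(n), k = 1..K: a triangle whose peak position
    w(n) drifts back and forth across the arms as sin(n*sigma) oscillates."""
    w = 1 + (K - 1) * (1 + math.sin(n * sigma)) / 2
    return [(K - 1) / K - abs(w - k) / K for k in range(1, K + 1)]

def window_sizes(sigma):
    """Sliding-window lengths as reconstructed above (exponent signs assumed)."""
    return {
        "SW-UCB / SW-KL-UCB": sigma ** (-4 / 5),
        "SW-UCB-U / OSUB": sigma ** (-3 / 4) * math.log(1 / sigma) / 8,
    }
```

At any round the reward profile is unimodal over the line graph of arms, which is what lets OSUB (and its sliding-window variant) track the moving peak through local comparisons.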