Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Authors: Richard Combes, Alexandre Proutiere

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The analytical results are supported by numerical experiments showing that OSUB performs significantly better than state-of-the-art algorithms. "We conduct numerical experiments and show that OSUB significantly outperforms LSE and other classical bandit algorithms when the number of arms is much smaller than the time horizon."
Researcher Affiliation | Academia | Richard Combes (RCOMBES@KTH.SE), KTH Royal Institute of Technology, Stockholm, Sweden; Alexandre Proutiere (ALEPRO@KTH.SE), KTH Royal Institute of Technology, Stockholm, Sweden
Pseudocode | Yes | Algorithm OSUB. Input: graph G = (V, E). For n ≥ 1, select the arm k(n) = L(n) if (l_{L(n)}(n) − 1)/(γ + 1) ∈ ℕ, and k(n) = arg max_{k ∈ N(L(n))} b_k(n) otherwise, where L(n) is the current leader, l_{L(n)}(n) the number of rounds it has led, γ the maximum degree of G, and N(k) the neighborhood of arm k in G.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper generates synthetic rewards for its experiments (e.g., "K = 17 arms with Bernoulli rewards") and neither uses a publicly available dataset nor provides concrete access information for one.
Dataset Splits | No | The paper concerns multi-armed bandits, an online learning setting, and does not discuss dataset splits (e.g., train/validation/test percentages or sample counts) of the kind found in batch learning setups.
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | In the first experiment, there are K = 17 arms with Bernoulli rewards of respective averages µ = (0.1, 0.2, ..., 0.9, 0.8, ..., 0.1). The parameters of the LSE algorithm are chosen as suggested in Proposition 4.5 of (Yu & Mannor, 2011). In the second experiment, there are K = 10 arms whose expected rewards form a moving triangle: for k = 1, ..., K, µ_k(n) = (K − 1)/K − |w(n) − k|/K, where w(n) = 1 + (K − 1)(1 + sin(nσ))/2. The window sizes are set as follows for the various algorithms: τ = σ^(−4/5) for SW-UCB and SW-KL-UCB (the rationale for this choice is explained in (Combes & Proutiere, 2014)), and τ = σ^(−3/4) log(1/σ)/8 for SW-UCB-U and OSUB.
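The OSUB selection rule quoted in the pseudocode row can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes Bernoulli rewards, takes the neighborhood structure N(k) (which includes k itself) and the maximum degree γ as inputs, and uses a simple bisection for the KL-UCB index b_k(n); the function names are ours.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0/1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, n):
    """KL-UCB index: the largest q >= mean with pulls * KL(mean, q) <= log(n)."""
    if pulls == 0:
        return 1.0  # unexplored arms get the maximal index
    target = math.log(max(n, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(40):  # bisection on the monotone constraint in q
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def osub_select(n, means, pulls, leader_counts, neighbors, gamma):
    """One OSUB decision at round n; updates the leader counts l_k(n) in place."""
    # Leader L(n): arm with the highest empirical mean.
    leader = max(range(len(means)), key=lambda k: means[k])
    leader_counts[leader] += 1
    # Play the leader itself every (gamma+1)-th round it leads; otherwise play
    # the arm with the best KL-UCB index among its neighbors (leader included).
    if (leader_counts[leader] - 1) % (gamma + 1) == 0:
        return leader
    return max(neighbors[leader],
               key=lambda k: klucb_index(means[k], pulls[k], n))
```

For the unimodal line-graph setting of the paper's experiments, `neighbors[k]` would be `{k-1, k, k+1}` (truncated at the ends) and `gamma = 2`, so only the leader's immediate neighbors are ever compared.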
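The nonstationary "moving triangle" rewards from the second experiment are easy to reproduce from the formulas above. A small sketch, with the caveat that the negative exponents in the window sizes are our reconstruction (σ is small, so the sliding windows must be large) and `triangle_means`/`window_sizes` are our own names:

```python
import math

def triangle_means(n, K=10, sigma=1e-4):
    """Expected rewards mu_k(n), k = 1..K: a triangle whose peak position
    w(n) drifts back and forth across the arms as sin(n*sigma) oscillates."""
    w = 1 + (K - 1) * (1 + math.sin(n * sigma)) / 2
    return [(K - 1) / K - abs(w - k) / K for k in range(1, K + 1)]

def window_sizes(sigma):
    """Sliding-window lengths as reconstructed above (exponent signs assumed)."""
    return {
        "SW-UCB / SW-KL-UCB": sigma ** (-4 / 5),
        "SW-UCB-U / OSUB": sigma ** (-3 / 4) * math.log(1 / sigma) / 8,
    }
```

At any round the reward profile is unimodal over the line graph of arms, which is what lets OSUB (and its sliding-window variant) track the moving peak through local comparisons.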