Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
Authors: Richard Combes, Alexandre Proutiere
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The analytical results are supported by numerical experiments showing that OSUB performs significantly better than the state-of-the-art algorithms. We conduct numerical experiments and show that OSUB significantly outperforms LSE and other classical bandit algorithms when the number of arms is much smaller than the time horizon. |
| Researcher Affiliation | Academia | Richard Combes (RCOMBES@KTH.SE), KTH Royal Institute of Technology, Stockholm, Sweden; Alexandre Proutiere (ALEPRO@KTH.SE), KTH Royal Institute of Technology, Stockholm, Sweden |
| Pseudocode | Yes | Algorithm OSUB. Input: graph G = (V, E). For n ≥ 1, select the arm k(n) = L(n) if (l_{L(n)}(n) − 1)/(γ + 1) ∈ ℕ, and k(n) = arg max_{k ∈ N(L(n))} b_k(n) otherwise. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes generating synthetic rewards for experiments (e.g., ‘K = 17 arms with Bernoulli rewards’), but does not use or provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper concerns multi-armed bandits, an online learning setting, and does not discuss or provide specific dataset split information (like train/validation/test percentages or sample counts) typically found in batch learning experiment setups. |
| Hardware Specification | No | The paper does not provide specific hardware details (like exact GPU/CPU models or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In our first experiment, we consider K = 17 arms with Bernoulli rewards of respective averages µ = (0.1, 0.2, ..., 0.9, 0.8, ..., 0.1). The parameters in the LSE algorithm are chosen as suggested in Proposition 4.5 (Yu & Mannor, 2011). There are K = 10 arms whose expected rewards form a moving triangle: for k = 1, ..., K, µ_k(n) = (K − 1)/K − \|w(n) − k\|/K, where w(n) = 1 + (K − 1)(1 + sin(nσ))/2. The window sizes are set as follows for the various algorithms: τ = σ^(−4/5) for SW-UCB and SW-KL-UCB (the rationale for this choice is explained in (Combes & Proutiere, 2014)), and τ = σ^(−3/4) log(1/σ)/8 for SW-UCB-U and OSUB. |
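Since no source code is released, the OSUB decision rule quoted above can only be approximated from the paper's description. Below is a minimal sketch, assuming a line graph over the arms (so the maximum degree is γ = 2), Bernoulli rewards, and a simplified KL-UCB-style index b_k(n) computed from the leader count l_{L(n)}(n); the function names `osub`, `kl_ucb_index`, and `kl_bernoulli` are illustrative, not from the paper, and constants such as the bisection depth are arbitrary choices.

```python
import math
import random

def kl_bernoulli(p, q):
    """Bernoulli KL divergence, clamped away from the boundary."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), via bisection.
    A simplified stand-in for the paper's index b_k(n)."""
    if pulls == 0:
        return 1.0
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def osub(means, horizon, seed=0):
    """Sketch of OSUB on a line graph; returns cumulative pseudo-regret."""
    rng = random.Random(seed)
    K = len(means)
    gamma = 2                      # max degree of the line graph
    pulls = [0] * K
    rewards = [0.0] * K
    leader_counts = [0] * K        # l_k(n): times arm k has been the leader
    best = max(means)
    regret = 0.0
    for n in range(1, horizon + 1):
        # Leader = arm with the best empirical mean (unpulled arms first).
        emp = [rewards[k] / pulls[k] if pulls[k] else float("inf")
               for k in range(K)]
        leader = max(range(K), key=lambda k: emp[k])
        leader_counts[leader] += 1
        l = leader_counts[leader]
        if (l - 1) % (gamma + 1) == 0:
            k = leader             # forced leader play: (l-1)/(gamma+1) in N
        else:
            # Otherwise play the best index among the leader's neighborhood.
            neigh = [j for j in (leader - 1, leader, leader + 1) if 0 <= j < K]
            k = max(neigh, key=lambda j: kl_ucb_index(
                rewards[j] / pulls[j] if pulls[j] else 0.0, pulls[j], l))
        r = 1.0 if rng.random() < means[k] else 0.0
        pulls[k] += 1
        rewards[k] += r
        regret += best - means[k]
    return regret

# Stationary setup quoted above: K = 17 arms, a unimodal triangle of means.
means = [0.1 * i for i in range(1, 10)] + [0.1 * i for i in range(8, 0, -1)]
print(osub(means, horizon=2000))
```

On this unimodal instance the sketch concentrates its pulls around the peak arm (µ = 0.9), so the printed pseudo-regret stays well below the worst-case bound of 0.8 per round.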