Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Authors: Richard Combes, Alexandre Proutiere

ICML 2014 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental The analytical results are supported by numerical experiments showing that OSUB performs significantly better than the state-of-the-art algorithms. We conduct numerical experiments and show that OSUB significantly outperforms LSE and other classi-cal bandit algorithms when the number of arms is much smaller than the time horizon.
Researcher Affiliation Academia Richard Combes EMAIL KTH, Royal Institute of technology, Stockholm, Sweden Alexandre Proutiere EMAIL KTH, Royal Institute of technology, Stockholm, Sweden
Pseudocode Yes Algorithm OSUB Input: graph G = (V, E) For n 1, select the arm k(n) where: L(n) if l L(n)(n) 1 γ+1 N, arg max k N(L(n)) bk(n) otherwise,
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets No The paper describes generating synthetic rewards for experiments (e.g., ‘K = 17 arms with Bernoulli rewards’), but does not use or provide concrete access information for a publicly available or open dataset.
Dataset Splits No The paper concerns multi-armed bandits, an online learning setting, and does not discuss or provide specific dataset split information (like train/validation/test percentages or sample counts) typically found in batch learning experiment setups.
Hardware Specification No The paper does not provide specific hardware details (like exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup Yes In our first experiment, we consider K = 17 arms with Bernoulli rewards of respective averages µ = (0.1, 0.2, ..., 0.9, 0.8, ..., 0.1). The parameters in LSE algorithm are chosen as suggested in Proposition 4.5 (Yu & Mannor, 2011). There are K = 10 arms whose expected rewards form a moving triangle: for k = 1, ..., K, µk(n) = (K 1)/K |w(n) k|/K, where w(n) = 1+(K 1)(1+sin(nσ))/2. The window size are set as follows for the various algorithms: τ = σ 4/5 for SW-UCB and SW-KL-UCB (the rationale for this choice is explained in (Combes & Proutiere, 2014)), τ = σ 3/4 log(1/σ)/8 for SW-UCB-U and OSUB.