Minimal Exploration in Structured Stochastic Bandits

Authors: Richard Combes, Stefan Magureanu, Alexandre Proutiere

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the efficiency of OSSB using numerical experiments in the case of the linear bandit problem and show that OSSB outperforms existing algorithms, including Thompson sampling.
Researcher Affiliation | Academia | Richard Combes, Centrale-Supelec / L2S, richard.combes@supelec.fr; Stefan Magureanu, KTH, EE School / ACL, magur@kth.se; Alexandre Proutiere, KTH, EE School / ACL, alepro@kth.se
Pseudocode | Yes | Algorithm 1 OSSB(ε,γ)
Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes a synthetic experimental setup where parameters were generated uniformly at random, rather than using a pre-existing, publicly available dataset with concrete access information (e.g., URL, DOI, specific citation to an established benchmark).
Dataset Splits | No | The paper describes numerical experiments using synthetically generated parameters and mentions averaging over multiple trials, but it does not specify explicit training, validation, or test dataset splits, or cross-validation methods.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing specifications used for running the experiments.
Software Dependencies | No | The paper mentions baselines (e.g., Thompson Sampling, GLM-UCB) but does not provide specific version numbers for any software, libraries, or dependencies used in the experiments.
Experiment Setup | Yes | In our implementation of OSSB, we use γ = ε = 0 since γ is typically chosen 0 in the literature (see [18]) and the performance of the algorithm does not appear sensitive to the choice of ε. As baselines we select the extension of Thompson Sampling presented in [4] (using $v_t = R\sqrt{0.5\,d\ln(t/\delta)}$, we chose δ = 0.1, R = 1), GLM-UCB (using $\rho(t) = \sqrt{0.5\ln(t)}$), an extension of UCB [16] and the algorithm presented in [31].
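
The two baseline tuning formulas quoted in the Experiment Setup entry can be evaluated directly. The following is a minimal sketch, not the authors' code: it reads the Thompson Sampling scale as R times the square root of 0.5 d ln(t/δ) and the GLM-UCB term as the square root of 0.5 ln(t), with δ = 0.1 and R = 1 as stated. The function names are hypothetical, and treating d as the dimension of the linear bandit's parameter vector is an assumption, since the quoted text does not define it.

```python
import math


def thompson_sampling_scale(t, d, R=1.0, delta=0.1):
    """Evaluate v_t = R * sqrt(0.5 * d * ln(t / delta)).

    Assumption: d is the dimension of the unknown parameter vector;
    the defaults R = 1 and delta = 0.1 are the values quoted above.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return R * math.sqrt(0.5 * d * math.log(t / delta))


def glm_ucb_rho(t):
    """Evaluate rho(t) = sqrt(0.5 * ln(t)) for round t >= 1."""
    if t < 1:
        raise ValueError("t must be at least 1")
    return math.sqrt(0.5 * math.log(t))


if __name__ == "__main__":
    d = 2  # hypothetical dimension, chosen only for illustration
    for t in (1, 10, 100, 1000):
        v_t = thompson_sampling_scale(t, d)
        rho_t = glm_ucb_rho(t)
        print(f"t={t:5d}  v_t={v_t:.4f}  rho(t)={rho_t:.4f}")
```

Both quantities grow only logarithmically in t, so the exploration they induce widens slowly as the number of rounds increases.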