Minimal Exploration in Structured Stochastic Bandits
Authors: Richard Combes, Stefan Magureanu, Alexandre Proutiere
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the efficiency of OSSB using numerical experiments in the case of the linear bandit problem and show that OSSB outperforms existing algorithms, including Thompson sampling. |
| Researcher Affiliation | Academia | Richard Combes (Centrale-Supelec / L2S, richard.combes@supelec.fr); Stefan Magureanu (KTH, EE School / ACL, magur@kth.se); Alexandre Proutiere (KTH, EE School / ACL, alepro@kth.se) |
| Pseudocode | Yes | Algorithm 1 OSSB(ε,γ) |
| Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes a synthetic experimental setup where parameters were generated uniformly at random, rather than using a pre-existing, publicly available dataset with concrete access information (e.g., URL, DOI, specific citation to an established benchmark). |
| Dataset Splits | No | The paper describes numerical experiments using synthetically generated parameters and mentions averaging over multiple trials, but it does not specify explicit training, validation, or test dataset splits, or cross-validation methods. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions baselines (e.g., Thompson Sampling, GLM-UCB) but does not provide specific version numbers for any software, libraries, or dependencies used in the experiments. |
| Experiment Setup | Yes | In our implementation of OSSB, we use γ = ε = 0 since γ is typically chosen 0 in the literature (see [18]) and the performance of the algorithm does not appear sensitive to the choice of ε. As baselines we select the extension of Thompson Sampling presented in [4] (using v_t = R√(0.5 d ln(t/δ)), we chose δ = 0.1, R = 1), GLM-UCB (using ρ(t) = √(0.5 ln(t))), an extension of UCB [16], and the algorithm presented in [31]. |
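
The exploration schedules quoted in the experiment-setup row are easy to sanity-check numerically. The sketch below is not from the paper: it simply evaluates the two confidence terms with the reported constants (δ = 0.1, R = 1). The function names and the dimension d = 5 are illustrative assumptions.

```python
import numpy as np

def thompson_scale(t: int, d: int, delta: float = 0.1, R: float = 1.0) -> float:
    """Posterior-inflation scale v_t = R * sqrt(0.5 * d * ln(t / delta))
    reported for the linear Thompson Sampling baseline of [4]."""
    return R * np.sqrt(0.5 * d * np.log(t / delta))

def glm_ucb_radius(t: int) -> float:
    """Confidence radius rho(t) = sqrt(0.5 * ln(t)) reported for GLM-UCB."""
    return np.sqrt(0.5 * np.log(t))

# Both terms grow only logarithmically in the round index t,
# consistent with logarithmic-regret exploration schedules.
for t in (10, 100, 1000, 10000):
    print(t, thompson_scale(t, d=5), glm_ucb_radius(t))
```

Note that both bonuses shrink relative to accumulated reward as t grows, which is the usual design choice behind such log(t) schedules: exploration never stops, but its per-round share vanishes.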