On Explore-Then-Commit Strategies
Authors: Aurélien Garivier, Tor Lattimore, Emilie Kaufmann
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we provide empirical evidence that the theory also holds in practice and discuss extensions to the non-Gaussian and multiple-armed case. Numerical experiments illustrate and empirically support our results in Section 5. |
| Researcher Affiliation | Academia | Aurélien Garivier, Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS UPS IMT, F-31062 Toulouse Cedex 9, France (aurelien.garivier@math.univ-toulouse.fr); Emilie Kaufmann, Univ. Lille, CNRS, Centrale Lille, Inria SequeL, UMR 9189 CRIStAL (Centre de Recherche en Informatique, Signal et Automatique de Lille), F-59000 Lille, France (emilie.kaufmann@univ-lille1.fr); Tor Lattimore, University of Alberta, 116 St & 85 Ave, Edmonton, AB T6G 2R3, Canada (tor.lattimore@gmail.com) |
| Pseudocode | Yes | Algorithm 1: FB-ETC algorithm; Algorithm 2: SPRT-ETC algorithm; Algorithm 3: BAI-ETC algorithm; Algorithm 4: -UCB; Algorithm 5: UCB |
| Open Source Code | No | The paper does not contain an unambiguous statement or link to open-source code for the methodology described. |
| Open Datasets | No | The paper does not mention using any publicly available dataset or provide links/citations for data access. It performs numerical experiments with a simulated 'bandit problem'. |
| Dataset Splits | No | The paper describes 4·10⁵ Monte-Carlo replications for estimating regret but does not provide specific train/validation/test dataset splits. The experiments are numerical simulations rather than evaluations on a distinct dataset with such splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We represent here the regret of the five strategies presented in this article on a bandit problem with Δ = 1/5, for different values of the horizon. The regret is estimated by 4·10⁵ Monte-Carlo replications. |
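The experiment-setup row describes estimating the regret of Explore-Then-Commit strategies by Monte-Carlo simulation on a Gaussian bandit with gap 1/5. A minimal sketch of that style of experiment for fixed-budget ETC is below; it is not the authors' code, and the function name, the exploration budget `m`, and the reduced horizon and replication count (the paper uses 4·10⁵ replications) are all assumptions made for a quick illustration.

```python
import numpy as np

def etc_regret(delta=0.2, horizon=1000, m=50, n_reps=2000, seed=0):
    """Monte-Carlo estimate of the regret of fixed-budget
    Explore-Then-Commit on a two-armed unit-variance Gaussian bandit
    with mean gap `delta`.

    Each arm is pulled m times; the empirically better arm is then
    played for the remaining horizon - 2m rounds.
    """
    rng = np.random.default_rng(seed)
    means = np.array([delta, 0.0])  # arm 0 is optimal
    regrets = np.empty(n_reps)
    for r in range(n_reps):
        # Exploration phase: m unit-variance Gaussian samples per arm.
        samples = rng.normal(means, 1.0, size=(m, 2))
        chosen = int(np.argmax(samples.mean(axis=0)))
        # Regret = gap * (total pulls of the suboptimal arm).
        n_subopt = m + (horizon - 2 * m) * (chosen == 1)
        regrets[r] = delta * n_subopt
    return regrets.mean()
```

Averaging over replications trades off the exploration cost (regret `delta * m` paid always) against the misidentification cost (regret `delta * (horizon - 2m)` paid when the wrong arm is committed to), which is the trade-off the paper's fixed-budget analysis quantifies.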