On Explore-Then-Commit strategies

Authors: Aurélien Garivier, Tor Lattimore, Emilie Kaufmann

NeurIPS 2016

Reproducibility assessment (each entry lists the variable, the result, and the LLM's supporting response):
Research Type: Experimental. "Furthermore, we provide empirical evidence that the theory also holds in practice and discuss extensions to the non-Gaussian and multiple-armed case. Numerical experiments illustrate and empirically support our results in Section 5."
Researcher Affiliation: Academia. Aurélien Garivier, Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS UPS IMT, F-31062 Toulouse Cedex 9, France (aurelien.garivier@math.univ-toulouse.fr); Emilie Kaufmann, Univ. Lille, CNRS, Centrale Lille, Inria SequeL, UMR 9189 CRIStAL (Centre de Recherche en Informatique, Signal et Automatique de Lille), F-59000 Lille, France (emilie.kaufmann@univ-lille1.fr); Tor Lattimore, University of Alberta, 116 St & 85 Ave, Edmonton, AB T6G 2R3, Canada (tor.lattimore@gmail.com)
Pseudocode: Yes. Algorithm 1: FB-ETC algorithm; Algorithm 2: SPRT-ETC algorithm; Algorithm 3: BAI-ETC algorithm; Algorithm 4: -UCB; Algorithm 5: UCB
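As a rough illustration of the family of strategies named above, the fixed-budget Explore-Then-Commit idea can be sketched as follows. This is a minimal two-armed sketch, not the paper's exact pseudocode: the function name `fb_etc`, the exploration length `m`, and the Gaussian reward model are illustrative assumptions.

```python
import random

def fb_etc(m, horizon, pull):
    """Fixed-budget Explore-Then-Commit for two arms: pull each arm
    m times, then commit to the empirically best arm until `horizon`.
    `pull(arm)` returns one stochastic reward for the given arm."""
    sums = [0.0, 0.0]
    total = 0.0
    # Exploration phase: m pulls of each arm (2*m rounds in total).
    for _ in range(m):
        for arm in (0, 1):
            r = pull(arm)
            sums[arm] += r
            total += r
    # Commitment phase: play the arm with the higher empirical mean.
    best = 0 if sums[0] >= sums[1] else 1
    for _ in range(horizon - 2 * m):
        total += pull(best)
    return best, total

# Example: two Gaussian arms with mean gap 1/5 (illustrative values).
random.seed(1)
pull = lambda arm: random.gauss(0.2 if arm == 0 else 0.0, 1.0)
chosen, reward = fb_etc(m=50, horizon=1000, pull=pull)
```

The SPRT-ETC and BAI-ETC variants in the paper replace the fixed exploration length `m` with a data-dependent stopping rule, which is the source of their improved regret guarantees.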
Open Source Code: No. The paper does not contain an unambiguous statement of, or link to, open-source code for the methodology described.
Open Datasets: No. The paper does not mention using any publicly available dataset or provide links/citations for data access. Its numerical experiments use a simulated bandit problem.
Dataset Splits: No. The paper describes 4·10⁵ Monte-Carlo replications for estimating regret but does not provide train/validation/test splits. The experiments are numerical simulations rather than evaluations on a dataset that could be split.
Hardware Specification: No. The paper does not describe the hardware used to run its experiments.
Software Dependencies: No. The paper does not provide version numbers for any software dependencies.
Experiment Setup: Yes. "We represent here the regret of the five strategies presented in this article on a bandit problem with ∆ = 1/5, for different values of the horizon. The regret is estimated by 4·10⁵ Monte-Carlo replications."