Pure Exploration with Multiple Correct Answers

Authors: Rémy Degenne, Wouter M. Koolen

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensure that the existing Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.
Researcher Affiliation Academia Rémy Degenne Centrum Wiskunde & Informatica Science Park 123, Amsterdam, NL remy.degenne@cwi.nl Wouter M. Koolen Centrum Wiskunde & Informatica Science Park 123, Amsterdam, NL wmkoolen@cwi.nl
Pseudocode Yes Algorithm 1 Sticky Track-and-Stop. Input: δ > 0, strict total order on I.
Open Source Code No The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets No The paper is theoretical and does not mention specific datasets or their public availability for training.
Dataset Splits No The paper is theoretical and does not mention dataset splits for training, validation, or testing.
Hardware Specification No The paper does not provide any specific hardware details used for running experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers.
Experiment Setup No The paper focuses on theoretical algorithm design and analysis, and does not provide specific experimental setup details like hyperparameter values or training configurations.