Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits

Authors: Sivan Sabato

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study ϵ-best-arm identification, in a setting where during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting, that with a high probability returns an ϵ-best arm, while incurring a cost that depends only linearly on the total expected reward of all arms, and does not depend at all on the number of arms. Under mild assumptions, the algorithm can be applied also to problems with infinitely many arms.
Researcher Affiliation | Academia | Sivan Sabato, Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel 8410501, sabatos@cs.bgu.ac.il
Pseudocode | Yes | Algorithm 1 MAB-PPR: ϵ-Best-Arm-Identification with Pay-Per-Reward
Open Source Code | No | The paper does not provide any statements about open-sourcing code or links to a code repository.
Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, thus no dataset availability information for training is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, therefore no dataset split information (training, validation, test) is provided.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and describes an algorithm but does not mention any specific software dependencies with version numbers.
Experiment Setup | No | The paper defines 'universal constants' for its theoretical algorithm (MAB-PPR) and discusses how they are used in the analysis, but it does not describe an empirical experimental setup with concrete hyperparameter values or system-level training settings.
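
The Pay-Per-Reward setting quoted in the Research Type entry can be made concrete with a minimal Python sketch. It is not the paper's MAB-PPR algorithm, which this page does not reproduce. It only illustrates the two quantities the abstract refers to: the exploration cost and the ϵ-best criterion. Two assumptions are made here and are not taken from this page: the proportionality constant of the cost is 1, so that pulling an arm costs exactly its expected reward, and an arm counts as ϵ-best when its expected reward is within an additive ϵ of the maximum. The function names are hypothetical.

```python
from typing import Sequence


def pay_per_reward_cost(pulls: Sequence[int], means: Sequence[float]) -> float:
    """Total exploration cost when each pull of arm i costs means[i].

    Assumption: the cost is exactly the expected reward of the arm
    (proportionality constant 1).
    """
    return sum(n * mu for n, mu in zip(pulls, means))


def is_epsilon_best(arm: int, means: Sequence[float], epsilon: float) -> bool:
    """Check the (assumed) additive criterion: means[arm] >= max(means) - epsilon."""
    return means[arm] >= max(means) - epsilon


# Hypothetical example: three arms, each pulled four times.
means = [0.5, 0.25, 0.25]
pulls = [4, 4, 4]

print(pay_per_reward_cost(pulls, means))   # 4 * 0.5 + 4 * 0.25 + 4 * 0.25 = 4.0
print(is_epsilon_best(1, means, 0.25))     # True: 0.25 >= 0.5 - 0.25
print(is_epsilon_best(1, means, 0.1))      # False: 0.25 < 0.5 - 0.1
```

Under this cost model, pulls of low-reward arms are cheap, so the total cost of a sampling plan is driven by the rewards of the arms pulled and not only by the number of pulls.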