Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits
Authors: Sivan Sabato
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study ϵ-best-arm identification in a setting where, during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting that, with high probability, returns an ϵ-best arm while incurring a cost that depends only linearly on the total expected reward of all arms and does not depend at all on the number of arms. Under mild assumptions, the algorithm can also be applied to problems with infinitely many arms. |
| Researcher Affiliation | Academia | Sivan Sabato Department of Computer Science Ben-Gurion University of the Negev Beer-Sheva, Israel 8410501 sabatos@cs.bgu.ac.il |
| Pseudocode | Yes | Algorithm 1 MAB-PPR: ϵ-Best-Arm-Identification with Pay-Per-Reward |
| Open Source Code | No | The paper does not provide any statements about open-sourcing code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, so no dataset availability information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, so no dataset split information (training, validation, test) is provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and describes an algorithm but does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper defines 'universal constants' for its theoretical algorithm (MAB-PPR) and discusses how they are used in the analysis, but it does not describe an empirical experimental setup with concrete hyperparameter values or system-level training settings. |
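The Pay-Per-Reward cost model summarized above can be made concrete with a small sketch. The code below is a standard successive-elimination baseline for ϵ-best-arm identification with Bernoulli arms, instrumented to tally the pay-per-reward cost (each pull of arm *i* is charged its expected reward). It is not the paper's MAB-PPR algorithm (Algorithm 1): unlike MAB-PPR, this baseline's exploration cost still grows with the number of arms. The function name, the Bernoulli arm model, and all constants are illustrative assumptions.

```python
import math
import random

def successive_elimination(means, eps=0.2, delta=0.1, seed=0):
    """Return (index of an eps-best arm w.h.p., pay-per-reward cost).

    Illustrative baseline only -- NOT the paper's MAB-PPR algorithm.
    Arms are Bernoulli with the given true means; each pull of arm i
    is charged a cost of means[i] (proportional to expected reward).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    cost = 0.0
    t = 0
    while len(active) > 1:
        t += 1
        for i in active:
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            cost += means[i]  # pay-per-reward: charged the arm's expected reward
        # Hoeffding confidence radius with a union bound over arms and rounds
        rad = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        best_emp = max(sums[i] / counts[i] for i in active)
        # drop arms whose upper confidence bound falls below the leader's lower bound
        active = [i for i in active
                  if sums[i] / counts[i] + rad >= best_emp - rad]
        if rad <= eps / 2:  # every surviving arm is eps-best w.h.p.
            break
    best = max(active, key=lambda i: sums[i] / counts[i])
    return best, cost
```

For example, `successive_elimination([0.9, 0.5, 0.1])` identifies arm 0 and reports the total pay-per-reward cost accrued during exploration. The paper's contribution is an algorithm whose cost bound is linear in the total expected reward of all arms and independent of the number of arms, which this baseline does not achieve.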