Epsilon-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits
Authors: Sivan Sabato
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study ϵ-best-arm identification in a setting where, during the exploration phase, the cost of each arm pull is proportional to the expected future reward of that arm. We term this setting Pay-Per-Reward. We provide an algorithm for this setting that, with high probability, returns an ϵ-best arm while incurring a cost that depends only linearly on the total expected reward of all arms and does not depend at all on the number of arms. Under mild assumptions, the algorithm can also be applied to problems with infinitely many arms. |
| Researcher Affiliation | Academia | Sivan Sabato Department of Computer Science Ben-Gurion University of the Negev Beer-Sheva, Israel 8410501 sabatos@cs.bgu.ac.il |
| Pseudocode | Yes | Algorithm 1 MAB-PPR: ϵ-Best-Arm-Identification with Pay-Per-Reward |
| Open Source Code | No | The paper does not provide any statements about open-sourcing code or links to a code repository. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on datasets, so no dataset availability information is provided. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, so no dataset split information (training, validation, test) is provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and describes an algorithm but does not mention any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper defines 'universal constants' for its theoretical algorithm (MAB-PPR) and discusses how they are used in the analysis, but it does not describe an empirical experimental setup with concrete hyperparameter values or system-level training settings. |
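The Pay-Per-Reward cost model summarized above can be made concrete with a small sketch. The code below is a standard successive-elimination baseline for ϵ-best-arm identification with Bernoulli arms, instrumented to tally the pay-per-reward cost (each pull of arm *i* is charged its expected reward). It is not the paper's MAB-PPR algorithm (Algorithm 1): unlike MAB-PPR, this baseline's exploration cost still grows with the number of arms. The function name, the Bernoulli arm model, and all constants are illustrative assumptions.

```python
import math
import random

def successive_elimination(means, eps=0.2, delta=0.1, seed=0):
    """Return (index of an eps-best arm w.h.p., pay-per-reward cost).

    Illustrative baseline only -- NOT the paper's MAB-PPR algorithm.
    Arms are Bernoulli with the given true means; each pull of arm i
    is charged a cost of means[i] (proportional to expected reward).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    cost = 0.0
    t = 0
    while len(active) > 1:
        t += 1
        for i in active:
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            cost += means[i]  # pay-per-reward: charged the arm's expected reward
        # Hoeffding confidence radius with a union bound over arms and rounds
        rad = math.sqrt(math.log(4 * k * t * t / delta) / (2 * t))
        best_emp = max(sums[i] / counts[i] for i in active)
        # drop arms whose upper confidence bound falls below the leader's lower bound
        active = [i for i in active
                  if sums[i] / counts[i] + rad >= best_emp - rad]
        if rad <= eps / 2:  # every surviving arm is eps-best w.h.p.
            break
    best = max(active, key=lambda i: sums[i] / counts[i])
    return best, cost
```

For example, `successive_elimination([0.9, 0.5, 0.1])` identifies arm 0 and reports the total pay-per-reward cost accrued during exploration. The paper's contribution is an algorithm whose cost bound is linear in the total expected reward of all arms and independent of the number of arms, which this baseline does not achieve.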