On Regret with Multiple Best Arms
Authors: Yinglun Zhu, Robert Nowak
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art. We conduct three experiments to compare our algorithms with baselines. In Section 6.1, we compare the performance of each algorithm on problems with varying hardness levels. We examine how the regret curve of each algorithm increases on synthetic and real-world datasets in Section 6.2 and Section 6.3, respectively. |
| Researcher Affiliation | Academia | Yinglun Zhu Department of Computer Sciences University of Wisconsin-Madison Madison, WI 53706 yinglun@cs.wisc.edu Robert Nowak Department of Electrical and Computer Engineering University of Wisconsin-Madison Madison, WI 53706 rdnowak@wisc.edu |
| Pseudocode | Yes | Algorithm 1: MOSS++ Input: Time horizon T and user-specified parameter β ∈ [1/2, 1]. Algorithm 2: MOSS Subroutine (SR) Input: Time horizon T and hardness level α. Algorithm 3: Parallel Input: Time horizon T and the optimal reward µ∗. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for their algorithms (MOSS++, Parallel, emp MOSS++) is publicly available. |
| Open Datasets | Yes | We use a real-world dataset from the New Yorker Magazine Cartoon Caption Contest. The dataset of 1-3 star caption ratings/rewards for Contest 652 consists of n = 10025 captions. Available online at https://nextml.github.io/caption-contest-data. |
| Dataset Splits | No | The paper describes a multi-armed bandit problem, which involves sequential decision-making over a time horizon (T). This setting does not typically use traditional fixed training, validation, and test dataset splits like those found in supervised machine learning tasks, and no such splits are specified. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or specialized solvers). |
| Experiment Setup | Yes | We set the time horizon to T = 50000 and consider the total number of arms n = 20000. We choose β = 0.5 for MOSS++ and emp MOSS++ in all experiments. For this experiment, we generate best arms with expected reward 0.9 and sub-optimal arms with expected reward evenly distributed among {0.1, 0.2, 0.3, 0.4, 0.5}. All arms follow Bernoulli distribution. We set T = 10^5 and this results in a hardness level around α ≈ 0.43. |
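The experiment setup above can be sketched as a small simulation environment. This is a hedged reconstruction from the quoted setup, not the authors' code (no source code is released): `n_best`, the number of best arms, is an assumption, since the paper varies it to control the hardness level α.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 50_000       # time horizon (Section 6.1 setting)
n = 20_000       # total number of arms
n_best = 20      # number of best arms -- ASSUMED; the paper varies this to set hardness

# Best arms have expected reward 0.9; sub-optimal arms are spread evenly
# over {0.1, 0.2, 0.3, 0.4, 0.5}, per the quoted experiment setup.
sub_levels = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
sub_means = np.resize(sub_levels, n - n_best)
means = np.concatenate([np.full(n_best, 0.9), sub_means])

def pull(arm: int) -> int:
    """All arms are Bernoulli: reward 1 with probability means[arm], else 0."""
    return int(rng.random() < means[arm])

# Sanity check: pulling a best arm repeatedly should average near 0.9.
sample_mean = np.mean([pull(0) for _ in range(10_000)])
```

A bandit algorithm such as MOSS++ would interact with this environment only through `pull`, observing one Bernoulli reward per round for T rounds.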