On Regret with Multiple Best Arms

Authors: Yinglun Zhu, Robert Nowak

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art. We conduct three experiments to compare our algorithms with baselines. In Section 6.1, we compare the performance of each algorithm on problems with varying hardness levels. We examine how the regret curve of each algorithm increases on synthetic and real-world datasets in Section 6.2 and Section 6.3, respectively."
Researcher Affiliation | Academia | Yinglun Zhu, Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706 (yinglun@cs.wisc.edu); Robert Nowak, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706 (rdnowak@wisc.edu)
Pseudocode | Yes | Algorithm 1: MOSS++. Input: time horizon T and user-specified parameter β ∈ [1/2, 1]. Algorithm 2: MOSS Subroutine (SR). Input: time horizon T and hardness level α. Algorithm 3: Parallel. Input: time horizon T and the optimal reward µ*.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its algorithms (MOSS++, Parallel, empMOSS++) is publicly available.
Open Datasets | Yes | "We use a real-world dataset from the New Yorker Magazine Cartoon Caption Contest. The dataset of 1-3 star caption ratings/rewards for Contest 652 consists of n = 10025 captions." Available online at https://nextml.github.io/caption-contest-data.
Dataset Splits | No | The paper studies a multi-armed bandit problem, i.e., sequential decision-making over a time horizon T. This setting does not use the fixed training/validation/test splits of supervised learning, and no such splits are specified.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or specialized solvers).
Experiment Setup | Yes | "We set the time horizon to T = 50000 and consider the total number of arms n = 20000. We choose β = 0.5 for MOSS++ and empMOSS++ in all experiments. For this experiment, we generate best arms with expected reward 0.9 and sub-optimal arms with expected rewards evenly distributed among {0.1, 0.2, 0.3, 0.4, 0.5}. All arms follow Bernoulli distributions. We set T = 10^5, which results in a hardness level of around α ≈ 0.43."
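The synthetic instance described above can be sketched in a few lines. The number of best arms is not stated in this excerpt (in the paper it is governed by the hardness level α), so `n_best=200` below is purely an illustrative placeholder; everything else follows the quoted setup.

```python
import random

def make_instance(n=20000, n_best=200, best_reward=0.9,
                  suboptimal_rewards=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Bernoulli means for the synthetic experiment: `n_best` best arms
    with mean 0.9, remaining arms spread evenly over {0.1, ..., 0.5}."""
    means = [best_reward] * n_best
    for i in range(n - n_best):
        means.append(suboptimal_rewards[i % len(suboptimal_rewards)])
    return means

def pull(mean, rng=random):
    """Sample one Bernoulli reward with the given mean."""
    return 1.0 if rng.random() < mean else 0.0
```

A bandit algorithm would then interact with the instance only through `pull(means[i])`, matching the sequential protocol over the horizon T.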